1. 13

I’ve been advising family and friends to pick long memorable phrases for their passwords since most won’t pay for a password manager. I’ve also said that if it’s long enough then they shouldn’t need to worry about putting in special characters, they can just type the password in lowercase if they like. I’m talking about 30+ character passwords like ohwedoliketobebesidetheseaside

It made me wonder, is a 30 character password made from words more or less secure than a 15 character mixed characters and symbols password?

    1. 6

      This question is easy to answer. Given a finite alphabet A with a symbols and a word of length n made of letters from A, there are

      a ^ n
      

      possible combinations. If we now think of “expanding” in either direction by a factor f > 1 (either make the alphabet bigger or make the word longer), we get

      (f * a) ^ n = exp(ln((f * a) ^ n)) = exp(n * ln(f * a)) = exp(n * (ln(f) + ln(a))) =: p(a, n, f)
      

      for expanding the alphabet and

      a ^ (f * n) = exp(ln(a ^ (f * n))) = exp((f * n) * ln(a)) = exp(n * (f * ln(a))) =: q(a, n, f)
      

      for expanding the word length.

      We can see that for fixed n, any f > 1 and any alphabet with a ≥ 3 symbols (since a ^ (f - 1) > f in that case) it holds that

      q(f) > p(f)
      

      and thus it is better to increase the length of your password than to grow the alphabet by the same factor. Don’t try to memorize crazy combinations of letters. Stay lowercase if you like and think of sentences like

      "the crazy horse drinks gatorade"
      

      and you will become the master of password management. If someone tries to guess your password, they cannot assume a particular subset of the alphabet and have to search all of it. What I recommend as an additional security measure is to use uncommon punctuation at the end (like terminating with “._.”). It is still easy to remember if you do it for all passphrases, and you’re good to go.
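
      A quick numerical check of the comparison above, as a minimal sketch. The function names are made up for illustration, and they return the base-2 logarithm of p and q so the numbers stay readable. The last line also shows why the inequality needs an alphabet of at least 3 symbols: for a binary alphabet and a small factor, growing the alphabet can win.

      ```python
      import math

      def grow_alphabet_bits(a: int, n: int, f: float) -> float:
          """log2 of p(a, n, f) = (f * a) ^ n."""
          return n * math.log2(f * a)

      def grow_length_bits(a: int, n: int, f: float) -> float:
          """log2 of q(a, n, f) = a ^ (f * n)."""
          return f * n * math.log2(a)

      # Lowercase alphabet (a = 26), 15 characters, doubling factor f = 2.
      p_bits = grow_alphabet_bits(26, 15, 2)   # 52 symbols, 15 characters
      q_bits = grow_length_bits(26, 15, 2)     # 26 symbols, 30 characters
      print(f"bigger alphabet: {p_bits:.1f} bits, longer word: {q_bits:.1f} bits")

      # Edge case: with a = 2 and f = 1.5 the longer word is NOT better.
      print(grow_length_bits(2, 10, 1.5) < grow_alphabet_bits(2, 10, 1.5))
      ```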

    2. 3

      Start with a desired amount of entropy. Say 64 bits. Encode that into a password any way you like. 8 words selected from a set of 256. Base64 encoded. Whatever. It’s all the same.

      Do not attempt to measure the entropy in a password. You can’t.
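
      A minimal sketch of that idea, assuming Python's standard secrets module as the random source. The 256-entry list here is a placeholder invented for the example (word000 … word255); a real list would hold actual words. Both encodings carry the same 64 random bits.

      ```python
      import base64
      import secrets

      # Placeholder list of 256 distinct "words" (hypothetical, for illustration).
      WORDS = [f"word{i:03d}" for i in range(256)]

      raw = secrets.token_bytes(8)   # 8 random bytes = 64 bits of entropy

      # Encoding 1: one word per byte, i.e. 8 words from a set of 256.
      as_words = " ".join(WORDS[b] for b in raw)

      # Encoding 2: Base64 text of the same 8 bytes.
      as_base64 = base64.b64encode(raw).decode("ascii")

      print(as_words)
      print(as_base64)
      ```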

      1. 1

        If you only draw from 256 words, don’t publish them on the Internet.

        1. 5

          256 choose 8 with replacement. Do the math. :)

          1. [Comment removed by author]

        2. 2

          Why not? tedu’s password would still be as strong as 9 random ASCII characters.

          1. 1

            Stronger, most likely, since I challenge you to type e.g. NUL or ESC (or any of the 30-ish more obscure control characters) into a normal password field.

            1. 2

              But neither the plaintext nor the Base64 encoded passcodes have NUL or ESC characters in them. Base64 specifically will only include alphanumerics plus “+” and “/”, with “=” for padding. The plaintext is chosen from a set of words.

    3. 3

      The draft NIST guidelines look pretty relevant to this question: https://pages.nist.gov/800-63-3/

      https://nakedsecurity.sophos.com/2016/08/18/nists-new-password-rules-what-you-need-to-know/ is easier to read though.

    4. 1

      For extra flavor, throw in spaces.

    5. 1

      Maybe this password strength checker might help?

      1. 2

        I have my doubts – some serious concerns, actually – about the value of that password checker.

        I put in five words, randomly chosen from a dictionary, with spaces between them. It generated a very mediocre score: 56%. I was under the (possibly mistaken?) impression that randomly selecting five words from a dictionary would be an excellent password. (Corrections welcome.)

        In addition, I wouldn’t want anyone to be encouraged to enter any of their real passwords on a web site like this, as it could very well use that information maliciously. I’m not saying this particular web site does this, but I don’t think putting passwords into “password checker” web sites is something we generally want to encourage people to do.

        1. 4

          How big was your dictionary? It’s actually fairly easy to compute how many passwords a given generation scheme can produce.

          For example, my /usr/share/dict/words has 99,171 words in it. Picking five at random (without replacement) with cat /usr/share/dict/words|sort -R|head -n 5|tr $'\n' ' ' allows for (99,171 choose 5) different passwords, which is 79,927,903,812,879,014,029,704, or about 76 bits or 13 case-sensitive alphanumeric characters. (Choosing with replacement makes for a significantly easier to calculate but only marginally bigger 83 bits/14 characters.)

          I generated a couple of 13-character alphanumeric passwords and got an average score between 80 and 90, and a couple of five-word passphrases, which mostly got 100, so that seems in line to me. However, it heavily penalizes passphrases that consist entirely of lowercase letters and space, and my dictionary has lots of proper nouns and possessives. Filtering those, the passphrase scores were much worse—50-60-ish. (Interestingly, a 5-word passphrase generated from this shorter dictionary—66,005 words—is still worth a 12-character password. This is why experts advise you to concentrate on length over alphabet/dictionary size.)

          So it’s safe to say this checker isn’t consistent with the actual amount of entropy in a given password. But it looks like it’s trying to penalize the sorts of habits that result in bad passwords, even if that results in a very skewed “good” password space. It’s much more important to it that “password1” get a bad score than that “signals constriction punchy rejoinders titanic” get a good one. I’m not sure it’s biased in the best way (“signals constriction etc.” is far more likely to be remembered and used than the otherwise-equivalent “8inHpcw47jUdD”), and I don’t think I’d recommend it for that reason, but the premise is probably sound.
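
          The arithmetic above can be reproduced with a short sketch. The helper name is invented for the example, and 62 is the size of the case-sensitive alphanumeric alphabet. Note that math.comb counts unordered picks, as the figures above do; since word order matters in a typed passphrase, the ordered count is larger and lands close to the with-replacement number.

          ```python
          import math

          def bits_and_length(dict_size: int, words: int = 5) -> tuple[float, int]:
              """Entropy of an unordered pick of `words` distinct words, and the
              nearest equivalent length in case-sensitive alphanumerics."""
              bits = math.log2(math.comb(dict_size, words))
              return bits, round(bits / math.log2(62))

          for size in (99_171, 66_005):
              bits, chars = bits_and_length(size)
              print(f"{size} words: {bits:.1f} bits, about {chars} characters")

          # Picking with replacement, where order counts: dict_size ** words.
          bits_with_replacement = 5 * math.log2(99_171)
          print(f"with replacement: {bits_with_replacement:.1f} bits")
          ```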

        2. 4

          Well, let’s do the math. According to a quick search there are about 171,476 words in current usage–that’s about 2^17.387647.

          So, assuming that you pick each word at random and allow duplicates, getting your password is:

          P(5word_pass) = 1/171476 * 1/171476 * 1/171476 * 1/171476 * 1/171476
          P(5word_pass) = ( 2^-17.387647 ) ^ 5
          P(5word_pass) = ( 2^-86.938235 )
          

          So, we’ll set up the same trick using uppercase letters (26), lowercase letters (26), digits (10), and other characters (33). So, at random, we can choose a character from those sets, and that’s a 1/95 chance of any particular character being picked.

          Let’s see how many characters we need to match the 5-word password!

          P(5word_pass) = ( 2^-86.938235 )
          ( 2^-86.938235 ) = (1/95)^N
          ( 2^-86.938235 ) = (2^- 6.569856)^N
          ( 2^-86.938235 ) = 2^(- 6.569856N)
          -86.938235 = -6.569856N
          N = 13.232898
          

          And to double check:

          (1/95) ^ 13.232898 = 6.7422567e-27
          2^(-86.938235) = 6.7422567e-27
          

          So, it looks like you’d have to use about 14 characters from that class defined above to get the same strength as 5 dictionary words.
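
          The same calculation as a small sketch, using the word count and character classes quoted above (the constant names are just for illustration):

          ```python
          import math

          DICTIONARY_SIZE = 171_476          # word count quoted above
          CHARSET_SIZE = 26 + 26 + 10 + 33   # upper, lower, digits, other = 95

          # Five independent picks with duplicates allowed.
          passphrase_bits = 5 * math.log2(DICTIONARY_SIZE)

          # Number of random characters carrying the same number of bits.
          equivalent_chars = passphrase_bits / math.log2(CHARSET_SIZE)

          print(f"{passphrase_bits:.2f} bits, N = {equivalent_chars:.2f}")
          print("whole characters needed:", math.ceil(equivalent_chars))
          ```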

          1. 1

            Intriguing that 5 random words are roughly equivalent to a 14 character password. The space of 5 word memorable phrases that most people will choose is going to be quite small in practice, so for my family I need to reinforce the idea that they should choose random words, e.g. by flipping through a book.

            Of course, the search space becomes much bigger if they also include proper nouns.

            1. 2

              > they should choose random words, e.g. by flipping through a book.

              People are terrible at “choosing” randomly. They’re going to pick words they like and discard words they don’t.

              1. 2

                That’s why it’s called “diceware” ;) The EFF recently created a list of words to use for creating passwords like this, and then you use dice to pick a word for you.

                > we recommend a generating a six-word passphrase with this list, for a strength of 77 bits of entropy.

                https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases
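
                A rough sketch of the dice procedure, assuming five six-sided dice per word (so 6^5 = 7776 possible keys) and simulating the rolls with Python's secrets module. The lookup function is a stand-in invented for the example; a real run would look the key up in the downloaded word list.

                ```python
                import math
                import secrets

                def roll_key(dice: int = 5) -> str:
                    """Simulate rolling `dice` six-sided dice, e.g. '43146'."""
                    return "".join(str(secrets.randbelow(6) + 1) for _ in range(dice))

                def lookup(key: str) -> str:
                    """Hypothetical stand-in: a real list maps each key to a word."""
                    return f"<word-{key}>"

                passphrase = " ".join(lookup(roll_key()) for _ in range(6))
                print(passphrase)

                # 6 ** 5 = 7776 possible keys per word, six words in total.
                bits = 6 * math.log2(6 ** 5)
                print(f"{bits:.1f} bits")
                ```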

    6. 1

      If they’re not going to use a password manager then using lots of words is probably the best you’re going to do. You can mix it up a little by throwing some numbers or symbols in there and varying the capitalization.

      It also helps if they use some non-dictionary words in there. But nothing that’s easy to remember for humans is going to be as good as a bunch of random characters including non-alphanumeric.

      1. 1

        A potential attacker would have to assume that a 30 character password could contain any characters. If that’s true, having 30 lowercase letters is as secure as 30 mixed case or 30 with numbers and symbols?

        1. 2

          I think the concern is dictionary based brute force attacks. People love talking about entropy with regards to password strength, but I think that ignores search space. A long passphrase has very high entropy on a per-character basis, but a memorable series of lowercase words chosen from common vocabulary (not a dictionary) is more likely to be attempted first before a similar length string with special characters. Another thing that I think complicates the discussion is whether you’re protecting against login attempts, hash cracking (from a credential db leak), or a targeted attack that may have already obtained a passphrase you use elsewhere. Different patterns and recommendations tend to focus on addressing one or more of these concerns, but any strategy requiring memorization (no password manager) will compromise on these. All that is to say a long passphrase is good, but a long passphrase with some special characters is even better at guarding against one type of attack.

        2. 1

          You should consider attack vectors. If you have a list of all known usernames (say, a forum with a public directory of users), then my first attack would be to try every known username against common passwords like password, password1, hunter2, Password, dog, etc.

          Another attack vector is you know the user you want to compromise, so you try potential passwords like names of family, friends, significant things in their life. You can open it up to just all English words, then vary that by special characters trying some leet speak substitutions.

          In summary, I’m just saying that an attacker knows the total solution space is gigantic, but could also assume some basic human nature: we’re lazy, so we make passwords easy and memorable. An attacker might be able to exploit this and reduce their search space enough to have a feasible attack.

        3. 1

          Why would they have to assume that? Especially knowing that XKCD has made series of words passwords popular. Plus, you don’t know how long a password is based on its hash.

          Seems like a totally reasonable attack vector nowadays to me to just randomly assemble the most common couple thousand words.
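
          To put rough numbers on that, here is a sketch that assumes a 2,000-word attacker list (my reading of “a couple thousand”) and words picked independently at random. A well-known phrase or lyric would be far weaker than this estimate.

          ```python
          import math

          COMMON_WORDS = 2_000   # assumed size of the attacker's word list

          def word_guess_bits(words: int) -> float:
              """log2 of the number of phrases built from `words` common words."""
              return words * math.log2(COMMON_WORDS)

          def lowercase_bits(length: int) -> float:
              """log2 of the number of lowercase strings of a given length."""
              return length * math.log2(26)

          # "ohwedoliketobebesidetheseaside" is 9 words and 30 letters.
          print(f"9 common words: {word_guess_bits(9):.1f} bits")
          print(f"30 random lowercase letters: {lowercase_bits(30):.1f} bits")
          ```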

    7. 1

      I have short passwords, usually, but a different password for every site. I’m not worried about somebody cracking a specific site, just that the damage be contained.

    8. 1

      I just use maximally long, no older than six months random passwords everywhere, save for a memorizable one on my 1Password vault, which I change every month or so. It’s working OK for me and the threat model I labor under.

    9. 1

      Longer is generally better. Ordinary English has about 3 bits of entropy per character, so 90 bits for your 30 character password. If you were using the full ASCII table for your 15 character password and generating completely at random you’d have 7 bits/character for a total of 105, but realistically you can’t type most of the control codes. Random Base64 would give you 90 bits. Realistic approaches like “select a couple of words and replace a couple of letters with similar-looking numbers or symbols” would be much less, say 50 bits (45 bits for the words, say 5 bits for the numbers/symbols, since for a typical word there are only a few characters that you would or wouldn’t switch to numbers or symbols).
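
      The per-character estimates multiply out as in this small sketch. The 3 bits per character figure for English is the rough assumption from the comment, not a measured value.

      ```python
      import math

      estimates = {
          "30 chars of English at ~3 bits each": 30 * 3,
          "15 random 7-bit ASCII codes": 15 * 7,
          "15 random Base64 characters": 15 * math.log2(64),
      }
      for label, bits in estimates.items():
          print(f"{label}: {bits:.0f} bits")
      ```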