1. 10
    How Websites Check Your Password security web kevq.uk
  2. 17

    how can they check your password without being able to read your password?

    Followed by explaining how the web site does, in fact, read your password. It just doesn’t retain it.

    MD5

    I know this is meant as a descriptive example, but this kind of post is inevitably interpreted as normative advice. MD5 is bad advice in pretty much every context I can think of. For password hashing specifically, you most probably want memory-hard functions (more recently, cache-hard functions are popping up). Please replace every instance of “MD5” with “Scrypt” or “Argon2” (my own favourite is Argon2i). You may also want to mention the term “password hash” at some point, so people know the generic term and get the idea that passwords are not hashed the same way other things are.
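
    To make “password hash” concrete, here is a minimal sketch using scrypt from Python’s standard library (Argon2 would need a third-party package). The function names and the cost parameters are illustrative assumptions, not a tuned recommendation.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (an assumption, not a recommendation):
# tune them for your own hardware and latency budget.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store; the password itself is never stored."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the submitted password with the stored salt and compare."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

    At sign-up you would store the pair returned by `hash_password`; at login you would call `verify_password` with the submitted password and the stored pair.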

    When you subsequently try to login to your account, the string you enter into the password field is hashed in the same way as when you signed up, and the results are compared to the hashed password they have within their database.

    This is overlooking a crucially important detail: which machine exactly is performing the hash?

    • If the client performs the hash, then sends it, the web site doesn’t know what word you typed to get that hash, but it doesn’t really matter: they get a hash, then compare it to the database. But if their database is stolen, an attacker could just “pass the hash” over the network, without even bothering to guess the password. It will work just as well. So the hash is effectively the password. Oops.

    • If the server performs the hash, well, you have to give them the plaintext password. You have to trust the website with your plaintext password, and just hope they don’t do the wrong thing and store it plaintext.
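
    The first case can be sketched in a few lines. This is a toy model, not anyone’s real login code: `client_side_hash` and `naive_server_login` are hypothetical names, and the fixed salt exists only to keep the demo deterministic.

```python
import hashlib
import hmac


def client_side_hash(password: str, salt: bytes) -> bytes:
    """What the browser would compute before sending anything."""
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32)


def naive_server_login(received: bytes, stored: bytes) -> bool:
    """The server compares whatever arrives with the stored value, nothing more."""
    return hmac.compare_digest(received, stored)


salt = b"demo-salt-000001"                  # fixed only for this demo
stored = client_side_hash("hunter2", salt)  # what sits in the database

# The honest user types the password; the client hashes it and sends the hash.
honest_ok = naive_server_login(client_side_hash("hunter2", salt), stored)

# An attacker who stole the database replays the stored value directly.
# No password guessing is needed.
attacker_ok = naive_server_login(stored, stored)
```

    Both logins succeed, which is the “pass the hash” problem: the stored value is all an attacker needs.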

    That’s basically how websites check your password without actually knowing your password.

    As I’ve said, the website does know your password, though only for the time required to receive it and hash it.

    Things like hash collisions, and therefore more secure algorithms, aren’t covered. Neither is salting.

    Hash collisions don’t really matter here. Password hashes are vulnerable to preimage attacks, which are most efficiently performed with a dictionary search (and precomputed tables if there’s no salt). You can avoid mentioning “more secure algorithms” by not naming broken algorithms to begin with. Hence my (stern) suggestion to replace “MD5” with “Argon2”.

    You really should address salts right there in this blog post. It’s not complicated, and it would only add 30 seconds to the read time. 1 minute if you explain what they are for (they prevent batch attacks where you search multiple entries with a single guess, and they prevent pre-computation attacks where you compute a rainbow table in advance to accelerate password cracking once you steal the database).
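
    A rough sketch of the batch-attack point, under assumptions: the “no salt” case is modelled as one constant shared by every user, the user names and the password are made up, and `slow_hash` is a hypothetical helper around scrypt.

```python
import hashlib
import os


def slow_hash(password: str, salt: bytes) -> bytes:
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32)


# No per-user salt: modelled here as the same constant for everyone.
SHARED = b"same-for-everyone"
unsalted_db = {
    "alice": slow_hash("letmein", SHARED),
    "bob": slow_hash("letmein", SHARED),
}

# The attacker hashes one guess once and checks it against every entry.
guess = slow_hash("letmein", SHARED)
cracked = sorted(user for user, digest in unsalted_db.items() if digest == guess)

# Per-user random salts: identical passwords now give unrelated digests,
# so each guess has to be re-hashed separately for each user.
salts = {user: os.urandom(16) for user in ("alice", "bob")}
salted_db = {user: slow_hash("letmein", salts[user]) for user in salts}
```

    With the shared constant, one hashed guess exposes both accounts. With per-user salts the two digests differ, and a table computed in advance is useless because the attacker cannot know the salts before stealing the database.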

    The really advanced (and more secure) stuff you can leave out is augmented PAKE (Password Authenticated Key Exchange). They’re fairly complicated, but come with a number of advantages compared to the classical scheme:

    • The server really doesn’t learn your password.
    • The expensive (memory hard) hash can be done by the client (we call that “server relief”). This can keep server costs down, and mitigate some DoS attacks. (Note: server side relief can be done without PAKE.)
    • You don’t need to encrypt the exchange (though you do need to authenticate the server).
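
    Server relief without PAKE can be sketched roughly as follows. This is one possible arrangement under stated assumptions, not a vetted protocol: the client runs the expensive hash, the server applies one cheap hash to what it receives and stores only that. All function names are hypothetical, how the client learns its salt is left out, and the exchange still has to travel over an encrypted, authenticated channel.

```python
import hashlib
import hmac
import os


def client_stretch(password: str, salt: bytes) -> bytes:
    """Expensive, memory-hard step, paid for by the client."""
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32)


def server_record(stretched: bytes) -> bytes:
    """Cheap step on the server: store a hash of the client's output."""
    return hashlib.sha256(stretched).digest()


def server_verify(received: bytes, stored: bytes) -> bool:
    return hmac.compare_digest(hashlib.sha256(received).digest(), stored)


salt = os.urandom(16)
stored = server_record(client_stretch("hunter2", salt))  # database entry

login_ok = server_verify(client_stretch("hunter2", salt), stored)

# Replaying the stolen database entry no longer works, because the server
# hashes whatever it receives before comparing.
replay_ok = server_verify(stored, stored)
```

    The cheap server-side hash is what separates this from the plain client-side hashing scheme: the value in the database is no longer the value that logs you in.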

    The number one inconvenience, however, is that you must do fairly complex cryptographic stuff on the client. Crypto in JavaScript in the browser is scary beyond belief, so unless you really, really know what you are doing, best stick to the old methods. But if you have a standalone client, that’s pretty compelling. If you want more keywords, see CPace, AuCPace, OPAQUE. And maybe OPRF (Oblivious Pseudo-Random Function, a building block for PAKE).

    1. 1

      Followed by explaining how the web site does, in fact, read your password. It just doesn’t retain it.

      You could make a minor distinction here between the web application that receives the password from the form and the database server. In that sense, the database server indeed never learns the password, so it’s not technically wrong depending on your specific definition of website.

      1. 1

        You could make a minor distinction here between the web application that receives the password from the form and the database server.

        That distinction is usually meaningless. What happens in many applications is that as soon as the password reaches the company’s receiving server, it bounces around their internal network unencrypted. Anyone or anything that can sniff the internal network might access what’s there, including the password. And that’s often before your password reaches the actual program that will hash it.

        And in smaller web applications, the database and back end live on the same machine anyway. There is no “database server”, just a single “back end”.

    2. 5

      Lovely short article, but I’d like to make a few notes and nits:

      First of all, this is how websites hopefully check your password, in principle. You can still find plenty of websites that store plaintext passwords. Which is a terrible situation, but one we’re stuck with.

      Please, I beg of you, please stop mentioning MD5 for anything but an introduction to broken hash functions and why a kitten dies a cruel and unusual death every time you type the letters md5 into your code. MD5 has been broken for upwards of a decade. We shouldn’t even be talking about it anymore, much less in an introductory piece. If you must namedrop a hash function, make it something that isn’t a corpse of a hash function, such as the SHA-2 family (SHA-256, SHA-384, SHA-512). The fewer people have heard of MD5, the fewer people will be tempted to use MD5 because of “brand recognition”, so to speak.

      I know you’re declaring it out of scope, but you really should at least mention in passing somewhere in the main article that (correct) password storage uses (or at least should use) hash functions that are computationally expensive, such as Argon2 or scrypt, so that guessing passwords is slow.


      For your own interest (though you may already be aware of this): I kind of have a faint hope that we can kill password hashing on the server entirely. While WebAuthn sure kills passwords as a whole, I don’t see it taking off yet. Meanwhile, the IRTF’s Crypto Forum Research Group (CFRG) has just recently concluded a PAKE selection competition. An augmented PAKE in particular allows pushing the expensive hash function over to the client so that the server doesn’t need to bear the load (server relief). That also heavily punishes password brute-forcing attempts via the website and DoS attacks on the authentication endpoints, because the attackers have to bear the computational load. However, this would require help from the browser, e.g. in the form of a WebAssembly module that performs the necessary computations.

      1. 0

        I really don’t want to kill any more kittens (you killed a few yourself there!) The reason I went with ‘the hash that shall not be named’ is because it’s the most recognisable one that people may have heard of.

        Also, the longer hashes produced by SHA would have made my illustration more difficult to produce. :-)

        1. 3

          Also, the longer hashes produced by SHA would have made my illustration more difficult to produce. :-)

          Let’s compare for a minute:

          MD5    : df5025b0624e5d0510b83e2f715989ad
          SHA-256: 0a8584c646e5cf14e119f538b04e1a92fd74cdbb0b431756b8e44fac8f0fb4b3
          

          Inconvenient perhaps, but not that difficult. Plus, you could always just change the function name without changing the hash. Nothing prevents someone from truncating hashes down to 128 bits either. As long as the passwords themselves have less than 128 bits of entropy (which is almost all the time), the 16-byte hash won’t even reduce security.
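
          For illustration purposes, truncation is a one-liner. A small sketch, assuming scrypt as the password hash; the password and salt are made up and are not the inputs behind the two hashes quoted above.

```python
import hashlib

password = b"example password"  # hypothetical input
salt = b"demo-salt-000001"      # fixed only so the demo is repeatable

full = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
short = full[:16]  # keep the first 128 bits

full_hex = full.hex()    # 64 hex characters, the same length as a SHA-256 digest
short_hex = short.hex()  # 32 hex characters, the same length as an MD5 digest
```

          The short form fits an illustration just as well as an MD5 digest does.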

          Just replace “MD5” with “Scrypt” or “Argon2”. You don’t even have to change the hashes. People who are nitpicky enough to actually hash the password and notice your hash is actually MD5 are likely competent enough not to use MD5 in production. But if you do write “MD5”, some Dunning-Kruger-fueled idiot will think they’ve got themselves a tutorial and run with it.

          Computer security is unfair like that: it doesn’t matter what you wrote. What matters is what people end up misreading. Ideally, we should be able to stop reading at any point, and still be safe. If you put a dangerous example first, and the warnings last, people are going to miss the warnings. If you avoid the dangerous example, people might not recognise that “Argon2” oddity, but at least they won’t need the warning.

      2. 2

        When you subsequently try to login to your account, the string you enter into the password field is hashed in the same way as when you signed up, and the results are compared to the hashed password they have within their database.

        Mentioned in the detailed sibling comment, but this really needs to include some detail on how the network between password field and database comes in.

        1. 1

          Do you mean like, is the network transmission secure? Or subject to a man in the middle attack?

          Or did you mean something else?

          1. 2

            There’s no mention of where the password gets hashed, and what gets sent over the network. That’s a very confusing omission for the target audience of this article.

        2. 2

          Meta note: The off-topic and spam flags are not substitutes for incorrect or inaccurate. Submissions do not have an incorrect flag, so the only correct options to take are not upvoting and perhaps hiding the thread.