1. 65
  2. 29

    I’m posting this in a top-level comment so nobody thinks I’m singling/calling them out when I say this:

    I’ve heard the criticism of the blog post title and will take it to heart for future posts. We don’t need to keep talking about it in depth, and I worry that the discussions are going to get excessively heated if they continue their current trajectory.

    As many of you have pointed out, I’m also a programmer. So I’m punching myself with that remark too. And that part’s deliberate; many of my jokes have a tinge of self-deprecation to them. (That’s not because of, like, depression or low self-esteem or anything. I just have enough humility to make fun of myself without drawing blood.)

    My intention was to grab people’s attention and comment on a common cognitive bias I hear from programmers–many of whom are renowned experts in the field–when someone brings up hash functions. It was not meant to make anyone feel bad. Quite the opposite: These misunderstandings are pervasive, and everyone makes dumb mistakes at least once.

    Here’s a real-world example of what I mean.

    PASETO v1/v2 used a naked hash instead of HMAC for hedged nonces, which means if your RNG failed (unlikely), the derived nonce would be a hash of the plaintext–which is congruent to this problem from my blog post. PASETO was intended to be a boring cryptographic design. This was an interesting failure mode that was made possible by misunderstanding the correct way to use hash functions to solve a specific problem. Does that mean PASETO is bad? No, but you should probably switch to v3/v4 for other reasons.
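
    To make the failure mode concrete, here is a minimal sketch in Python. It is not PASETO’s actual construction: the function names, the choice of SHA-384, and the 24-byte nonce length are assumptions for illustration. It only shows why keying the derivation with a secret matters once the RNG output becomes predictable.

```python
import hashlib
import hmac

NONCE_LEN = 24  # assumed nonce size for this sketch


def naked_hedged_nonce(rng_output: bytes, message: bytes) -> bytes:
    """Unkeyed derivation: anyone who knows the inputs can recompute it."""
    return hashlib.sha384(rng_output + message).digest()[:NONCE_LEN]


def keyed_hedged_nonce(key: bytes, rng_output: bytes, message: bytes) -> bytes:
    """HMAC derivation: recomputing it requires the secret key."""
    return hmac.new(key, rng_output + message, hashlib.sha384).digest()[:NONCE_LEN]


# Simulate a failed RNG that always returns the same (known) bytes.
failed_rng = b"\x00" * 32
secret_key = b"k" * 32  # hypothetical key, for demonstration only
plaintext = b"attack at dawn"

public_nonce = naked_hedged_nonce(failed_rng, plaintext)

# An observer who sees the nonce can confirm a guess at the plaintext.
guess_confirmed = naked_hedged_nonce(failed_rng, b"attack at dawn") == public_nonce

# With HMAC, the same observer cannot recompute the nonce without the key.
keyed_nonce = keyed_hedged_nonce(secret_key, failed_rng, plaintext)
wrong_key_nonce = keyed_hedged_nonce(b"x" * 32, failed_rng, plaintext)
```

    With the unkeyed version, the nonce becomes a public fingerprint of the plaintext as soon as the randomness is known. With the keyed version, it stays unpredictable to anyone who lacks the key.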

    Mistakes are how we learn, but patterns in mistakes implicate one of two things:

    1. A failure in education (correctable by publishing better education material), or
    2. A failure in tooling (correctable by writing better specifications and standards that are easy to use and hard to misuse, from which better tools can be implemented)
    1. 8

      Please don’t let the (harsh) criticism discourage you from making more blog posts; I like the technical detail and content.

      1. 7

        The only thing that would effectively discourage me from making more blog posts is running out of interesting things to talk about, and needing to sate my boredom with low-quality garbage.

      2. 3

        For the record, I thought the title was brilliant… because no, very few of us understand hash functions (and most who do are more mathematician than programmer), but also we should be willing to admit that.

        “Humans are ignorant, inherently rather stupid, and prone to knee jerk reactions.” — A. Human

      3. 21

        This is a relatively complete overview for cryptographers and low-level programmers.

        The comparison between cryptographic and “general-purpose” hash functions is missing motivations for why non-cryptographic hash functions are used. The obvious motivation is speed; both constant and asymptotic runtime of hash tables and other data structures are sensitive to choice of hash function.
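
        As a rough illustration of why the non-cryptographic family is attractive for hash tables, here is a sketch of 32-bit FNV-1a in Python. FNV-1a is my choice of example, not something the article names, and `bucket_for` and the table size are hypothetical. The whole function is one XOR and one multiply per byte.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: one XOR and one multiply per input byte."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def bucket_for(key: str, bucket_count: int) -> int:
    """Map a key to a hash-table bucket (bucket_count is a hypothetical table size)."""
    return fnv1a_32(key.encode("utf-8")) % bucket_count


buckets = [bucket_for(k, 8) for k in ("alice", "bob", "carol")]
```

        Nothing here resists a deliberate attacker, which is the trade-off: speed in exchange for no security guarantees.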

        While I know it would have made the section a little more confusing, I wish there were a tangential mention of the Cryptographic Doom Principle in the “Encrypt and Hash” section. MAC is necessary but can still be misused.

        Finally, while it’s pedantic, I feel like we could do a better job of not omitting the biggest elephant in the room. Wikipedia gets this wrong too; from the opening of their article on cryptographic hash functions:

        “A cryptographic hash function (CHF) … is a one-way function, that is, a function which is practically infeasible to invert or reverse the computation.”

        However, one click away, on the page on one-way functions:

        “The existence of such one-way functions is still an open conjecture. In fact, their existence would prove that the complexity classes P and NP are not equal, thus resolving the foremost unsolved question of theoretical computer science.”

        We should use the subjunctive mood here, and speak hypothetically, because we have not yet proven the correctness of the portion of cryptographic research which relies on these hash functions. Speaking plainly, cryptographers should not assume that P is not equivalent to NP, even though evidence suggests P != NP. I know it’s silly, but it has serious ramifications.

        1. 7

          While I know it would have made the section a little more confusing, I wish there were a tangential mention of the Cryptographic Doom Principle in the “Encrypt and Hash” section. MAC is necessary but can still be misused.

          I thought this is covered by AEAD, but that’s in the subsequent section. I’ll make an explicit call-out.

          Finally, while it’s pedantic, I feel like we could do a better job of not omitting the biggest elephant in the room.

          Sounds like a separate blog post that deserves to be written and shared here, should you wish to do so. I’m not the right person to make that argument.

          1. 1

            So I took a quick look at https://people.eecs.berkeley.edu/~sanjamg/classes/cs276-fall14/scribe/lec02.pdf and, well, I can’t understand it. So I’m going to cheat and ask strangers on the internet for help.

            What is a good definition for a “one-way function”? I think definition 5 from the paper defines it (but I don’t understand the notation). Is f(x) = 1 an acceptable one-way function? I’ve always thought that crypto hashes are one way because, in practice, they always reduce a large number of bits down to a smaller set of possible hashed values, therefore they are “one-way”. For example, reducing 1 megabyte of data into 1024 bits.
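
            One way to probe the f(x) = 1 question is with a small experiment. The Python sketch below (all helper names are hypothetical) treats “one-way” as “hard to find any input that maps to a given output”, and shows that shrinking the output does not by itself give that property.

```python
import hashlib
from typing import Callable, Optional


def constant_one(data: bytes) -> bytes:
    """f(x) = 1: compresses every input to a single value."""
    return b"\x01"


def tiny_hash(data: bytes) -> bytes:
    """SHA-256 truncated to 8 bits: still lossy, but only 256 possible outputs."""
    return hashlib.sha256(data).digest()[:1]


def find_preimage(
    f: Callable[[bytes], bytes], target: bytes, max_tries: int
) -> Optional[bytes]:
    """Brute-force search for any input whose output equals target."""
    for i in range(max_tries):
        candidate = i.to_bytes(4, "big")
        if f(candidate) == target:
            return candidate
    return None


# Every input is a valid preimage for the constant function.
first_try = find_preimage(constant_one, constant_one(b"secret"), max_tries=1)

# An 8-bit digest falls to a few hundred guesses on average.
target = tiny_hash(b"secret")
found = find_preimage(tiny_hash, target, max_tries=100_000)
```

            Under that reading, f(x) = 1 is not one-way: inverting means finding some preimage, not the original input, and any input works. What makes a real digest hard to invert is the size of the output space combined with the lack of any known shortcut, not the compression alone.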

        2. 11

          Thanks for the article, I really enjoyed it. If there are any budding cryptographers looking for a PhD topic, there’s currently a big gap in the current state of cryptosystems that’s hinted at in the HMAC section and that’s going to become increasingly important over the next decade or so: There are, to my knowledge, no HMAC schemes that can provide error correction in addition to error detection. The industry is gradually shifting to a model where an IC’s package boundary is the edge of a trust domain and anything that goes off-package must be encrypted. This means doing line-rate authenticated encryption at rates of hundreds of gigabits a second.

          For unencrypted traffic, a single bit flip is annoying. For encrypted traffic, a single bit flip completely destroys the entire message. This is bad enough on its own; when you consider that the bit flip can occur inside the hardware crypto unit itself, it becomes even more problematic. An ideal algorithm would compute error-correction codes before encryption that remain stable throughout the encryption (effectively a highly specialised homomorphic encryption scheme) such that a small number of bit flips in the pipeline of encrypt, transmit, receive, decrypt could be detected and repaired. Currently, this is worked around by adding error correction both before and after encryption. This adds latency (you’re computing error correction information twice, which is inherently sequential), consumes power (you’re transmitting redundant error correction information), and is not especially robust (a bit flip in the encryption engine may leak some information about a key).
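
          A small Python sketch of the detection-only behaviour, using the standard library’s HMAC (the key and message are placeholders): a single flipped bit makes verification fail, and the tag carries no hint about where the damage is.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison; the result is a bare yes/no.
    return hmac.compare_digest(make_tag(key, message), tag)


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


key = b"\x11" * 32  # placeholder key
message = b"off-package memory line" * 4
tag = make_tag(key, message)

intact_ok = verify(key, message, tag)
damaged_ok = verify(key, flip_bit(message, 137), tag)
```

          The receiver learns only that something changed, which is why the error-correcting layer currently has to live elsewhere in the pipeline.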

          1. 4

            There are, to my knowledge, no HMAC schemes that can provide error correction in addition to error detection.

            Tangent: There was a rather silly HN thread a few months ago where someone lauded GPG for failing open when an authentication failure occurs, and they cited it as an “error correction” boon. My counter to that was to suggest piping age output to a Reed-Solomon encoder for encrypted backups. (Feel free to replace RS with your favorite EC scheme.)

            Any new cryptographic primitive that aims to provide the same security properties as HMAC, but also error-correction, may be better served at the tooling (or at least protocol) level by doing something like what I outlined above.
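
            As a rough illustration of that layering, here is a Python sketch that wraps an already-encrypted blob in an error-correcting code. A threefold repetition code stands in for Reed-Solomon purely to keep the example dependency-free, and `ciphertext` is a placeholder for real age output; neither choice is a recommendation.

```python
def ecc_encode(blob: bytes) -> bytes:
    """Threefold repetition: store three copies back to back."""
    return blob * 3


def ecc_decode(encoded: bytes) -> bytes:
    """Bitwise majority vote across the three copies."""
    if len(encoded) % 3 != 0:
        raise ValueError("encoded length must be a multiple of 3")
    n = len(encoded) // 3
    a, b, c = encoded[:n], encoded[n:2 * n], encoded[2 * n:]
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


ciphertext = bytes(range(64))  # placeholder for encrypted, authenticated output
stored = bytearray(ecc_encode(ciphertext))
stored[10] ^= 0x04  # simulate a single bit flip in storage or transit

recovered = ecc_decode(bytes(stored))
```

            Because the code is applied outside the encryption, the decoder hands the original ciphertext to the decryption tool, which then performs its usual authentication check.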

            This may be a disappointing answer for prospective Ph.D students, but it’s still something that can be practically solved at an industry level today.

            This is bad enough on its own; when you consider that the bit flip can occur inside the hardware crypto unit itself, it becomes even more problematic.

            Ah, fault attacks. That’s actually a harder problem to contend with.

            Currently, this is worked around by adding error correction both before and after encryption. This adds latency (you’re computing error correction information twice, which is inherently sequential), consumes power (you’re transmitting redundant error correction information), and is not especially robust (a bit flip in the encryption engine may leak some information about a key).

            Additionally, if you apply an ECC before encrypting, you may create a side-channel attack (similar to how compression became BREACH/CRIME against SSL/TLS).

            1. 2

              I thought a polar code might be able to do this - e.g. inject a secret into what are normally the error-correcting bits, truncate the message, leaving enough bits for both decryption and some level of error correction as well.

              Googling “polar code cryptography” seems to generate a few hits. I’m not sure how efficient or practical such schemes would be on chip.

            2. 20

              It’d be nice to have some actual background on hashing in here instead of just broad generalizations and links to various hash functions. Examples:

              • There’s no mention of cyclic redundancy checks and why they are not valid as crypto functions (a mistake some programmers have made).
              • There’s no mention of avalanche effects, which is a good way of seeing how “random” a digest scheme is (with some implications for how well the output can be predicted/controlled by an attacker).
              • The mentioned attack on JSON hash tables in PHP (if you dig into it) would’ve been a great place to talk about trivial hashes (e.g., f(x) = 0 or f(x) = x) and why they cause problems even in non-hostile environments, but that would’ve required more of an introduction to how hashing works…
              • Lots of usage of jargon like “non-invertible”, “collision-resistance”, “preimage attack resistance”, etc. which is probably inaccessible if your audience is programmers who “don’t understand hash functions”.
              • There’s not really an explanation about the differences/similarities of crypto-strong hash functions, password hash functions, and key derivation functions, other than a mention that there is some relation, which isn’t elaborated on at all.
              • There’s not really any useful information at all about perceptual hashing vs other forms of multimedia digest approaches–there’s just some Apple hate.
              • etc.
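
              Two of these points can be illustrated with the Python standard library. The sketch below (message contents are made up) shows that CRC-32 has exploitable algebraic structure, and measures the avalanche effect of SHA-256 by flipping one input bit.

```python
import hashlib
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# CRC-32 is affine over XOR: for three equal-length messages, the checksum
# of their XOR equals the XOR of their checksums.
m1, m2, m3 = b"pay alice $0010", b"pay alice $9999", b"pay mallory $10"
crc_is_linear = zlib.crc32(xor_bytes(m1, m2, m3)) == (
    zlib.crc32(m1) ^ zlib.crc32(m2) ^ zlib.crc32(m3)
)

# Avalanche: flip one input bit and count how many SHA-256 output bits change.
original = b"hash me"
flipped = bytes([original[0] ^ 0x01]) + original[1:]
sha_changed = differing_bits(
    hashlib.sha256(original).digest(), hashlib.sha256(flipped).digest()
)
```

              The XOR relation lets an attacker who can flip bits in a message predict how the checksum changes without knowing the message, which is why a CRC cannot stand in for a cryptographic hash or MAC. For SHA-256, roughly half of the 256 output bits are expected to change.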

              Programmers might not understand hash functions, but infosec furries may also not understand pedagogy.

              (also, can you please cool it with the inflammatory article headlines?)

              1. 24

                Programmers might not understand hash functions, but infosec furries may also not understand pedagogy.

                Please don’t pick a fight. It seems more angry than friendly.

                1. 22

                  Honestly I think it’s a valid concern. One of the biggest problems with the computer security world, as stated repeatedly by leading experts in the field, is communication and teaching.

                  1. 23

                    A valid concern would be “infosec experts may not understand pedagogy” but why call out “infosec furries” specifically? Unless we should be concerned about infosec furries in particular vs other infosec experts?

                    Are these acceptable?

                    • but infosec gays may also not understand pedagogy
                    • but infosec women may also not understand pedagogy
                    • but infosec people of color may also not understand pedagogy

                    No. So why furries? People need to get over it and quit furry bashing. This isn’t acceptable behavior on Lobste.rs, and I’m tired of it.

                    1. 3

                      See elsewhere for the explanation; furry bashing doesn’t enter into it, though I see why you might have read it that way. Furries are internet denizens like the rest of us, with all that entails.

                      1. 12

                        I agree with you that it’s a bad title.

                        I also think that you wouldn’t have reacted nearly this strongly to the title if it wasn’t a furry blog.

                        1. 11

                          I read your other comments. But you said what you said, and that undermines all your pontificating about the harm of “insulting/demeaning a group” and “the sort of microaggression/toxicity that everybody talks so much about.” Take your own advice.

                        2. 2

                          “Furry” is a kink, not an identity or protected class. And normally you have to get people’s consent before you bring them into your kink.

                          1. 7

                            I don’t see any sexual imagery in this blog post.

                            1. 2

                              The OP’s site has some pretty well reasoned and presented articles on precisely why “furry” cannot reasonably be summarized as “a kink”.

                              And, no, you do not “normally” have to get someone’s consent to introduce them to the idea of your kink, unless said introduction involves you engaging them in the practice of your kink.

                            2. 1

                              Sorry, I didn’t realize the “furry” part was what you were opposed to. It sounded like you were upset with the implication that the infosec world is bad at teaching.

                        3. 6

                          Programmers might not understand hash functions, but infosec furries may also not understand pedagogy.

                          (also, can you please cool it with the inflammatory article headlines?)

                          https://www.youtube.com/watch?v=S2xHZPH5Sng

                          1. 10

                            One of the things he talks about there is testing the hypothesis and seeing which title actually worked. I only clicked this link because I recognized your domain name and knew you had written interesting articles in the past and might legitimately explain something I didn’t know. If not for that, I probably would have bypassed it since the title alone was not interesting at all.

                            1. 9

                              Even so, it is still possible to write clickbait titles that aren’t predicated on insulting/demeaning a group.

                              • “Hash functions: hard or just misunderstood?”
                              • “Things I wish more programmers knew about hashes”
                              • “Programmer hashes are not infosec hashes”
                              • “Are you hashing wrong? It’s more common than you might think”
                              • “uwu whats this notices ur hash function”

                              How would you feel if I wrote “Gay furries don’t understand blog posting”? Even if I raise good points, and even if more people would click on it (out of outrage, presumably), it would still probably annoy a gay furry who wrote blogs and they’d go in with their hackles raised.

                              1. 8

                                The important difference between what I wrote and your hypothetical is the difference between punching up and punching down.

                                My original title was along the same lines as “Falsehoods Programmers Believe About _____” but I’ve grown a distaste for the cliche.

                                1. 7

                                  The difference between “Programmers don’t understand hash functions” and “Gay furries don’t understand blog posting” is quite obvious to me and I definitely don’t want to engage in whatever Internet flame is going on here. Especially since, uh, I have a preeetty good idea about what the problem here is, and I tend to think it’s about gay furries, not article titles, which is definitely not a problem that I have. (This should probably be obvious but since I’m posting in this particular thread, I wanted to make sure :P).

                                  But I also think this title really is needlessly nasty, independent of how it might be titled if it were about other audiences. It’s a bad generalisation – there are, in fact, plenty of programmers who understand hash functions – and it’s not exactly encouraging to those programmers who want to get into security, or who think their understanding of these matters is insufficient.

                                  I am (or was?) one of them – this was an interest of mine many, many years ago, at a time when I was way too young to understand the advanced math. My career took me elsewhere, and not always where I wanted to go, and I tried to keep an eye on these things in the hope that maybe one day it’ll take me there. Needless to say, there’s only so much you can learn about these topics by spending a couple of evenings once in a blue moon studying them, so I never really got to be any good at it. So I think the explanation is amazing, but it would definitely benefit from not reminding me of my inadequacy.

                                  And I’m in a happy boat, actually, this is only an interest of mine – but there are plenty of people who have to do it as part of their jobs, are not provided with adequate training of any kind, have no time to figure it out on their own, and regularly get yelled at when they get it wrong.

                                  Now, I realise the title is tongue-in-cheek to some degree, the playful furries and the clever humour scattered throughout the post sort of give it away. If you think about it for a moment it’s pretty clear that this is meant to grab attention, not remind people how much they suck. But it’s worth remembering that, in an age where web syndication is taken for granted to the point where it sounds like a Middle English term, this context isn’t carried everywhere. Case in point, this lobste.rs page includes only the title. Some people might react to it by clicking because you grabbed their attention, but others might just say yeah, thanks for reminding me, I’ll go cry in a corner.

                                  Even if I didn’t realise it was tongue-in-cheek, it probably wouldn’t bother me, partly because I understand how writing “competitively” works (ironically, from around the same time), partly because I’ve developed a thick skin, and partly because, honestly, I’ve kinda given up on it, so I don’t care about it as much as I once did. But I can see why others would not feel the same way at all. You shouldn’t count on your audience having a thick skin or being old enough to have given up on most of their dreams anyway.

                                  I know this is a real struggle because that’s just how blogs and blogging work today. You have to compete for attention to some degree, and this is particularly important when a large part of the technical audience is “confined” to places like HN and lobste.rs, where you have to grab attention through the title because there’s nothing else to grab attention through. But maybe you can find a kinder way to grab it, I dunno, maybe a clever pun? That never hurt anyone. These radical, blunt (supposedly “bluntly honest” but that’s just wishful thinking) headlines are all the rage in “big” Internet media because, just like Internet trolls, they thrive on controversy, us vs. them and a feeling of smugness, but is that really the kind of thing you want to borrow?

                                  (Edit: just to make sure I get the other part of my message across, because I think it’s even more important: title aside, which could be nicer, the article was super bloody amazing: the explanation’s great, and I like the additional pointers, and the humour, and yes, the drawings! Please don’t take any of all that stuff above as a criticism of some sort: I wanted to present a different viewpoint from which the title might read differently than you intended, not that the article is bad. It’s not!)

                                  1. 15

                                    How do you know that you’re punching up?

                                    What if the person encountering your blog is a programmer from an underrepresented background, just barely overcoming imposter syndrome, and now here’s this scary suggestion that they don’t understand hash functions? What if they actually made one of the mistakes in the article, and feel like they’re a complete fraud, and should leave the industry? This is the sort of microaggression/toxicity that everybody talks so much about, if I’m not mistaken.

                                    The point is: you don’t know. You can’t know.

                                    So, err on the side of not adding more negative shit to the world accidentally in the name of pageviews–especially when there are many, many other more positive options in easy reach.

                                    EDIT:

                                    I wouldn’t care if it weren’t for the fact that you’re a smart dude and clearly passionate about your work and that you have good knowledge to share, and that it pains me to see somebody making mistakes I’ve made in the past.

                                    1. 8

                                      I wouldn’t care if it weren’t for the fact that you’re a smart dude and clearly passionate about your work

                                      I’m neither of those things :P

                                      and that you have good knowledge to share, and that it pains me to see somebody making mistakes I’ve made in the past.

                                      I appreciate your compassion on this subject. It’s definitely new territory for me (since forever I’ve been in the “boring headline out of clickbait aversion” territory).

                                      1. 9

                                        Do you actually not see a difference between saying a slightly negative thing about people of a certain profession and how they engage in that profession, and an ad-hominem using sexual orientation? What a weird and bad analogy.

                                        I’m trying to assume good intent here but all your comments make it sound like you’re annoyed at the furry pics and awkwardly trying to use cancel culture to lash out at the author.

                                        1. 7

                                          Neither the label of programmers (with which I identify) nor of gay furries (with which the author identifies, according to their writing) is being misapplied. I’m sorry you feel that a plain statement of fact is somehow derogatory–there is nothing wrong with being a proud programmer or a proud gay furry.

                                          My point in giving that example was to critique the construction “[label] is [bad at something]”. I picked that label because the author identified with it, and I picked the “bad at blogging” because it’s pretty obviously incorrect in its bluntness. If I had picked “lobsters” or “internet randos” the conjured association for the person I was in discussion with may not have had the same impact that “programmers” had on me, so I went with what seemed reasonable.

                                          1. 4

                                            What do you gain by emphasizing soatok’s sexual identity, other than this morass of objections?

                                          2. 5

                                            I’m trying to assume good intent here

                                            that’s exactly what friendlysock is hoping for

                                            1. 5

                                              you’re right but it’s best not to feed them

                                            2. 8

                                              What if the person encountering your blog is a programmer from an underrepresented background, just barely overcoming imposter syndrome, and now here’s this scary suggestion that they don’t understand hash functions?

                                              Or they may read this and think ‘I’m glad it’s not just me!’. As a programmer who probably has a better than average understanding of hash functions, I don’t feel demeaned by this generalisation. If I were worried about my level of understanding, I’d feel comforted by the idea that I wasn’t in a minority in my lack of understanding.

                                              What if they actually made one of the mistakes in the article, and feel like they’re a complete fraud, and should leave the industry?

                                              Or they may feel better that this mistake is so common that someone writes about it on a list of mistakes programmers make.

                                              1. 1

                                                What if the person encountering your blog is a programmer from an underrepresented background….

                                                While I said you’re picking a fight (and would add: “look at the thread, it’s a fight”), I see what you’re saying in this paragraph. I also value non-judgmental explanations.

                                            3. 6

                                              My problem with the title isn’t that it’s insulting, but that it’s inaccurate. Clearly some programmers do understand hash functions, even if other programmers do not. If nothing else, @soatok, a programmer, presumably understands hash functions, or why else would he write a blog post purporting to explain the right way to use them?

                                              Programmers don’t understand hash functions, and I can demonstrate this to most of the people that will read this with a single observation:

                                              When you saw the words “hash function” in the title, you might have assumed this was going to be a blog post about password storage.

                                              Specifically, that claim is wrong, at least about me, and almost certainly about other programmers as well. I don’t claim to have deep knowledge about cryptography, and I do expect that there’s probably something I could learn from this blog post, which I will read more carefully when I have a chance. But I am aware that the computer science concept of hash functions is useful for a variety of programming problems, and not just storing password-related data.

                                        2. 4

                                          Funny, I was expecting to read about all the subjects he listed instead of password storage.

                                          1. 4

                                            anything can be a hash function if you’re brave enough

                                            This made me laugh audibly.

                                            Perceptual hashes of CSAM do not provide collision or preimage resistance, and it would be possible to flood Apple with false positives if a hash of such material were to ever leak publicly. (Maybe an enterprising Internet Troll will one day make a meme generator that does this?)

                                            That or an actual purveyor of such material will do so, and spread it so far and wide as to render the whole scheme too noisy to be relied upon.
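
                                            For intuition about why perceptual hashes lack collision resistance, here is a toy “average hash” in Python. It is an assumption-laden stand-in and has nothing to do with Apple’s actual algorithm: each pixel contributes one bit depending on whether it is brighter than the image’s mean, so visually similar (and some deliberately crafted) images share a hash by design.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0-255


def average_hash(image: Image) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]

# A different image: every pixel value changed, same bright/dark layout.
altered = [[min(255, p + 40) for p in row] for row in original]

same_hash = average_hash(original) == average_hash(altered)
images_differ = original != altered
```

                                            Tolerating small changes is the whole purpose of such a hash, and the same tolerance is what makes engineered false positives plausible.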

                                            1. 2

                                              I might be missing something. I don’t get a good sense of what a perceptual hash is from this post. edited: This led me to: https://en.wikipedia.org/wiki/Locality-sensitive_hashing

                                              TIL something new!

                                              While I’ve read about “Encrypt then MAC” before, it was never intuitive to me why this is important. I can understand why it 1) can’t hurt; 2) ensures the ciphertext and MAC are consistent; 3) avoids leaking data about the plaintext. I’m not sure if it adds any other value.

                                              Wikipedia says “In information security, message authentication or data origin authentication is a property that a message has not been modified while in transit (data integrity) and that the receiving party can verify the source of the message.[1] Message authentication does not necessarily include the property of non-repudiation.[2][3]”

                                              But if a MIM attack is possible (https://tonyarcieri.com/all-the-crypto-code-youve-ever-written-is-probably-broken), then it seems the MIM can also create a legitimate MAC. So I’m confused.
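
                                              A minimal Python sketch of encrypt-then-MAC may help with both questions. The keystream construction below is a toy built from SHA-256 so the example needs no third-party library; it is an assumption for illustration and not something to deploy. The point is the order of operations on the receiving side, and the fact that forging a tag requires the MAC key.

```python
import hashlib
import hmac


def toy_keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy stream cipher keystream (illustration only, not for real use)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(enc_key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Encrypt first, then compute the tag over nonce and ciphertext."""
    ciphertext = xor(plaintext, toy_keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(enc_key: bytes, mac_key: bytes, nonce: bytes, sealed: bytes) -> bytes:
    ciphertext, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    # Check the tag first; tampered input is rejected before any decryption.
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    return xor(ciphertext, toy_keystream(enc_key, nonce, len(ciphertext)))


enc_key, mac_key, nonce = b"e" * 32, b"m" * 32, b"n" * 16
sealed = seal(enc_key, mac_key, nonce, b"transfer 10 to bob")

# A man-in-the-middle without mac_key alters the ciphertext and guesses a tag.
altered_ciphertext = bytes([sealed[0] ^ 0x01]) + sealed[1:-32]
guessed_tag = hmac.new(
    b"attacker-guess", nonce + altered_ciphertext, hashlib.sha256
).digest()

try:
    open_sealed(enc_key, mac_key, nonce, altered_ciphertext + guessed_tag)
    forgery_accepted = True
except ValueError:
    forgery_accepted = False
```

                                              An attacker who can intercept traffic but does not hold `mac_key` cannot produce a tag the receiver will accept, so the MAC only helps once the two parties share a key the attacker lacks. Verifying before decrypting also means the receiver never processes attacker-controlled ciphertext.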

                                              1. 1

                                                I love the art!

                                                1. 1

                                                  I enjoyed this, it’s a great overview and I got a lot of utility out of where this post links to for deeper reading.

                                                  1. -2

                                                    What kind of asinine title is that?