1. 41
  1.  

    1. 9

      One aspect that bothered me was that some people are still treating this like a theoretical “maybe in the future” problem.

      You’d think 30+ years of people pointing out these issues, getting told “that’s just theoretical”, and then seeing them become real issues, would have taught those involved? I mean, SHA-1 for commits is a famous example of this, so it’s particularly ironic.

      1. 8

        This framing of the problem is really good: https://valerieaurora.org/hash.html

        I think the lesson is that “in the future” is continuously approaching, and you need to design your system with evolution in mind.

        1. 14

          There are about three distinct issues here, and it’s important to be clear which one you are worried about.

          Firstly, Kees is pointing out problems with Linux kernel development tooling that does not preserve a full git hash as a reference to a commit. If a tool truncates a hash, it’s easy to construct a collision. This isn’t a SHA-1 problem: it’s a truncation-length problem. Kees’s collision is AIUI about 48 bits, which is a million times easier than the state of the art in collision attacks. (Or a trillion? Dunno how the birthday paradox applies, but it doesn’t matter much. Let’s assume the worst.)
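
          To put rough numbers on that parenthetical, here is a quick back-of-envelope sketch, assuming an ideal hash and the commonly cited work estimates for the published full SHA-1 collision attacks (the exact figures don’t change the point):

          ```python
          # Back-of-envelope comparison for a 48-bit (12 hex character) truncated
          # hash versus full SHA-1, under the usual ideal-hash / birthday-bound model.
          bits = 48

          # An any-two collision among random truncated hashes: ~2^(bits/2) attempts.
          birthday_attempts = 2 ** (bits // 2)      # ~1.7e7 (tens of millions)

          # Matching one specific existing 12-character prefix: ~2^bits attempts.
          targeted_attempts = 2 ** bits             # ~2.8e14 (hundreds of trillions)

          # Published full SHA-1 collision attacks are usually quoted at ~2^61..2^63 work.
          full_sha1_collision_work = 2 ** 61

          print(f"48-bit birthday collision : ~{birthday_attempts:.1e} hashes")
          print(f"48-bit targeted prefix    : ~{targeted_attempts:.1e} hashes")
          print(f"full SHA-1 collision      : ~{full_sha1_collision_work:.1e} work")
          ```

          Either way, colliding a 48-bit truncation is many orders of magnitude cheaper than attacking SHA-1 itself, which is why this is a tooling problem rather than a hash problem.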

          Secondly, Valerie was working on hash collisions in zfs 20+ years ago, before there was solid cryptographic understanding of how to make a hash function. Her chart of hash function lifetimes was based on the best knowledge of how cryptanalysis improved through the 1990s up to the big breaks in MD5 and SHA-1 around 2005. Val was very concerned that straightforward hash-based deduplication was unsafe. She was right, because Subversion chose exactly the design Val was worried about, and svn was fucked when SHA-1 was broken a decade later.

          Thirdly, there has been progress. Cryptographers now have a much better understanding of how MD5 failed, how SHA-1 is weak, and why SHA-2 is OK. The current state of the art is that SHA-1 is very hard but not impossible to break, and there is a counter-cryptanalysis defence used by git (its SHA-1 collision-detection code) that we hope can be reused to turn any novel, similarly hard future attack into a corresponding defence.

          Kees is saying that hash collisions are easy, even if they require hundreds of trillions of attempts. But Kees’s observation says nothing about the security of SHA-1 or git (which, with the defences, require several orders of magnitude more attempts). And Valerie’s analysis 20 years ago is not a good guide to the reliability of cryptographic hash functions today.

          Nowadays I would say: yes, you need to design any protocol with evolution in mind, but cryptographic failure is not continuously approaching in the way Valerie illustrated. (Cryptographers these days are more worried about quantum attacks on asymmetric cryptography, and much less worried about symmetric primitives.)

        2. 3

          > You’d think 30+ years of people pointing out these issues, getting told “that’s just theoretical”, and then seeing them become real issues, would have taught those involved?

          After probably 3000+ years of people not taking possible future problems seriously, I wouldn’t expect people to do otherwise. I guess you’re not as cynical as I am.

        3. 7

          I love a good proof of concept to make the theoretical concrete.

          I remember in the 90s when people started talking about timing side-channel attacks against crypto being theoretically possible, and then before I knew it we had to reimplement every algorithm in constant time because the attacks turned out to be incredibly practical.
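
          For a tiny concrete illustration of what “constant time” means in practice, here is one common case, comparing secret byte strings, sketched in Python (just an illustrative example, not any particular implementation from back then):

          ```python
          import hmac

          def naive_equal(a: bytes, b: bytes) -> bool:
              # Early-exit comparison: how long this runs leaks how many leading
              # bytes of the guess were correct, which an attacker can measure.
              if len(a) != len(b):
                  return False
              for x, y in zip(a, b):
                  if x != y:
                      return False
              return True

          def constant_time_equal(a: bytes, b: bytes) -> bool:
              # Examines every byte regardless of where the first mismatch is,
              # so the running time does not depend on the secret's contents.
              return hmac.compare_digest(a, b)
          ```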

          1. 4

            Informally using prefixes of the full hash always carries a vastly increased collision risk compared to using the full hash, but it is super convenient.

            It can easily be mitigated by requiring that a match is only considered if it is an ancestor of the commit that refers to the hash prefix.
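
            A minimal sketch of that idea, assuming a toy commit graph (the data model and function name here are made up for illustration; this is not how git or the kernel tooling actually resolve abbreviated hashes):

            ```python
            from collections import deque

            def resolve_prefix(prefix, referring_commit, parents):
                """Resolve an abbreviated hash, but only among commits reachable
                from (i.e. ancestors of) the commit containing the reference, so
                an unrelated commit crafted to share the prefix never matches."""
                matches, seen = set(), set()
                queue = deque([referring_commit])
                while queue:
                    commit = queue.popleft()
                    if commit in seen:
                        continue
                    seen.add(commit)
                    if commit.startswith(prefix):
                        matches.add(commit)
                    queue.extend(parents.get(commit, ()))
                if len(matches) > 1:
                    raise ValueError(f"ambiguous prefix {prefix!r}: {sorted(matches)}")
                return matches.pop() if matches else None

            # Toy history: c3 -> c2 -> aaaa1111. aaaa9999 shares the prefix but is
            # not an ancestor of c3, so a reference inside c3 cannot mean it.
            parents = {"c3": ["c2"], "c2": ["aaaa1111"], "aaaa1111": [], "aaaa9999": []}
            print(resolve_prefix("aaaa", "c3", parents))   # -> aaaa1111
            ```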