1. 16

This is the official advisory on the long story that started with WebKit breaking their SVN repository.

After a long discussion and several iterations of fixes to the code, it was decided that outright rejecting colliding content is the only reasonable option.

I am mainly linking this to make the youngsters among us aware that decisions they make today will still carry relevance 10 years from now. Choose your hashes and your designs carefully!

Oh, and if you do happen to be responsible for an SVN server, updates should be out next week.


  2. 10

    Choose your hashes and your designs carefully!

    Isn’t the moral here that you should try to plan for hash alg changes? Because “Choose your hashes” is just hindsight, really. When SVN was designed, SHA1 was probably still “safe”, right?

    1. 13

      Plan for changes. And then start making them. SVN isn’t alone here, but there was a solid ten year lead time between “SHA1 can have collisions” and “I told you so”. The state of RC4 is somewhat similar, with people refusing to move because it wasn’t broken enough. (And a lack of clear direction forward in some cases.)

      Google went to considerable effort to create a collision. If they hadn’t, people would still say it’s only a theoretical concern. Be thankful it’s still just a warning shot.

      1. 2

        There was a submission a long while ago talking about various strange behaviors of crypto material. One scheme computed two hash functions over the same input and XORed the results together to produce the final hash. This allowed them to survive the deprecation of MD5, even though they never moved away from MD5 itself. I wonder how useful this is as a way of transitioning away from aging, weakly vulnerable hashes.
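
        For concreteness, the XOR construction described above could look something like this. This is a hypothetical sketch (the function name and the truncation to the shorter digest are my assumptions; the original scheme's exact layout wasn't specified):

        ```python
        import hashlib

        def xor_combined_hash(data: bytes) -> bytes:
            # Hypothetical sketch: XOR the MD5 and SHA-1 digests of the same input.
            md5 = hashlib.md5(data).digest()    # 16 bytes
            sha1 = hashlib.sha1(data).digest()  # 20 bytes
            # Truncate to the shorter digest; how the original scheme handled
            # the length mismatch is an assumption here.
            n = min(len(md5), len(sha1))
            return bytes(a ^ b for a, b in zip(md5[:n], sha1[:n]))
        ```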

        1. 3

          XOR is worse than concatenation for combining hashes if you’re looking for collision resistance. Here’s a paper describing the safest way to combine two hashes: https://eprint.iacr.org/2013/210
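
          As a sketch of the difference (function name is mine): concatenation preserves both digests intact, so colliding the combined value requires colliding both component hashes on the same pair of inputs, whereas XOR lets differences in one digest cancel differences in the other.

          ```python
          import hashlib

          def concat_combined_hash(data: bytes) -> bytes:
              # Concatenation keeps both digests: a collision here requires a
              # simultaneous collision in SHA-1 and SHA-256 for the same inputs.
              return hashlib.sha1(data).digest() + hashlib.sha256(data).digest()
          ```

          Per the Joux multicollision result, the concatenation still isn't much stronger than its stronger component, but unlike XOR it can't be weaker than either one.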

          Also, see this answer on crypto StackExchange describing various failures of combined hashes.

          1. 2

            Yes, that is strange. :) I think the crypto community usually frowns on things like that. Consider that at the time you had something better than MD5 to stir into the result, you could have just used that something better. MD5 ^ SHA1 isn’t notably superior to just SHA1. SHA1 ^ SHA2 isn’t really better than SHA2.

            1. 1

              I think the crypto community usually frowns on things like that.

              Is it just “that’s pointless” or could it really hurt? Is it likely or inevitable that the output of two different hash functions on the same input would have coincidental correlations that cancel out with the xor, creating a subtly biased composite function that is worse than the sum of its parts?

              1. 1

                Indeed: It used to be frowned on because people were concerned that there might be some subtle interaction between the algorithms, although I’ve never seen any evidence of such interactions “in the wild”.

                With modern cryptographically strong hash functions, which have a much stronger theoretical base for their security, it’s just a pointless waste of time. I guess you gain a slight ‘security through obscurity’ benefit: an algorithm for generating collisions in SHA-2 (if such a thing were to be discovered) might not work on your custom SHA1^SHA2 implementation, but the same background theory could probably be used to break your implementation - it would just take a little more time.

                1. 1

                  The contrast between crypto and os/application level approach to security is striking.

                  In the former, an algorithm once accepted is assumed secure until someone publishes a paper or poc that demonstrates a weakness. Furthermore it is assumed that these first discoveries are always more theoretical than practical, so there is no “zero day” rush to change algorithms.

                  In the latter case, we assume there are undiscovered bugs that could be weaponized in a short timeframe, and defense in depth is the norm.

                  I do not find it surprising that some developers reach for tricks like mixing the output of two different algorithms. After all, if mixing predictable data (message) with pseudo-random noise (keystream) works for encryption and mixing potentially predictable data (time, io events, etc.) with other such events works for entropy pool mixing, why wouldn’t it work for mixing hashes (which can be thought of as being pseudo-random noise seeded with the key)?

                  If there were no undesirable interactions between two different hash algorithms, intuition says that mixing one with the other is safe as long as one of the algorithms remains secure. And again assuming no interaction, intuition might say that to break the composite, you inevitably have to break its components.

                2. 1

                  Mostly pointless I believe. There's a proof (Joux's multicollision result) that given two 128 bit hashes, the work to find a simultaneous collision in both is only a small multiple of 2^64, the birthday bound for a single 128 bit hash, and nowhere near the 2^128 you might hope for. But also, any time you color outside the lines, you run the risk of making things worse.

                  1. 1

                    Right, so I would assume the reasoning people have for mixing is not that it doubles the number of bits of search space, but that it saves your ass the day someone finds one of these algorithms is broken and the work to find a collision is proportional to 2^49 or whatever.

                    Making things worse is a thought that eludes these people.

                  2. 1

                    Is it likely or inevitable that the output of two different hash functions on the same input would have coincidental correlations that cancel out with the xor, creating a subtly biased composite function that is worse than the sum of its parts?

                    It’s likely there’ll be nonzero correlation, because hash functions aren’t written in isolation and use similar techniques. But probably not significant enough to make a difference in practice.

              2. 4

                You need to pick a sufficiently strong hash for your application. SHA1 is still in this territory for SVN’s purposes, since the chance of a collision during typical use as a version control system is still negligible. Which is great, because otherwise the fix about to be released would break the system for many users.

                SVN’s problems are that, apart from discussing the issue years ago, nobody bothered to check what actually happens in the implementation when a collision occurs (that’s a process problem), and that we found ourselves incredibly constrained while trying to come up with the best possible fix for the “webkit” problem. Today, we cannot change the hash without breaking important parts of the system or adding (yet more) backwards-compat boilerplate code. We must prevent SHA1 collisions from entering the system to prevent (perhaps accidental) DoS attacks on it. At the core, this is a design problem: some features have tightly embraced SHA1, and replacing it now involves a lot of work.

                Edit: Another factor that complicated things was that API, protocol, and on-disk format changes are off-limits for SVN’s patch releases, but we had to patch both the 1.9 and 1.8 release series.

                1. 2

                  The iron law of cryptography is that crypto schemes always get weaker over time.

                  But when this is a problem which isn’t going to manifest itself for fifteen years or more, punting it to the long grass is always going to be very tempting!

                  1. 2

                    SHA2 hadn’t even been published when SVN was first released (SVN was released in 2000, SHA2 was first published as a draft in 2001 and finalised in 2002).

                  2. 1

                    New systems should ideally use Multihash.
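
                    The core idea of Multihash is a self-describing digest: the hash bytes are prefixed with an algorithm code and a length, so the algorithm can be identified on read and upgraded later. A minimal sketch, using the sha1 (0x11) and sha2-256 (0x12) codes from the multiformats registry (full varint encoding is omitted, though for digests under 128 bytes the prefix bytes come out the same):

                    ```python
                    import hashlib

                    # Algorithm codes from the multiformats registry:
                    # sha1 = 0x11, sha2-256 = 0x12.
                    ALGOS = {0x11: hashlib.sha1, 0x12: hashlib.sha256}

                    def multihash(code: int, data: bytes) -> bytes:
                        # Prefix the digest with its algorithm code and length,
                        # so readers know which hash was used.
                        digest = ALGOS[code](data).digest()
                        return bytes([code, len(digest)]) + digest

                    def parse_multihash(blob: bytes) -> tuple[int, bytes]:
                        code, length = blob[0], blob[1]
                        return code, blob[2:2 + length]
                    ```

                    A system storing such values can later add a stronger hash by registering a new code, without changing its storage format.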