I wish he’d explained how the attack works. Is it a mistake in the padding algorithm?
Ignoring the details, the issue is that the {MD5,SHA-1,SHA-2} hash output is the entire internal state. Thus, you can reuse that internal state output to continue hashing something else, even if the original hash was initially produced using some secret key.
One way of avoiding this is truncating the hash—you no longer get the whole internal state. Another is to have some sort of out-of-band signalling bit in the compression function indicating the last compression, so that it is distinct from the same compression in the case where there’s more message incoming.
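To make the "output is the internal state" point concrete, here is a minimal sketch using SHA-1. Python's hashlib won't let you set the internal state, so the compression function is reimplemented by hand; the helper names (`sha1_compress`, `sha1_padding`, `sha1_resume`, `extend`), the key and the messages are all made up for illustration, and the attacker is assumed to know (or guess) the length of the secret.

```python
import hashlib
import struct

IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

def _rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_compress(state, block):
    """One SHA-1 compression step over a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 80):
        w.append(_rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    a, b, c, d, e = state
    for i in range(80):
        if i < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif i < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif i < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (_rotl(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, _rotl(b, 30), c, d
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))

def sha1_padding(msg_len):
    """The bytes SHA-1 appends to a message that is msg_len bytes long."""
    pad = b"\x80" + b"\x00" * ((55 - msg_len) % 64)
    return pad + struct.pack(">Q", msg_len * 8)

def sha1_resume(state, data, absorbed_len):
    """Keep hashing `data` from `state`, as if absorbed_len bytes
    (a multiple of 64) had already been processed."""
    data += sha1_padding(absorbed_len + len(data))
    for i in range(0, len(data), 64):
        state = sha1_compress(state, data[i:i + 64])
    return struct.pack(">5I", *state).hex()

def extend(digest_hex, known_len, suffix):
    """From H(secret || msg) and len(secret || msg), compute
    H(secret || msg || glue || suffix) without knowing the secret."""
    state = struct.unpack(">5I", bytes.fromhex(digest_hex))  # digest == state
    glue = sha1_padding(known_len)  # the old padding, now part of the message
    return glue, sha1_resume(state, suffix, known_len + len(glue))

# Sanity check: starting from the standard IV reproduces plain SHA-1.
assert sha1_resume(IV, b"abc", 0) == hashlib.sha1(b"abc").hexdigest()

secret = b"hypothetical-key"          # known only to the server
msg = b"user=alice&role=guest"
tag = hashlib.sha1(secret + msg).hexdigest()

# Attacker side: uses only `tag`, `msg` and the secret's length.
glue, forged_tag = extend(tag, len(secret) + len(msg), b"&role=admin")
forged_msg = msg + glue + b"&role=admin"

# Server side: the forged tag verifies.
assert hashlib.sha1(secret + forged_msg).hexdigest() == forged_tag
```

The attacker never touches `secret`; the digest alone is enough to pick up where the server's hash left off.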
The key to understanding the practical use is that the padding is only interpreted on the final block. If that padded block is no longer the final block (which is possible, as @pbsd mentioned, because the output is merely the internal state produced after digesting the prior blocks), then the padding is indistinguishable from arbitrary bytes that really were part of the message. Towards the end, a practical example is given using a form-encoded-style record format.
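As a rough illustration of why such a record format is exploitable, here is a small sketch of a "last value wins" parser. The record contents, the `parse_record` helper and the shortened junk bytes are all hypothetical stand-ins, not the article's actual example.

```python
def parse_record(raw: bytes) -> dict:
    """Parse '&'-separated key=value pairs; a repeated key overwrites
    the earlier value, so the last occurrence wins."""
    fields = {}
    for pair in raw.split(b"&"):
        key, _, value = pair.partition(b"=")
        fields[key] = value
    return fields

# Stand-in for the old hash padding (shortened, not a real padding block).
junk = b"\x80\x00\x00\x01\x28"

# Original record, then the former padding, then the attacker's extra field.
forged = b"user=alice&role=guest" + junk + b"&role=admin"

fields = parse_record(forged)
print(fields[b"role"])
```

The junk bytes end up glued onto the first `role` value, which is then thrown away when the second `role` field is parsed.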
In that example a bunch of “junk bytes” (the original padding) end up in the middle of the record, but theoretically a parser of this format would take the last value in the string, receiving the malicious but valid admin value for role.
HMAC doesn’t have this issue because its construction does not feed the message directly as the payload to the final hash function. Instead, the message is appended to the secret key, similarly to here, but that digest is then appended to the secret key again and a second round of hashing is applied to the result (the secret key is XORed with a different fixed pad each time, so the two passes don’t initialise the hash function with the same state).
That way, assuming the security of the underlying hash function, you can’t produce valid MACs by length extension. Like with the other suggested workarounds, the digest is no longer the result of directly digesting the underlying message. The argument is straightforward: every valid HMAC tag comes out of a final hash pass over a fixed-length payload (the padded key followed by the inner digest), so anything produced by extending that pass is not a valid HMAC for any message, and length extension is no more effective than brute-force forgery.
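For reference, the two-pass construction described above can be sketched by hand and compared against Python's `hmac` module. `hmac_sha256_manual` is a name made up for this sketch and the key and message are placeholders; in real code you would just call `hmac.new`.

```python
import hashlib
import hmac

def hmac_sha256_manual(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 written out by hand, for illustration only."""
    block_size = 64  # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(block_size, b"\x00")    # then zero-padded to one block

    ipad = bytes(b ^ 0x36 for b in key)     # key XOR inner pad
    opad = bytes(b ^ 0x5C for b in key)     # key XOR outer pad

    inner = hashlib.sha256(ipad + msg).digest()   # first pass: key, then message
    return hashlib.sha256(opad + inner).digest()  # second pass: key, then digest

key = b"hypothetical-key"
msg = b"user=alice&role=guest"

manual = hmac_sha256_manual(key, msg)
reference = hmac.new(key, msg, hashlib.sha256).digest()
assert hmac.compare_digest(manual, reference)
```

Note that the outer pass always hashes exactly one key block plus one 32-byte digest, which is the fixed-length payload the argument relies on.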
This is a property of the SHA-1 and SHA-2 families of hash functions (and of MD5, as mentioned above). To avoid it, it is recommended to use SHA-3 or a BLAKE-family hash such as BLAKE2, for example.
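Both alternatives are available in Python's hashlib, so a sketch of what switching might look like is below. The key and message are placeholders, and the prefix-keyed SHA-3 line is shown only to illustrate that the digest no longer exposes the full internal state; whether that or BLAKE2's built-in keyed mode fits a given protocol is a separate design question.

```python
import hashlib

key = b"hypothetical-key"
msg = b"user=alice&role=guest"

# SHA-3 is a sponge: the digest is only part of the internal state,
# so a prefix-keyed hash is not extendable the way SHA-1/SHA-2 are.
tag_sha3 = hashlib.sha3_256(key + msg).hexdigest()

# BLAKE2 has a keyed mode built in, so no key || msg concatenation is needed.
tag_blake2 = hashlib.blake2b(msg, key=key, digest_size=32).hexdigest()

print(tag_sha3)
print(tag_blake2)
```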