1. 5
    1. 3

      Stream ciphers that use cryptographic hashes have a risk of running into cycles. This is when calling the transform function on the state will eventually go back to a previous one. In order to mitigate this, we can use a block cipher instead.

      The “proper” way is just to use a bigger state. The usual rule of thumb for overcoming the birthday paradox is to double the number of bits, so I would expect 256 bits of capacity to be sufficient for a 128-bit security level. But then you can’t use raw MD5 as your sponge function, which sort of defeats the author’s purpose of “can we make MD5 do everything”.

      The author mentions ChaCha20 as an alternative permutation. I’ve wondered before how dangerous it would be to use that as a sponge function, because it’s relatively simple and has a nice state size of 512 bits: split it in half, and you can absorb/squeeze 32 bytes at a time, with the other 256 bits being your capacity. Being no more than an armchair cryptographer, I’m vaguely aware that you can’t just turn any permutation into a sponge and call it good – but I’m not qualified to say what a permutation needs to have in order to make a good sponge.
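      To make the split I’m describing concrete, here is a rough Python sketch. It assumes the bare ChaCha rounds (without the feed-forward addition the real cipher applies afterwards) as the permutation, a 64-byte state, and a byte-wise pad10*1-style padding. The names `chacha_permute` and `sponge_hash` are made up for illustration, and this is a toy for seeing the rate/capacity layout, not a vetted construction. One thing it does show is that the all-zero state is a fixed point of the bare rounds, which is one reason you can’t just drop this in and call it good.

```python
import struct

MASK32 = 0xFFFFFFFF
STATE_BYTES = 64  # 512-bit state, as in ChaCha


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s, a, b, c, d):
    """ChaCha quarter-round applied in place to four words of the state list."""
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_permute(state, rounds=20):
    """Bare ChaCha rounds on a 64-byte state, with no feed-forward addition."""
    s = list(struct.unpack("<16I", state))
    for _ in range(rounds // 2):
        # Column round
        quarter_round(s, 0, 4, 8, 12)
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        # Diagonal round
        quarter_round(s, 0, 5, 10, 15)
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *s)


def sponge_hash(message, out_len=32, rate=32, rounds=20):
    """Toy sponge: the first `rate` bytes absorb/squeeze, the rest is capacity."""
    assert 1 <= rate < STATE_BYTES
    # Byte-wise pad10*1-style padding: 0x80, zeros, then set the last bit.
    padded = bytearray(message) + b"\x80"
    while len(padded) % rate:
        padded.append(0)
    padded[-1] ^= 0x01

    state = bytearray(STATE_BYTES)
    # Absorb: XOR each block into the rate part only, then permute.
    for i in range(0, len(padded), rate):
        for j, byte in enumerate(padded[i:i + rate]):
            state[j] ^= byte
        state = bytearray(chacha_permute(bytes(state), rounds))

    # Squeeze: read the rate part, permuting between reads.
    out = b""
    while len(out) < out_len:
        out += bytes(state[:rate])
        state = bytearray(chacha_permute(bytes(state), rounds))
    return out[:out_len]
```

      With the defaults, `sponge_hash(b"hello")` absorbs 32 bytes per permutation call and keeps 256 bits of capacity. Passing a smaller `rate` trades speed for more capacity.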

      I see some rough edges in the article (the most glaring one is that the described block cipher isn’t really a block cipher). But I’m really happy to see people learning about new fields and “shaking their sillies out” with respect to cryptography.

      1. 1

        I believe BLAKE2’s compression function is built on a variant of the ChaCha quarter-round, so investigating how that team built the algorithm would probably help answer your question about how to make it safe as a sponge.

      2. 1

        Ah, I found someone to ask questions to. Let me bother you a little now that I’ve caught you.

        I think when performance is not a top priority, you can absorb just one byte, or even one bit, at a time and keep the rest of the state as your capacity. This should at least reduce your chances of messing up severely. With ChaCha you can also change the number of rounds to play around with the mixing quality/performance trade-off.

        I’d like to ask you why the thing I called block cipher isn’t really a block cipher. I’d love to improve my terminology and correct any mistakes in my mental model if you have time to elaborate.

        1. 1

          What you have right now is a single function get_block(key, counter) that returns a block of pseudorandom bits. For a block cipher, you need a pair of functions:

          • encrypt(plaintext block, key) that returns the encrypted block
          • decrypt(encrypted block, key) that returns the plaintext block

          What you have mimics a block cipher used in CTR (counter) mode, which is a commonly used mode. But block ciphers can be used in other modes as well – CBC is a common one.
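          As a rough illustration of the difference, here is a toy Python sketch. It assumes a 4-round Feistel network with MD5 as the round function, which is my own choice to stay in the spirit of the article and not anything the article does. `encrypt` and `decrypt` form the invertible pair, and `ctr_xor` (a hypothetical name) shows that CTR mode only ever calls `encrypt`, which is why a `get_block`-style function is enough for it. The round count, block size, and nonce layout are all illustrative assumptions; none of this is meant to be secure.

```python
import hashlib

HALF = 16          # MD5 digests are 16 bytes, so each Feistel half is 16 bytes
BLOCK = 2 * HALF   # 32-byte blocks
ROUNDS = 4         # arbitrary toy value


def _xor(a, b):
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def _round_fn(key, rnd, half):
    """Keyed round function; it does not need to be invertible itself."""
    return hashlib.md5(key + bytes([rnd]) + half).digest()


def encrypt(block, key):
    """Encrypt one 32-byte block with a Feistel network."""
    assert len(block) == BLOCK
    left, right = block[:HALF], block[HALF:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _round_fn(key, rnd, right))
    return left + right


def decrypt(block, key):
    """Invert encrypt() by running the rounds backwards."""
    assert len(block) == BLOCK
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _round_fn(key, rnd, left)), left
    return left + right


def ctr_xor(data, key, nonce):
    """CTR mode: encrypt nonce||counter blocks and XOR the result into data.

    The same call both encrypts and decrypts, and decrypt() is never used.
    """
    assert len(nonce) == HALF
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        counter_block = nonce + (i // BLOCK).to_bytes(HALF, "big")
        out += _xor(data[i:i + BLOCK], encrypt(counter_block, key))
    return bytes(out)
```

          A mode like CBC, on the other hand, needs `decrypt` to recover the plaintext, so it cannot be built from a keystream generator alone.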