1. 20
  1. 6

    XChaCha20-Poly1305 is a variant of XSalsa20-Poly1305 (as used in libsodium) and the IETF’s ChaCha20-Poly1305 construction. It features 192-bit nonces and 32-bit internal counters.

    Actually, the internal counter is 64 bits. That’s how I implemented it in Monocypher, and I think Libsodium did the same (the prototype of crypto_stream_xchacha20_xor_ic() accepts a 64-bit counter). In my opinion, any library that limits itself to a 32-bit counter is flawed. The 64-bit counter is even compatible with the IETF modification of the original chacha20, as long as you don’t overflow the 32 bits of the smaller counter.

    This doesn’t change the maximum message length,

    With a 64-bit counter, it does. You can process 2^64 blocks with a single key. Messages can be 2^70 − 64 bytes long (2^64 blocks of 64 bytes, minus the one block used to derive the Poly1305 key). This is virtually unlimited.
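    Spelled out (assuming, as in the Poly1305 AEAD construction, that block 0 is spent deriving the MAC key):

```python
# ChaCha20 produces 64-byte blocks; a 64-bit counter addresses 2**64 of them.
BLOCK = 64
max_stream = 2**64 * BLOCK
assert max_stream == 2**70

# In the AEAD construction, block 0 is used to derive the Poly1305 key,
# which is where the "- 64" comes from:
max_message = max_stream - BLOCK
assert max_message == 2**70 - 64
```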


    So not only can you encrypt forever, each message can be as long as you want. This is how DJB designed both Salsa and Chacha, and this is how it should be. I understand the IETF had reasons to sacrifice the counter (and therefore message length) to get a slightly longer nonce. That lets you send a limited number of messages with a random nonce, each of a limited length, with minimal overhead.

    The nonce/counter tradeoff, however, disappears completely once you accept some overhead and take the X route. Processing one extra block gives you 256 bits for your nonce and counter. Once you've used up 192 bits for the nonce, you have 64 bits left for the counter; it would be stupid not to take full advantage of them.

    My advice to implementers: don’t implement the IETF version of Chacha20. Implement the original one, as designed by DJB. It’s the one true design. If you need random nonces, use XChacha20. If you really need the IETF variant (compatibility, special constraints…), implement it on top of the original one. You can even do that using the public interface only.
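    To illustrate that last point, here is a sketch of the parameter repacking involved. It is pure byte shuffling, no cipher; the function name is made up for illustration, and it assumes a DJB-style interface taking an 8-byte nonce and a 64-bit counter:

```python
import struct

def ietf_to_djb(ietf_nonce: bytes, ietf_counter: int):
    """Map IETF ChaCha20 parameters (96-bit nonce, 32-bit counter)
    onto DJB ChaCha20 parameters (64-bit nonce, 64-bit counter).

    Both layouts put the counter in the bottom-left state words,
    little-endian, so the first 32-bit word of the IETF nonce simply
    becomes the high half of the DJB 64-bit counter.
    """
    assert len(ietf_nonce) == 12 and 0 <= ietf_counter < 2**32
    high, = struct.unpack('<I', ietf_nonce[:4])  # first IETF nonce word
    djb_counter = ietf_counter | (high << 32)    # 64-bit DJB counter
    djb_nonce = ietf_nonce[4:]                   # remaining 8 bytes
    return djb_nonce, djb_counter

# When the first 4 nonce bytes are zero, the DJB counter starts equal
# to the IETF counter:
nonce, ctr = ietf_to_djb(b'\x00' * 4 + b'ABCDEFGH', 1)
assert (nonce, ctr) == (b'ABCDEFGH', 1)
```

    The two behave identically as long as the 32-bit counter does not overflow; past 2^32 blocks, the DJB counter carries into what the IETF layout treats as the first nonce word.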

    1. 3

      I was going on the IETF draft for XChaPoly, which uses the IETF’s ChaPoly. Your implementation is probably not compatible with that.

      1. 3

        I was going on the IETF draft for XChaPoly, which uses the IETF’s ChaPoly.

        I suspected as much. There is no legitimate reason to limit the counter to 32 bits, and this draft shouldn’t either. I’ll raise the issue in the CFRG mailing list.

        Your implementation is probably not compatible with that.

        My extensive test suite, Libsodium’s source code, the audit last summer, and my theoretical understanding of the issue, all suggest that it is compatible.

        This works thanks to the placement of the nonce and counter in the Chacha block (c is the constant, k is the key, i is the counter, n is the nonce):

          DJB       IETF
        c c c c    c c c c
        k k k k    k k k k
        k k k k    k k k k
        i i n n    i n n n
        

        The counter is encoded in little endian, so in both implementations, incrementing it means incrementing the bottom left word. The only difference is what happens when we overflow the 32-bit counter:

        • An IETF based implementation would just wrap the counter around.
        • A DJB based implementation would increment the next word as well.

        In the case of IETF Chacha20 with the 96-bit nonce, we would be incrementing the nonce. This might cause some problems, but we’re not supposed to overflow the 32-bit counter to begin with. Unless maybe someone starts the counter at 2^32 - 5 for some reason. That never happens in practice.

        In the case of XChacha20, the nonce starts at the third word from the bottom, not the second. The IETF draft agrees with this by explicitly setting the first 4 bytes of the nonce to zero. In this case, incrementing the nonce will never cause a collision, since the corresponding words will always start at zero.
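        To make the layout argument concrete, here is a minimal pure-Python ChaCha20 block function (illustrative only, not production code). It shows that the two state layouts from the diagram coincide when the first IETF nonce word is zero, and where the carry lands when a 64-bit counter crosses 2^32:

```python
import struct

def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xffffffff

def quarter(s, a, b, c, d):
    # The ChaCha quarter-round (as in RFC 8439).
    s[a] = (s[a] + s[b]) & 0xffffffff; s[d] = rotl(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & 0xffffffff; s[b] = rotl(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & 0xffffffff; s[d] = rotl(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & 0xffffffff; s[b] = rotl(s[b] ^ s[c], 7)

def chacha20_block(state16):
    w = list(state16)
    for _ in range(10):                     # 20 rounds = 10 double rounds
        quarter(w, 0, 4,  8, 12); quarter(w, 1, 5,  9, 13)
        quarter(w, 2, 6, 10, 14); quarter(w, 3, 7, 11, 15)
        quarter(w, 0, 5, 10, 15); quarter(w, 1, 6, 11, 12)
        quarter(w, 2, 7,  8, 13); quarter(w, 3, 4,  9, 14)
    return struct.pack('<16I',
                       *((a + b) & 0xffffffff for a, b in zip(w, state16)))

CONST = list(struct.unpack('<4I', b'expand 32-byte k'))

def djb_state(key, nonce8, counter64):
    # c c c c / k k k k / k k k k / i i n n
    return (CONST + list(struct.unpack('<8I', key))
            + [counter64 & 0xffffffff, counter64 >> 32]
            + list(struct.unpack('<2I', nonce8)))

def ietf_state(key, nonce12, counter32):
    # c c c c / k k k k / k k k k / i n n n
    return (CONST + list(struct.unpack('<8I', key))
            + [counter32] + list(struct.unpack('<3I', nonce12)))

key = bytes(range(32))
# With the first IETF nonce word set to zero (as the XChacha20 draft
# mandates), the states are word-for-word identical, so the keystream is too:
assert djb_state(key, b'8bytenon', 7) == \
       ietf_state(key, b'\x00' * 4 + b'8bytenon', 7)
assert chacha20_block(djb_state(key, b'8bytenon', 7)) == \
       chacha20_block(ietf_state(key, b'\x00' * 4 + b'8bytenon', 7))

# Past 2**32 blocks the layouts diverge: the DJB counter carries into
# word 13, which the IETF layout treats as the first nonce word.
s = djb_state(key, b'8bytenon', 2**32)
assert s[12] == 0 and s[13] == 1
```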

        1. 1

          I’ll raise the issue in the CFRG mailing list.

          The issue is now raised.

    2. 4

      Typo in the value given for the birthday bound? Article says 2**sqrt(n), but I think it’s 2**(n/2), which happens to be equal to sqrt(2**n).

      1. 4

        Oops. Yes, that’s correct. I mixed up my notation.
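        For even n, the identity is easy to check:

```python
import math

# 2**(n/2) is exactly the square root of 2**n (for even n).
for n in (32, 64, 128, 256):
    assert 2**(n // 2) == math.isqrt(2**n)
```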

      2. 3

        Great writeup! Thank you.

        Tangential question that I don’t understand: could HMAC+CTR encrypt-then-MAC be treated as an AEAD? If you have a scheme where you feed (length of associated data, associated data, length of ciphertext, ciphertext) into the HMAC, then you get the usual AD feature?

        Part of the reason for wondering about this is that it sounds like some of the aead modes (esp ghash) have quite spooky concerns about fairly easy mistakes with nonces. Whereas, HMAC is relatively hard to mess up AIUI?

        1. 3

          Yes! Added bonus: It’s message-committing, so you can’t do random key replacements or partitioning oracle attacks.

          As far as I’m aware, any {stream cipher, symmetric MAC} can be turned into AEAD if you formalize a sound schema for canonicalizing the data you feed into the MAC. A simple HMAC(aad || ciphertext) wouldn’t work, because you can trick it into signing invalid ciphertexts if they’re prefixed by valid AAD. What you said above would work well.

          The other thing is, developers might want support for streaming modes where you don’t have all the encrypted data up front, so a possibly more palatable HMAC schema might look like HMAC(aad || ciphertext || len(aad) || len(ciphertext)).

          (Then again, streaming decryption is also challenging to get right. RUP, TOCTOU, etc.)
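          The streaming-friendly schema above can be sketched with just the standard library. This is the tag computation only (the stream cipher is omitted, and `aead_tag` is a made-up name):

```python
import hmac
import hashlib
import struct

def aead_tag(mac_key: bytes, aad: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC tag over the canonical encoding
    aad || ciphertext || le64(len(aad)) || le64(len(ciphertext)).

    The trailing lengths make the (aad, ciphertext) split unambiguous,
    and putting them last keeps the schema streaming-friendly.
    """
    msg = aad + ciphertext + struct.pack('<QQ', len(aad), len(ciphertext))
    return hmac.new(mac_key, msg, hashlib.sha256).digest()

def verify(mac_key: bytes, aad: bytes, ciphertext: bytes, tag: bytes) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(aead_tag(mac_key, aad, ciphertext), tag)

key = b'k' * 32
# Without the lengths, these two splits would MAC the same byte string;
# with them, they produce distinct tags:
assert aead_tag(key, b'AB', b'C') != aead_tag(key, b'A', b'BC')
assert verify(key, b'header', b'payload', aead_tag(key, b'header', b'payload'))
```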

          1. 2

            HMAC(aad || ciphertext || len(aad) || len(ciphertext))

            Ah! For some reason it did not occur to me that suffixing with the lengths is equally as good a canonicalisation as prefixing with the lengths.

            Then again, streaming decryption is also challenging to get right. RUP, TOCTOU, etc

            It strikes me as a “just say no” feature. I’m thinking of those bugs where gnupg would emit plaintext from incorrectly signed messages, which software downstream of it believes had been validated. Encrypting packets of data that are individually signed is much safer, right?

            Still subject to the inherent constraint that, if you emit any plaintext to the consumer early, an adversary can always force you into processing only a prefix of the stream by blocking all messages after a certain point in time.

        2. 2

          Thanks! I recently encountered something like this. Chrome is trialing WebRTC “insertable streams”, which means we can do end-to-end encryption of WebRTC even when there is an SFU in the mix (for peer-to-peer this is obviously less interesting).

          Browsers also have the Web Crypto API, so the building blocks for end-to-end encryption are in place. However, key exchange and, importantly, key rotation are up to whoever combines these APIs.

          My takeaway from your post is that both for the number of messages and for total message length, we should rotate keys before 2^30 (this only applies when using one of the algorithms in your post). I think we do it more often than that, but I’m after an easy number I can remember for the future too.

          1. 2

            Apart from the super interesting topic (I really need to refresh lots of stuff before I go on reading) I want to congratulate the author for the great font and layout choices! This is such a pleasure for the eyes!