1. 23
    1. 5

      Hi! soatok!

      As always, great post!

      I’m one of the people involved in DSSE, so I wanted to share a little bit more about the rationale:

      1. It’s been incredibly hard to keep people from actually designing things using JOSE, yet I’m strongly against such a footgun…
      2. One of the reasons people don’t move to PASETO seems to be PAE itself, since it requires dealing with binary data and bit-flipping here and there.
      3. Ironically, it appears that a big driver of not adopting either PASETO (or DSSE, for that matter) is that it hasn’t been “blessed” by the IETF…

      I wonder what your take is about these things.

      ETA: we do call DSSE Dizzy ourselves :)

      1. 6
        1. I didn’t even include JOSE in the article because I don’t want to accidentally lend credibility to it. If I see JOSE in an engagement, I brace for the worst. If I see DSSE, I calmly proceed.
        2. I find that surprising, but good to know.
        3. This doesn’t surprise me at all.

        My remarks about DSSE leaving me dizzy were mostly seeing “Why not PASETO? Too opinionated” then “Why PAE? It’s good enough and well documented” but then not using PAE (which, IIRC, was a PASETO acronym). It’s not that you’re wrong, just that it’s confusing. I think something important got lost in the editorial process, but still exists inside the designers’ heads.

        The only thing that I really dislike about DSSE is that you support, but never authenticate, some of your AAD.

        Specifically KEYID. I understand the intent here (it’s spelled out clearly in the docs), but even if it’s never meant to be used for any sort of security consideration, the fact that you’re giving any flex at all over what key goes into envelope verification, but never requiring users to commit that value to the signature, seems like a miss to me. PASETO has unencrypted footers, but the footer is still used in the MAC/signature calculation.

        Any attack based on swapping between multiple valid keys becomes significantly easier if the identifier for said key is never committed. The README remark about exclusive ownership seems to hint at awareness of this concern, but maybe the dots hadn’t been connected?

        Having some mechanism of committing the signatures on the envelope to a given signature algorithm and/or public key seems like a good way to mitigate. You can include this in the signature calculation without storing it in the envelope, by the way.

        Sophie Schmieg is fond of opining that (paraphrasing) cryptography keys aren’t merely byte strings, they’re byte strings plus configuration.

        RSASSA-PSS with e=65537, MGF1+SHA256 and SHA256 is a very specific configuration for RSA. If I yeet a PEM-encoded RSA public key at you (which contains only (n, e) in its contents), what’s stopping me from using PKCS#1 v1.5?
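        To make this concrete, here’s a minimal sketch using the pyca/cryptography API (my illustration, not anything DSSE specifies): the PEM carries only (n, e), so nothing in the key itself stops a signer and a willing verifier from using PKCS#1 v1.5 where the protocol meant PSS.

        ```python
        from cryptography.hazmat.primitives import hashes, serialization
        from cryptography.hazmat.primitives.asymmetric import padding, rsa

        key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
        pem = key.public_key().public_bytes(
            serialization.Encoding.PEM,
            serialization.PublicFormat.SubjectPublicKeyInfo,
        )

        # The protocol said RSASSA-PSS, but the key bytes don't object:
        sig = key.sign(b"payload", padding.PKCS1v15(), hashes.SHA256())
        serialization.load_pem_public_key(pem).verify(
            sig, b"payload", padding.PKCS1v15(), hashes.SHA256()
        )  # accepted, because the padding choice lives outside the key
        ```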

        Same thing with ECDSA with named curves and not reimplementing CVE-2020-0601.

        None of what I said is really a vulnerability with DSSE, necessarily, but leaves room for things to go wrong.

        Thus, if I were designing DSSE-v2, I’d make the following changes:

        1. Always include KEYID in the tag calculation, and if it’s not there, include a 0 length. It’s a very cheap change to the protocol.
        2. Include some representation of the public key (bytes + algorithm specifics) in the signature calculation. I wouldn’t store it in the envelope though (that might invite folks to parse it from the message).

        This is a small tweak to what DSSE-v1 does, but it will provide insurance against implementation failure (provided a collision-resistant hash function is being consistently used).
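        As a rough sketch of both changes (mine, not a spec; PASETO-style PAE assumed, and every name below is hypothetical):

        ```python
        def pae(pieces: list[bytes]) -> bytes:
            # PASETO-style Pre-Authentication Encoding: an 8-byte
            # little-endian piece count, then each piece prefixed with its
            # 8-byte little-endian length, making the encoding injective.
            out = len(pieces).to_bytes(8, "little")
            for piece in pieces:
                out += len(piece).to_bytes(8, "little") + piece
            return out

        def signing_input(payload_type: bytes, payload: bytes,
                          keyid: bytes, pubkey_info: bytes) -> bytes:
            # Change 1: KEYID is always committed; when absent, it is the
            # empty string and contributes a zero length.
            # Change 2: pubkey_info canonically encodes the key bytes plus
            # algorithm parameters; it is signed over but never stored in
            # the envelope.
            return pae([b"DSSEv2", payload_type, payload, keyid, pubkey_info])
        ```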

        ETA: Her exact words were “A key should always be considered to be the raw key material alongside its parameter choices.”

        1. 5

          I didn’t even include JOSE in the article because I don’t want to accidentally lend credibility to it. If I see JOSE in an engagement, I brace for the worst. If I see DSSE, I calmly proceed.

          Haha, I imagined as much! And I’ll take that as a cautious compliment.

          I find that surprising, but good to know.

          Yes, I personally don’t think that’s an end-all-be-all rationale, but you see how industry can be very capricious about these things…

          This doesn’t surprise me at all.

          Likewise, but it is rather frustrating to see how many admittedly bad cryptographic systems have been designed and endorsed this way (RFC 4880 and the JOSE suite, to name a few). I wonder what’s a way to move forward in this department (one would be to maybe beg Scott for half a decade spent in IETF meetings? :P)

          My remarks about DSSE leaving me dizzy were mostly seeing “Why not PASETO? Too opinionated” then “Why PAE? It’s good enough and well documented” but then not using PAE (which, IIRC, was a PASETO acronym). It’s not that you’re wrong, just that it’s confusing. I think something important got lost in the editorial process, but still exists inside the designers’ heads.

          Fair enough! I think we are due a complete review of what we wrote in there. The very early implementations of DSSE were PASETO’s PAE verbatim…

          The only thing that I really dislike about DSSE is that you support, but never authenticate, some of your AAD.

          Specifically KEYID. I understand the intent here (it’s spelled out clearly in the docs), but even if it’s never meant to be used for any sort of security consideration, the fact that you’re giving any flex at all over what key goes into envelope verification, but never requiring users to commit that value to the signature, seems like a miss to me. PASETO has unencrypted footers, but the footer is still used in the MAC/signature calculation.

          Most definitely, this is something that we wanted to deal with in a separate layer (that’s why the payload fields are so minimal). This separate layer being in-toto layout fields and TUF metadata headers. I’m still wary of this fact though, and I’d love to discuss more.

          Any attack based on swapping between multiple valid keys becomes significantly easier if the identifier for said key is never committed. The README remark about exclusive ownership seems to hint at awareness of this concern, but maybe the dots hadn’t been connected?

          Agreed, this is something we spent some time thinking hard about, and although I don’t think I can confidently say “we have an absolute answer to this,” it appears to me that verifying these fields on a separate layer may indeed avoid EO/DSKS-style attacks…

          Having some mechanism of committing the signatures on the envelope to a given signature algorithm and/or public key seems like a good way to mitigate. You can include this in the signature calculation without storing it in the envelope, by the way.

          Absolutely! A missing piece here is that in TUF/in-toto we store the algorithm in a separate payload that contains the public keys (e.g., imagine them as parent certificates). This is something that we changed on both systems after a security review from Cure53 many years ago (mostly to avoid attacker-controlled crypto-parameter fields, as in JWT).
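          For illustration, a TUF-style key entry (simplified; the keyid and key bytes below are placeholders): the algorithm rides along with the key material in the trusted key document, not in attacker-supplied envelope fields.

          ```python
          trusted_keys = {
              "abc123": {  # keyid: placeholder for a hash of the canonical key
                  "keytype": "ed25519",
                  "scheme": "ed25519",  # signing scheme pinned by the parent
                  "keyval": {"public": "..."},
              },
          }
          ```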

          Sophie Schmieg is fond of opining that (paraphrasing) cryptography keys aren’t merely byte strings, they’re byte strings plus configuration.

          Hard agree!

          RSASSA-PSS with e=65537, MGF1+SHA256 and SHA256 is a very specific configuration for RSA. If I yeet a PEM-encoded RSA public key at you (which contains only (n, e) in its contents), what’s stopping me from using PKCS#1 v1.5?

          Exactly, we have seen this happen over and over, even in supposedly standardized algorithms (like you point out with CVE-2020-0601 below).

          Same thing with ECDSA with named curves and not reimplementing CVE-2020-0601.

          None of what I said is really a vulnerability with DSSE, necessarily, but leaves room for things to go wrong.

          Absolutely, and part of me wonders how a “generalization” of the protocol would fare without all the implicit assumptions I outlined above. FWIW, I’d definitely give PASETO first-class consideration in any new system of mine.

          Thus, if I were designing DSSE-v2, I’d make the following changes:

          Always include KEYID in the tag calculation, and if it’s not there, include a 0 length. It’s a very cheap change to the protocol.

          Definitely, duly noted, and I wonder how hard it’d be to actually make it into V1.

          Include some representation of the public key (bytes + algorithm specifics) in the signature calculation. I wouldn’t store it in the envelope though (that might invite folks to parse it from the message).

          This may be a little bit more contentious, considering what I said above, but I do see the value in avoiding dependencies between layers. I’d also be less concerned about fixing something twice in both places…

          This is a small tweak to what DSSE-v1 does, but it will provide insurance against implementation failure (provided a collision-resistant hash function is being consistently used).

          Yup! Then again, I wonder what the delta between PASETO and this would be afterwards :) (modulo encryption, that is)

          Lastly, I wanted to commend you (again) for your writing! I love your blog and how accessible it is to people through all ranges of crypto/security expertise!

      2. 1

        To avoid dealing with binary, why not just prepend the decimal length of the data, followed by a colon? I think this approach originated with djb’s netstrings, and it was also adopted by Rivest’s canonical S-expressions.

        It turns foo into 3:foo and concatenates bar, baz and quux into 3:bar3:baz4:quux. Easy to emit, easy to ingest.

        Add on parentheses for grouping, and you have a general-purpose representation for hierarchical data …
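        A minimal sketch of that encoding (closer to Rivest’s csexp atoms than to strict netstrings, which also append a trailing comma):

        ```python
        def netstring(piece: bytes) -> bytes:
            # Prefix each piece with its decimal length and a colon.
            return str(len(piece)).encode() + b":" + piece

        def encode(pieces: list[bytes]) -> bytes:
            # Length prefixes make the concatenation injective: they pin
            # down exactly where each piece begins and ends.
            return b"".join(netstring(p) for p in pieces)

        assert netstring(b"foo") == b"3:foo"
        assert encode([b"bar", b"baz", b"quux"]) == b"3:bar3:baz4:quux"
        ```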

    2. 3

      Wonderful post!

      I’ve heard about some places making edicts about config files needing to use properly-terminated formats like JSON instead of something like YAML, to avoid truncated documents being valid. Would that also be a useful consideration in canonicalization?

      1. 3

        YAML in general worries me (especially the Norway Problem), and its susceptibility to truncation is noteworthy, but this problem is strictly about how you feed data into your MAC (or equivalent) function rather than a general problem with data truncation.
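        (For anyone unfamiliar with the Norway Problem, a quick illustration, assuming PyYAML and its YAML 1.1 boolean rules:)

        ```python
        import yaml  # pip install pyyaml

        # YAML 1.1 resolves the bare scalar "no" to a boolean:
        yaml.safe_load("country: no")    # {'country': False}
        yaml.safe_load("country: 'no'")  # {'country': 'no'}
        ```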

        I’m sure there are other, cleverer attacks possible than the simple one I highlighted.

        1. 2

          OK yeah, I’ve properly woken up now: the issue with scooting data from the encrypted portion into the additional data doesn’t get magically fixed if you bound that data. Thanks for humoring me :)

      2. 1

        A simple way to handle the truncation issue you raised is to ensure that the data being hashed ends with a 0x00 byte (which cannot occur anywhere within the string, so it does not need to be escaped). Then the format itself (JSON, YAML, etc.) does not matter.
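        Something like this sketch (my phrasing of the idea, assuming text input that never contains NUL):

        ```python
        def terminated(data: bytes) -> bytes:
            # Text formats like JSON/YAML never contain NUL, so a trailing
            # 0x00 unambiguously marks the true end: any truncated prefix
            # lacks the terminator and hashes differently.
            assert b"\x00" not in data
            return data + b"\x00"
        ```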

    3. 3

      As far as impact is concerned, does this also affect something like JWT’s HS256 algorithm, which calculates the HMAC over header+payload?

      1. 2

        Looking at a PHP implementation here and here

        • The number of pieces passed into the hash function is constant (2)
        • Each piece is encoded (with base64url) and joined with a separator character (.) that isn’t in the same input domain as the encoded pieces.

        They manage to narrowly avoid it in practice due to the fact that there is no input to urlsafeB64Encode() that creates the separator character. HMAC is also length-extension resistant, so that rules out that class of technique.
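        In other words (a sketch of the construction, not that library’s code):

        ```python
        import base64

        def b64url(data: bytes) -> bytes:
            # The base64url alphabet is [A-Za-z0-9_-]; "." can never appear.
            return base64.urlsafe_b64encode(data).rstrip(b"=")

        def hs256_signing_input(header: bytes, payload: bytes) -> bytes:
            # The "." separator lies outside the alphabet of the encoded
            # pieces, so the boundary is unambiguous even without lengths.
            return b64url(header) + b"." + b64url(payload)
        ```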

        (There is the additional challenge of moving the concatenation point for two JSON objects such that the parser still returns objects, if you’re trying to do more than cause an exception after the payload has passed its HMAC validation. I haven’t explored that problem in detail.)

    4. 2

      Another example I like is when hashing trees: https://en.wikipedia.org/wiki/Merkle_tree

      If you look down the page, you’ll see where they describe “one simple fix”: using 0x00 and 0x01 bytes to signal branch vs. leaf.
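      A sketch of that fix (my illustration, assuming SHA-256):

      ```python
      import hashlib

      def hash_leaf(data: bytes) -> bytes:
          # The 0x00 prefix domain-separates leaves from branches, so leaf
          # bytes that happen to look like two child hashes can't be
          # reinterpreted as an internal node.
          return hashlib.sha256(b"\x00" + data).digest()

      def hash_branch(left: bytes, right: bytes) -> bytes:
          # The 0x01 prefix marks internal nodes.
          return hashlib.sha256(b"\x01" + left + right).digest()
      ```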

      At its root, the key thing is that you need the encoding function to be injective: then encode(A) = encode(B) can only ever happen if A = B.

      This can be done by escaping (e.g., inserting backslashes), but that is not good in a crypto context because it involves processing the input data (which could leak into a timing side channel). This is why the encoding is instead done by prefixing lengths.