1. 24
  2. 3

    I appreciate the problems/dangers involved in canonicalizing structured data like XML or JSON, but the author’s proposed alternative is pretty awful. The signed content is now no longer part of the document, it’s hidden in a blob that has to be separately parsed before it can be interpreted.

    This may not seem so bad for something like SAML, where the signed content is the whole point, so you can just add extra code to do the nested parsing. But it’s a big wrench in the works of any general-purpose system based on JSON or XML, where signed data is just one type of content. Take a MongoDB or Couchbase or CouchDB database: how are you supposed to query on attributes of the signed content, or expose it through a GraphQL interface?

    I’ve run into this when working on schemas for signed documents in Couchbase. The document JSON has to be left alone; the signature is just a property added to it. This requires that you define a strict algorithm for canonicalizing the document. Fortunately this is much easier to do in JSON than XML!
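
    A minimal sketch of that approach, assuming a sorted-keys/no-whitespace canonical form and an HMAC signature stored under an illustrative `_sig` property (not Couchbase’s actual scheme):

    ```python
    import hashlib, hmac, json

    SECRET = b"demo-key"  # illustrative only

    def canonical(doc: dict) -> bytes:
        # Exclude the signature property itself, then serialize with
        # sorted keys and no insignificant whitespace.
        body = {k: v for k, v in doc.items() if k != "_sig"}
        return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

    def sign(doc: dict) -> dict:
        doc["_sig"] = hmac.new(SECRET, canonical(doc), hashlib.sha256).hexdigest()
        return doc

    def verify(doc: dict) -> bool:
        expected = hmac.new(SECRET, canonical(doc), hashlib.sha256).hexdigest()
        return hmac.compare_digest(doc.get("_sig", ""), expected)

    doc = sign({"user": "alice", "role": "admin"})
    assert verify(doc)
    doc["role"] = "superadmin"  # any tampering breaks verification
    assert not verify(doc)
    ```

    The document stays plain JSON the whole time, so the database can still index and query every field; only signing and verification need the canonicalization step.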

    The dup-keys vulnerability shown in the article highlights that it’s important to validate the input data. Duplicate keys are not valid JSON, so proper handling would have rejected the malicious SAML data instead of trying to massage it into shape by ignoring one or the other dup.
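
    One way to do that rejection at parse time, sketched with Python’s `json` module (which by default lets the last key win) and its `object_pairs_hook`:

    ```python
    import json

    def no_dup_object(pairs):
        # Refuse to build an object when any key appears more than once.
        seen = set()
        for k, _ in pairs:
            if k in seen:
                raise ValueError(f"duplicate key: {k!r}")
            seen.add(k)
        return dict(pairs)

    malicious = '{"user": "alice", "user": "admin"}'
    print(json.loads(malicious))  # default last-wins behaviour
    try:
        json.loads(malicious, object_pairs_hook=no_dup_object)
    except ValueError as e:
        print("rejected:", e)
    ```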

    1. 6

      Duplicate keys are actually valid JSON, because there’s nothing in the standard that forbids them. Which, IMO, is a flaw in JSON. You shouldn’t have ambiguous cases like that, you should either have a rigidly defined mechanism to pick one or mandate that it be rejected.

      1. 2

        This makes me very sad.

      2. 2

        The dup-keys vulnerability shown in the article highlights that it’s important to validate the input data. Duplicate keys are not valid JSON, so proper handling would have rejected the malicious SAML data instead of trying to massage it into shape by ignoring one or the other dup.

        The author disagrees with you:

        This is because duplicate keys are valid JSON, removed upon processing, and most JSON implementations let the last key win.

        1. 2

          Nothing in the JSON specification tells the user to deduplicate fields. It’s assumed behaviour because that’s how the semi-canonical implementation in ECMAScript works, but it isn’t specified anywhere, and in theory nothing prevents an implementation from keeping both values or only the first one.
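
          All three behaviours can in fact be reproduced with Python’s `json` module via `object_pairs_hook` (a sketch, not any particular parser’s policy):

          ```python
          import json

          doc = '{"role": "user", "role": "admin"}'

          last_wins  = json.loads(doc)  # CPython's default
          first_wins = json.loads(doc, object_pairs_hook=lambda ps: {k: v for k, v in reversed(ps)})
          keep_both  = json.loads(doc, object_pairs_hook=lambda ps: ps)  # raw list of pairs

          print(last_wins)   # {'role': 'admin'}
          print(first_wins)  # {'role': 'user'}
          print(keep_both)   # [('role', 'user'), ('role', 'admin')]
          ```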

        2. 1

          If you need to search against signed content, store the unencoded content. That’s easy.

          FWIW, JWT has a lot of flaws but works the way the author suggests. It consists of three base64 blobs separated by dots. Blob 1 decodes to a header naming the algorithm (letting the token declare its own algorithm is a design flaw, oh well). Blob 2 is the content. Blob 3 is the signature.
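
          That three-blob layout is easy to see by splitting a token on dots and base64url-decoding each part. The token below is built in place just to illustrate the structure; the signature is a placeholder and nothing is validated:

          ```python
          import base64, json

          def b64url_decode(part: str) -> bytes:
              # base64url with padding restored.
              return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

          def b64url_encode(data: bytes) -> str:
              return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

          header  = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
          payload = b64url_encode(json.dumps({"sub": "alice"}).encode())
          token = f"{header}.{payload}.fake-signature"  # placeholder signature

          h, p, s = token.split(".")
          print(json.loads(b64url_decode(h)))  # the algorithm header
          print(json.loads(b64url_decode(p)))  # the content
          ```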