1. 34

  2. 11

    I wonder what people with JSON parsing problems are parsing. I had an issue with slow JSON, though not terribly slow (using it to serialize my daily emails, for scale), and the first thing I did was replace it with a better format. Way faster. And I spent much less time developing the solution than I would have spent finding loops to unroll.
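
    As a sketch of that kind of swap in Python (assuming the third-party msgpack package; the payload is made up for illustration):

    ```python
    import json

    import msgpack  # pip install msgpack

    record = {"subject": "daily report", "tags": ["work", "todo"], "read": False}

    json_bytes = json.dumps(record).encode("utf-8")
    msgpack_bytes = msgpack.packb(record)

    # msgpack round-trips to the same structure, in fewer bytes.
    assert msgpack.unpackb(msgpack_bytes) == record
    print(len(json_bytes), len(msgpack_bytes))
    ```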

    1. 5

      Deadline auctioning is often a use case. Though most of the advertising industry is awful and works to ~100ms auction deadlines, a full feed is in excess of 100k requests per second… that’s an awful lot of JSON :-)

      Some people then get ‘excited’ about protobuf, but fail to do sensible benchmarking (finding it is not really interestingly faster) or to pay attention to the fact that there is effectively only a single C++ implementation.

      1. 6

        > Some people then get ‘excited’ about protobuf, but fail to do sensible benchmarking (finding it is not really interestingly faster) or to pay attention to the fact that there is effectively only a single C++ implementation.

        I often wonder why people aren’t more excited about msgpack, which has the allure of JSON’s ad-hoc design, but in a compact, binary packing. But I don’t see much adoption of it…

        1. 7

          If I’m working at that level, I tend to prefer CBOR, if only for its accepted RFC and its origins in the CoAP project.

          1. 13

            CBOR is basically a (somewhat hostile) fork of MessagePack. The author (Carsten BORmann…) appeared one day in the MessagePack GitHub issues saying he was going to submit a “slightly modified” version of a draft of the MessagePack V3 specification to the IETF under the name BinaryPack. The MessagePack community basically did not agree, because there was no consensus on the spec, especially with regard to backward compatibility.

            There were long discussions (see https://github.com/msgpack/msgpack/issues/121, https://github.com/msgpack/msgpack/issues/128 and https://github.com/msgpack/msgpack/issues/129) which led to the (current) MessagePack V5 spec, from the original MessagePack author. In parallel, @cabo went to the IETF alone with his spec, renamed CBOR, which most of the community did not support. He got it accepted because he was already well known at the IETF.

            Sadly (IMO), people appear to be convinced by the IETF stamp and implement CBOR instead of MessagePack these days, most without knowing the backstory. In any case, technically, there’s no difference relevant to most users between MessagePack V5 and CBOR.

            EDIT: Well, my sentence above is not totally fair, to be honest. There are differences, because CBOR is not really BinaryPack.

            One is indefinite-length items in CBOR. The idea is: instead of specifying the length of an object at its head, you use a terminator (like a C string). You may or may not think this is a good idea (it makes things easier at encoding time, but more annoying for decoders).
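
            To make that concrete, here is a hand-rolled sketch (in Python, no library involved) of the two framings for the text string “hello”, with byte values per RFC 7049:

            ```python
            # Definite length: major type 3 (text string), length 5 in the head.
            definite = bytes([0x65]) + b"hello"

            # Indefinite length: 0x7F opens the string, definite-length chunks
            # follow, and 0xFF is the terminator ("break") mentioned above.
            indefinite = (
                bytes([0x7F])
                + bytes([0x62]) + b"he"   # chunk "he" (length 2)
                + bytes([0x63]) + b"llo"  # chunk "llo" (length 3)
                + bytes([0xFF])
            )

            # An encoder can emit the indefinite form without knowing the total
            # length up front; a decoder must loop until it sees the break byte.
            ```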

            Another difference is tagged items, which allow the composition of types not specified in the spec out of any basic type, whereas MessagePack supports extensions but requires them to be represented as opaque binary. But for JSON compatibility, you probably don’t want to use non-basic types anyway.
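
            A rough byte-level sketch of that difference, using the registered CBOR tag 0 (“standard date/time string”) and a hypothetical MessagePack extension type 42:

            ```python
            date = b"2003-12-13T18:30:02Z"

            # CBOR: tag 0 (0xC0) wraps an ordinary text string (0x74 = text
            # string of length 20), so a decoder that ignores the tag can
            # still read the payload as a plain string.
            cbor_tagged = bytes([0xC0, 0x74]) + date

            # MessagePack ext 8 (0xC7): a length byte, then a signed type byte
            # (0..127 are application-defined; 42 is made up here), then an
            # opaque binary blob that is *not* itself a MessagePack value.
            msgpack_ext = bytes([0xC7, len(date), 42]) + date
            ```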

            Finally, there’s CDDL (https://tools.ietf.org/html/draft-greevenbosch-appsawg-cbor-cddl-10), which is a schema language for CBOR, but if you’re going to use schemas, why not just go with Protocol Buffers?

            1. 2

              Interesting! I wasn’t aware of the history between the two. Thanks for summing it up.

              1. 2

                Having implemented CBOR for Lua (along with nearly all registered semantic tags), I took a look at MessagePack V5. CBOR is more consistent in its encoding scheme (a string in CBOR starts with 0x60 to 0x7F, with a possible length of up to 2^64; in MP v5 it’s 0xA0 to 0xBF, 0xD9, 0xDA or 0xDB, with a possible length of up to 2^32). CBOR semantic tags are wide open ended (2^64 possible values) and, while you could argue about their inclusion (self-describing vs. a schema), they do fill a need (this string is a date in ISO format, for instance). MP v5’s extensions seem limited in nature: only 127 values are possible.
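
                A rough sketch of the two header rules just described (text strings only, big-endian lengths, error handling omitted):

                ```python
                import struct

                def cbor_str_header(n: int) -> bytes:
                    # Major type 3 (0x60), with the length packed by one uniform rule.
                    if n < 24:
                        return bytes([0x60 + n])
                    if n < 2**8:
                        return bytes([0x78]) + struct.pack(">B", n)
                    if n < 2**16:
                        return bytes([0x79]) + struct.pack(">H", n)
                    if n < 2**32:
                        return bytes([0x7A]) + struct.pack(">I", n)
                    return bytes([0x7B]) + struct.pack(">Q", n)  # up to 2^64 - 1

                def msgpack_str_header(n: int) -> bytes:
                    # fixstr plus three separate str8/str16/str32 formats,
                    # topping out at 2^32 - 1.
                    if n < 32:
                        return bytes([0xA0 + n])
                    if n < 2**8:
                        return bytes([0xD9]) + struct.pack(">B", n)
                    if n < 2**16:
                        return bytes([0xDA]) + struct.pack(">H", n)
                    if n < 2**32:
                        return bytes([0xDB]) + struct.pack(">I", n)
                    raise ValueError("MessagePack strings are capped at 2^32 - 1 bytes")
                ```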

                I also didn’t find the streaming nature of CBOR that difficult to handle on the decoding side (now, handling circular references? That was interesting). But yes, for a consumer using a pre-built library for CBOR or MP, there probably isn’t much of a difference (semantic tagging and sizes notwithstanding).

                I personally find CDDL interesting, as not everyone wants to be beholden to the whims of Google.

              2. 3

                There’s been a fair amount of work to hopefully replace the default Haskell binary package with one based on CBOR, which promises some nice improvements in encoding and decoding performance as well as message size. (An interesting contrast is the soon-to-be-released compact regions support in GHC 8.2, which allows essentially free serialisation of Haskell types by shipping the program’s in-memory representation directly, for zero serialisation cost at the expense of size.)

                1. 1

                  Oh nice! I wasn’t aware of CBOR! I’ll definitely look into it!

                2. 2

                  Well, the excitement is extinguished pretty fast when you find that the implementation libraries from the official upstream cannot decode one another’s output and that, despite what it says on the website, it is neither fast nor particularly compact.

                3. 1

                  > fail to do sensible benchmarking (finding it is not really interestingly faster)

                  We found it significantly faster in a couple of use cases.

                  > there is effectively only a single C++ implementation.

                  Not sure what you mean here; we were easily able to use it across a couple of languages (though mostly Java to Java).

                  1. 1

                    > there is effectively only a single C++ implementation.

                    > Not sure what you mean here; we were easily able to use it across a couple of languages (though mostly Java to Java).

                    Yes, but with protobuf everything else uses the same C++ library[1] through bindings under the hood. You may as well just ship the internal data representation directly, along with your own bindings, for when you want to jump between languages.

                    JSON and ASN.1 are far better options here as a serialiser, if you are concerned about speed and/or portability.

                    [1] I vaguely recall there is also a native, 100% Java version of the encoder/decoder library from Google, but that’s pretty much it.

                    1. 1

                      Not true. The Go protobuf library is native.

                      1. 1

                        > JSON and ASN.1 are far better options here as a serialiser, if you are concerned about speed and/or portability.

                        I’m confused by this recommendation, given that benchmarks (along with other requirements) should be guiding the decision. The Java benchmarks that I looked at all show protobuf (v3) handily beating JSON, and my limited usage didn’t show any issues with portability.

                        1. 1

                          The point of the article is that it’s not JSON parsing itself that is slower than protobuf, but the particular implementation you are using.
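
                          As a minimal illustration of that point in Python, assuming the third-party orjson package as the faster implementation:

                          ```python
                          import json

                          import orjson  # pip install orjson

                          doc = {"ids": list(range(1000)), "ok": True}

                          stdlib_bytes = json.dumps(doc).encode("utf-8")
                          orjson_bytes = orjson.dumps(doc)  # same format, faster implementation

                          # Both produce plain JSON; each can read the other's output.
                          assert json.loads(orjson_bytes) == orjson.loads(stdlib_bytes)
                          ```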

                    2. 1

                      Ah, ok. I figured it was some external source one might not control, but couldn’t think of an example.

                  2. 2

                    Should really add a BSON content-type.