1.
    1.

      Sounds like ESR has never worked with a binary protocol before. Everything he complains about is equally true of time-tested protocols like DNS or TLS.

      1.

        Maybe, in another application with 10^3 more transaction volume per user, or with a 10^3 increase in userbase numbers, we’d incur as much thermodynamic cost as landed on a typical NTP server in 1981, and a packed binary format would make the kind of optimization sense it did then.

        TLS is literally used for streaming video and multi-gigabyte downloads. Every Google search, every Facebook post, every online banking transaction, and almost every email involve at least one TLS tunnel. It probably makes economic sense for TLS to use bit packing under ESR’s economic model.

        I’m not so sure about DNS. It might very well be fine to switch DNS over to something text based (except, of course, that DNSSEC requires there to be a single canonical representation for every DNS message, so JSON itself is right out). OTOH, DNS is part of the critical path for initial web page loads, so we really, really want to keep latency low on it…
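        For a sense of what DNS’s binary framing looks like, here is a minimal Python sketch that builds a standard query in the RFC 1035 layout: a fixed 12-byte header followed by the name as length-prefixed labels. The function name `dns_query` is hypothetical, and the sketch skips name compression and response parsing entirely.

```python
import struct

def dns_query(qid: int, name: str, qtype: int = 1, qclass: int = 1) -> bytes:
    """Build a DNS query message (RFC 1035 layout); qtype 1 = A, qclass 1 = IN."""
    # Header: ID, flags (only RD, "recursion desired", set), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0; all big-endian 16-bit fields.
    header = struct.pack(">6H", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is a length byte plus its ASCII bytes, ended by a zero byte.
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label length: {label!r}")
        qname += bytes([len(raw)]) + raw
    return header + qname + b"\x00" + struct.pack(">2H", qtype, qclass)

query = dns_query(0x1234, "example.com")
print(len(query))  # 29
print(query.hex())
```

        The 16-bit flags word packs QR, opcode, AA, TC, RD, RA, a reserved field and RCODE; 0x0100 sets only RD.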

        1.

          As annoying as packed blobs are, the lower you go in the stack the more you want to cater to the machine–and machines hate JSON. This is also such a niche thing that opening it to the wide masses who can look at JSON doesn’t really win you anything–this isn’t a document format for blogging, this is a core ops mechanism.

        2.

          except, of course, that DNSSEC requires there to be a single canonical representation for every DNS message, so JSON itself is right out

          I hate JSON, but it can be made canonical simply by stating that all object properties be sorted. It’s still a terrible notation, of course.
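          A minimal Python sketch of that rule: sort the keys and drop optional whitespace so that equal objects serialize to identical text. On its own this is not a full canonical form, since number spelling and string escaping also have to be pinned down (RFC 8785, the JSON Canonicalization Scheme, specifies those), and this sketch does not implement RFC 8785. `canonical_json` is a hypothetical helper name.

```python
import json

def canonical_json(obj) -> str:
    # Sorted keys and no optional whitespace; number formatting and escapes are
    # left to Python's json module, so this is only "canonical" within Python.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

a = {"lon": 7.568061439, "lat": 46.498204497, "class": "TPV"}
b = {"class": "TPV", "lat": 46.498204497, "lon": 7.568061439}
print(canonical_json(a) == canonical_json(b))  # True
print(canonical_json(a))  # {"class":"TPV","lat":46.498204497,"lon":7.568061439}
```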

          I’d prefer canonical S-expressions, which are human-readable, elegant & efficient too. As an example, here’s esr’s JSON:

          {"class":"TPV","time":"2010-04-30T11:48:20.10Z","ept":0.005,
           "lat":46.498204497,"lon":7.568061439,"alt":1327.689,
           "epx":15.319,"epy":17.054,"epv":124.484,"track":10.3797,
           "speed":0.091,"climb":-0.085,"eps":34.11,"mode":3}


          And here it is as a canonical S-expression:

          (tpv
            (time "2010-04-30T11:48:20.10Z")
            (ept "0.005")
            (lat "46.498204497")
            (lon "7.568061439")
            (alt "1327.689")
            (epx "15.319")
            (epy "17.054")
            (epv "124.484")
            (track "10.3797")
            (speed "0.091")
            (climb "-0.085")
            (eps "34.11")
            (mode "3"))


          which would actually be (3:tpv(4:time23:2010-04-30T11:48:20.10Z)(3:ept5:0.005)(3:lat12:46.498204497)(3:lon11:7.568061439)(3:alt8:1327.689)(3:epx6:15.319)(3:epy6:17.054)(3:epv7:124.484)(5:track7:10.3797)(5:speed5:0.091)(5:climb6:-0.085)(3:eps5:34.11)(4:mode1:3)) on the wire. It is 19 bytes longer than the compact JSON (237 vs. 218), but it’s efficient enough, and it doesn’t give one a false sense of security, as JSON does (What precision are those floats? Is that integer actually an integer? Where is the timestamp format specified?).
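          For the curious, a minimal Python sketch of the canonical (length-prefixed) encoding: each atom is its byte length in decimal, a colon, then the raw bytes, and lists are wrapped in parentheses. It encodes every field of the JSON record above, keeping each number as the exact text the sender wrote. The helper names, and the choice to use the lowercased `class` value as the list head, are assumptions for illustration.

```python
import json

TPV_JSON = ('{"class":"TPV","time":"2010-04-30T11:48:20.10Z","ept":0.005,'
            '"lat":46.498204497,"lon":7.568061439,"alt":1327.689,'
            '"epx":15.319,"epy":17.054,"epv":124.484,"track":10.3797,'
            '"speed":0.091,"climb":-0.085,"eps":34.11,"mode":3}')

def atom(text: str) -> bytes:
    """Canonical S-expression atom: decimal byte length, ':', raw bytes."""
    raw = text.encode("utf-8")
    return str(len(raw)).encode("ascii") + b":" + raw

def encode(expr) -> bytes:
    """Encode a str as an atom, or a list of expressions as '(' ... ')'."""
    if isinstance(expr, str):
        return atom(expr)
    return b"(" + b"".join(encode(item) for item in expr) + b")"

# parse_float/parse_int=str keep each number as its literal text, e.g. "0.005".
record = json.loads(TPV_JSON, parse_float=str, parse_int=str)
expr = [record["class"].lower()] + [[key, value] for key, value in record.items() if key != "class"]
wire = encode(expr)

print(len(TPV_JSON), len(wire))  # 218 237
print(wire.decode("utf-8"))
```

          Because every atom carries its own length, a decoder never has to guess where a value ends, and any two encoders given the same expression emit the same bytes.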

    2.

      I was hoping he’d mention one pitfall of “discoverable” formats, which is enabling developer laziness. When I see the NTP format, I know I’ll need to spend some hours understanding all those cryptic bitfields. With the JSON format, the temptation is strong to just “wing it”. In fact, that’s arguably what “discoverable” equates to — “ability to just wing it”. Which is exactly what happens every day with HTTP and even JSON itself. (Did you know JSON isn’t actually equivalent to JavaScript syntax? Congrats, you must have read the spec.)

      All wire formats have complex semantics, and they aren’t “discoverable”, whether you use binary or Latin.

      When he said the numeric literals in the JSON were a good example of future extensibility, that was frankly hilarious, because I guarantee those will get parsed into whatever fixed-size float type the implementation defaults to, and truncated accordingly. And that implementation is going to work fine as long as “everyone” uses float64s in their JSON parser, but then you’ll see the flag day when “version 2” specifies the resolution of floats, and sorry, you’ll need to read the spec at that point.
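      A quick Python illustration of that truncation risk: by default Python’s `json` module turns a JSON number with a fraction into an IEEE 754 float64, so digits beyond roughly 17 significant figures disappear without any error, while `parse_float=decimal.Decimal` keeps them. Other parsers choose differently (JavaScript’s `JSON.parse` always produces float64 numbers), and nothing in the JSON text tells you which behavior the other side has. The `offset` field is made up for the example.

```python
import json
from decimal import Decimal

doc = '{"offset": 0.1234567890123456789}'  # 19 significant digits

as_float = json.loads(doc)["offset"]                         # float64 by default
as_decimal = json.loads(doc, parse_float=Decimal)["offset"]  # every digit kept

print(repr(as_float))                   # fewer digits than were sent
print(as_decimal)                       # 0.1234567890123456789
print(Decimal(as_float) == as_decimal)  # False: the float64 value differs
```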

      1.

        Which is sort of why I like the CBOR/YANG combo.

        Yes, it’s discoverable, but you can also automagically pull the YANG schema to find the hard definition of the fields.

        Sort of the best of both worlds…

        The self-describing nature of JSON, with the compactness and well-definedness of protobufs.
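        To make the CBOR half concrete, here is a minimal Python sketch of an encoder for a small subset of RFC 8949: unsigned and negative integers, text strings, arrays, maps, booleans and float64. Every item starts with a byte whose top three bits give its major type, which is what keeps CBOR self-describing without being text. The sketch omits tags, byte strings, indefinite lengths and shortest-float encoding, and covers nothing of YANG; a real CBOR library is what you would actually use.

```python
import struct

def _head(major: int, n: int) -> bytes:
    """Initial byte (3-bit major type, 5-bit info) plus the shortest argument form."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | info]) + struct.pack(fmt, n)
    raise ValueError("argument does not fit in 64 bits")

def cbor(value) -> bytes:
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # always float64 in this sketch
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor(v) for v in value)
    if isinstance(value, dict):
        return _head(5, len(value)) + b"".join(cbor(k) + cbor(v) for k, v in value.items())
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(cbor({"a": 1, "b": [2, 3]}).hex())  # a26161016162820203
```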

      2.

        Did you know JSON isn’t actually equivalent to JavaScript syntax?

        Is this the comments thing?

        All wire formats have complex semantics, and they aren’t “discoverable”, whether you use binary or Latin.

        Bravo

        1.

          I shouldn’t have said “equivalent”, because there are deliberately a thousand things that are in JavaScript that aren’t in JSON.

          What’s surprising is that JSON isn’t a subset of JavaScript. Because obviously it was intended to be!

          The reason is U+2028 and U+2029.

          http://timelessrepo.com/json-isnt-a-javascript-subset
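          A minimal Python sketch of the usual workaround when JSON gets pasted into JavaScript source: escape the two characters, which JSON allows raw inside strings but pre-ES2019 JavaScript treats as line terminators. (ECMAScript 2019 made them legal in string literals, so this mainly bites older engines.) `json_for_script` is a hypothetical helper name.

```python
import json

def json_for_script(obj) -> str:
    """Serialize obj as JSON that is also safe inside older JavaScript source."""
    text = json.dumps(obj, ensure_ascii=False)  # leaves U+2028/U+2029 raw
    # These characters can only occur inside JSON strings, so escaping them
    # everywhere is safe and leaves the JSON value unchanged.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")

post = {"bio": "line one\u2028line two"}
print("\u2028" in json.dumps(post, ensure_ascii=False))  # True
print("\u2028" in json_for_script(post))                 # False
print(json.loads(json_for_script(post)) == post)         # True
```

          Python’s default `ensure_ascii=True` already escapes every non-ASCII character, these two included, at the cost of larger output for non-English text.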

          1.

            That was the cause of one of the most surprising bugs I’ve tracked down.

            Certain user content was erroring out when it loaded because it included those invisible line- and paragraph-separator characters.

    3.

      Sounds like a lot of the people commenting here missed the point.

      The article is called “How not to design a wire protocol.” It’s not about how good JSON protocols are. It’s about how bad bit-packed binary protocols are. If you’d rather use MessagePack, canonicalized S-Expressions, HL7v2, netstrings, bencode, or MIME headers, that’s fine.

      The point is to avoid bit-packing like NTP uses, because you will eventually hit the limits of your binary encoding and wind up with very annoying modular arithmetic tricks to work around it. Unless the protocol is something like TLS, TCP, QUIC, RTP, Rsync, SMB, Git, HTTP… you know, protocols that actually shovel around gigabytes of data and actually need to be super-optimized, it’s not worth it. And even then, it’s only really the stuff that gets sent on every single frame that actually needs it; most of these protocols have some sort of up-front negotiation step before they switch to using tightly-packed bytes.
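      To make the “modular arithmetic tricks” concrete: an NTP timestamp on the wire is a 32-bit count of seconds since 1900-01-01 plus a 32-bit binary fraction, so the seconds field wraps in February 2036. One common workaround is to pick the era that puts the decoded time closest to the receiver’s own rough clock. This Python sketch is an illustration of that idea under that assumption, not any particular implementation’s code, and `ntp_to_unix` is a hypothetical name.

```python
import struct

NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01 (UTC)
ERA = 1 << 32                    # the 32-bit seconds field wraps every 2**32 s (~136 years)

def ntp_to_unix(ts: bytes, now_unix: float) -> float:
    """Decode an 8-byte NTP timestamp, picking the era nearest to now_unix."""
    seconds, fraction = struct.unpack(">II", ts)  # 32-bit seconds, 32-bit fraction
    now_ntp = int(now_unix) + NTP_UNIX_OFFSET
    # Signed distance from "now" to the timestamp on the 2**32 ring, in [-2**31, 2**31).
    delta = (seconds - now_ntp) % ERA
    if delta >= ERA // 2:
        delta -= ERA
    return now_ntp + delta - NTP_UNIX_OFFSET + fraction / ERA

t = 2_100_000_000  # a Unix time after the 2036 rollover
wire = struct.pack(">II", (t + NTP_UNIX_OFFSET) % ERA, 1 << 31)  # fraction = 0.5 s
print(ntp_to_unix(wire, now_unix=t + 100))  # 2100000000.5
naive = struct.unpack(">I", wire[:4])[0] - NTP_UNIX_OFFSET
print(naive < 0)  # True: ignoring the era puts the time before 1970
```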

      It’s also REALLY IRONIC that there are people here complaining about NTP switching from binary to text, who also were complaining about how awful it was that HTTP switched from text to binary.

      1.

        I think my main problem with this is that he is flirting dangerously with equating “over-optimized” with “precisely-specified”. The argument that it’s easier to extend text formats is a little specious—it’s the extensibility that makes them extensible, not the textness. I can add a new header in HTTP by adding CRLF and a new header clause, but that’s because I specified that I could do that, not because of some magic of text formats.
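        A small Python sketch of that point, using a hypothetical binary format: each field is encoded as type, length, value, and the parser skips any type it doesn’t know by jumping over its length. An old parser keeps working when a newer sender adds a field, much like an HTTP client ignoring an unfamiliar header, because the format specified that rule up front. The field numbers and names are invented for illustration.

```python
import struct

FIELD_NAMES = {1: "seconds", 2: "fraction"}  # hypothetical version-1 fields

def tlv(ftype: int, value: bytes) -> bytes:
    """One field: 1-byte type, 2-byte big-endian length, then the value."""
    return struct.pack(">BH", ftype, len(value)) + value

def parse(buf: bytes) -> dict:
    fields, pos = {}, 0
    while pos < len(buf):
        ftype, length = struct.unpack_from(">BH", buf, pos)
        pos += 3
        value = buf[pos:pos + length]
        if len(value) != length:
            raise ValueError("truncated field")
        pos += length
        if ftype in FIELD_NAMES:  # unknown types are simply skipped
            fields[FIELD_NAMES[ftype]] = int.from_bytes(value, "big")
    return fields

v1 = tlv(1, (1000).to_bytes(4, "big")) + tlv(2, (5).to_bytes(4, "big"))
v2 = v1 + tlv(9, b"a field added later")
print(parse(v1) == parse(v2))  # True
```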

      2.

        It’s also REALLY IRONIC that there are people here complaining about NTP switching from binary to text, who also were complaining about how awful it was that HTTP switched from text to binary.

        I wonder if that is true? Are the people who complained about HTTP switching to binary actually the same people who complain about NTP switching to text, or are they different groups of people?

        I’m not saying you’re wrong, I’m just wondering. I feel like I see this kind of sentiment a lot; “reddit/HN/lobste.rs complained when X happened, now they complain when !X happens”, and I feel like it’s often two separate groups of people complaining.

        1.

          Well, if you want me naming names, then @friendlysock’s responses have been pretty consistently anti-HTTP2 and anti-HTTP3, though I can’t find any comments where he calls out why in a way that’s more specific than “it benefits Google but doesn’t benefit anyone else”.

          1.

            I haven’t gone looking for the specific comments you mean, but I just wanted to preemptively say that I see where you’re coming from, but I do think people should be allowed to change their mind, or even just be inconsistent by accident.

    4.

      And we won’t have 10^3 clock cycles to make user lives better if we waste them on setting the clock.

    5.

      I’m not sure if I like the idea of depending on the implicit extensibility of JSON encoding over the explicit extensibility of something like SCTP. How exactly are you supposed to decide if an implementation supports an implicit extension, such as increased time precision or adding a new field? And why couldn’t that be supported in a binary protocol?
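      For reference, SCTP’s explicit mechanism (RFC 4960, section 3.2) puts the policy in the chunk type itself: the two high-order bits tell a receiver that doesn’t recognize the type whether to discard the packet or skip the chunk, and whether to report it. A minimal Python sketch of that dispatch; the `known` set and the function names are assumptions, and real chunk processing is omitted.

```python
from enum import Enum

class Action(Enum):
    STOP = 0             # 00: stop processing the packet and discard it
    STOP_AND_REPORT = 1  # 01: same, and report "Unrecognized Chunk Type"
    SKIP = 2             # 10: skip this chunk, keep processing
    SKIP_AND_REPORT = 3  # 11: skip, keep processing, and report it

def unrecognized_action(chunk_type: int) -> Action:
    """The top two bits of the 8-bit chunk type select the action."""
    return Action(chunk_type >> 6)

def process(chunk_types, known):
    """Return (chunks handled before any stop, unrecognized chunks to report)."""
    handled, report = [], []
    for t in chunk_types:
        if t in known:
            handled.append(t)
            continue
        action = unrecognized_action(t)
        if action in (Action.STOP_AND_REPORT, Action.SKIP_AND_REPORT):
            report.append(t)
        if action in (Action.STOP, Action.STOP_AND_REPORT):
            break
    return handled, report

print(process([0, 0x80, 3, 0xC1, 0x40, 1], known={0, 1, 3}))  # ([0, 3], [193, 64])
```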

      Binary protocols have ossified primarily because middle boxes drop packets they don’t understand, even if the protocol had extensibility mechanisms designed in. Moving to a JSON encoding won’t solve that problem. But it will cost everyone a tremendous amount of overhead.

    6.

      “self-describing” and “binary” are two orthogonal concepts.

      Also, “self-describing” is not a boolean thing: you can have protocols that describe their format in every message (e.g. JSON) or less frequently (e.g. at the beginning of a connection).

      Also, protocols like ASN.1 [1] can provide format descriptions and tools that generate parsing code - which is usually way faster than parsing BSON/bencode-like formats because you don’t need to read the whole blob sequentially.

      [1] https://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One

      1.

        Also, protocols like ASN.1 [1] can provide format descriptions and tools that generate parsing code - which is usually way faster than parsing BSON/bencode-like formats because you don’t need to read the whole blob sequentially.

        In theory. In practice ASN.1 parsers are famously known to be impossible to get right, secure or fast, mostly because of the quite byzantine specifications.

        Something more focused, but still IDL-based, like FlatBuffers can provide the same benefits as ASN.1 with a smaller and simpler (generated) parser.
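        The benefit claimed for schema-driven formats is that a reader can compute where a field lives instead of scanning for it. A minimal Python sketch with a hypothetical fixed little-endian layout; this is not FlatBuffers’ or any ASN.1 encoding’s actual wire format, which add offsets, vtables or tags on top of the basic idea.

```python
import struct

# Hypothetical schema: u32 version, then f64 lat, lon, alt; little-endian, no padding.
RECORD = struct.Struct("<Iddd")
ALT_OFFSET = struct.calcsize("<Idd")  # 4 + 8 + 8 = 20 bytes

def read_alt(buf: bytes) -> float:
    """Read only the alt field by jumping straight to its schema-defined offset."""
    (alt,) = struct.unpack_from("<d", buf, ALT_OFFSET)
    return alt

buf = RECORD.pack(1, 46.498204497, 7.568061439, 1327.689)
print(len(buf), read_alt(buf))  # 28 1327.689
```

        The trade-off is that the layout itself is the contract: widening a field shifts every offset after it unless the schema leaves room for that.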

        1.

          ASN.1 parsers are famously known to be impossible to get right

          I agree, that’s why I wrote “protocols like”.

    7.

      My never-used-but-current-favorite wire format: UBF. I need to find an excuse to play with it. The first paragraph of Section 5 of http://www.erlang.se/workshop/2002/Armstrong.pdf describes it well. (I’d paste it here, but I can’t get copy/paste to work on this phone right now…)