1. 17
  1.  

  2. 23

    This post seems to be proof by ridicule.

    I do not want to load a binary blob as if it were some struct, that’s a good way to crash your program, or expose some vulnerability. I’d much rather parse the binary input to confirm it is what I expect.
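
    As a rough illustration of parsing and validating instead of casting a blob to a struct, here is a minimal Python sketch. The wire format (magic, version, length, payload), the name `parse_message`, and the size limit are all invented for this example and do not come from the article.

```python
import struct

# Hypothetical wire format, invented for this sketch:
# 4-byte magic, 1-byte version, 2-byte big-endian payload length, payload.
MAGIC = b"DEMO"
HEADER = struct.Struct(">4sBH")
MAX_PAYLOAD = 1024


def parse_message(data: bytes) -> bytes:
    """Validate every field before trusting it; raise ValueError otherwise."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    if length > MAX_PAYLOAD:
        raise ValueError("payload too large")
    payload = data[HEADER.size:]
    if len(payload) != length:
        raise ValueError("length field does not match payload")
    return payload


message = HEADER.pack(MAGIC, 1, 5) + b"hello"
print(parse_message(message))
```

    Every check runs before the payload is handed to the rest of the program, so a malformed input ends in an exception rather than in a misread field.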

    The advantage of a text format is that I can easily edit the input or output with existing tools.

    The advantage of headers is that they are not required for the basic functionality of the request or response.

    1. 15

      I think it’s because we don’t have better tooling. Wireshark makes observing binary protocols much easier and richer than just text, and if we had richer structural editing (think Interlisp-like editing s-exprs in memory) and structures (again, closer to TLV over C structures), we could just make binary encodings for everything reasonable.

      Alas, Unix.

      1. 27

        But text as a common denominator is the reason that there are good tools. Bespoke binary formats introduce a fundamental O(M * N) problem. You have M formats and N operations, and writing M*N tools is infeasible, even for the entire population of programmers in the world.

        For example, I can grep a Python, JavaScript, Markdown, or HTML file, a log file, etc. How do I grep a PDF, Word, or RTF file? You need a tool for each format.

        I can git pull and git diff any of those files too. How do I diff two Word files? I have to wait for somebody to build it into Word.

        How do I version control a bunch of video clips cut from the same source? You need a custom tool if you want to do anything but copy the whole thing. (Again video editing tools have this knowledge built in, e.g. with an app-specific notion of a “project”)

        How do I diff two Smalltalk images?

        I’m not saying those are necessarily bad designs, but there’s an obvious tradeoff, and it’s why Unix is popular. And it’s especially popular in open source where you can’t just throw highly paid programmers at the problem.

        It is not an absolute; in reality people do try to fill out every cell in the O(M * N) grid, and they get partway there, and there are some advantages to that for sure.

        Editing with a human-friendly UI is one of the most expensive “rows” in the grid to fill. Many structural editors exist, but they don’t catch on. They have a fundamental limitation. The problem is: which structure? The lowest common denominator of structural formats doesn’t exist. In other words, the lowest common denominator is an unstructured byte stream.

        IntelliJ is partway there – they have a generic text editor, but structure is laid on top. It’s nontrivial to get this layering right, and you still have a bunch of work to do for each language. Try doing Zig in IntelliJ, etc. I think you will mostly have a text editor (and that’s good).

        Don’t forget that parsing is at least two rows in the grid: parsing for validity/interpretation/compilation, and parsing incomplete code for a rich UI. IntelliJ has its own parsing framework to make these operations easier in a polyglot manner. But it’s expensive to produce and Eclipse wasn’t able to duplicate it in the open source world.

        WireShark supports pretty printing network protocols. What about sed, grep, wc (counting records), diff, an awk of all network protocols? I’m sure it has support for many of those things by now, although it’s not easy to find the abstraction that lets you do it. There’s an inherent tradeoff.

        Some projects addressing the tradeoff:

        (I should probably write a blog post about this)


        Other resources: Always Bet On Text by Graydon Hoare

        Text is the most efficient communication technology

        Text is the most socially useful communication technology

        It can be compared, diffed, clustered, corrected, summarized and filtered algorithmically. It permits multiparty editing. It permits branching conversations, lurking, annotation, quoting, reviewing, summarizing, structured responses, exegesis, even fan fic. The breadth, scale and depth of ways people use text is unmatched by anything. There is no equivalent in any other communication technology for the social, communicative, cognitive and reflective complexity of a library full of books or an internet full of postings. Nothing else comes close.


        A related rant about protobufs from a couple days ago: https://news.ycombinator.com/item?id=25582962

        Protobufs give you structured data – now what’s the problem? Isn’t that what we want?

        No, it’s because simple operations like copy are now not generic. It doesn’t work like Unix cp or memcpy(). You need machine code to implement this one operation for every single type. Tons of complexity results from this (and this is explained by multiple primary authors of protobufs):

        The protobuf core runtime is split into two parts, “lite” and “full”. If you don’t need reflection for your protos, it’s better to use “lite” by using “option optimize_for = LITE_RUNTIME” in your .proto file (https://developers.google.com/protocol-buffers/docs/proto#op…). That will cut out a huge amount of code size from your binary. On the downside, you won’t get functionality that requires reflection, such as text format, JSON, and DebugString().

        Even the lite runtime can get “lighter” if you compile your binary to statically link the runtime and strip unused symbols with -ffunction-sections/-fdata-sections/--gc-sections flags. Some parts of the lite runtime are only needed in unusual situations, like ExtensionSet which is only used if your protos use proto2 extensions (https://developers.google.com/protocol-buffers/docs/proto#ex…). If you avoid these cases, the lite runtime is quite light.

        However, there is also the issue of the generated code size ..

        1. 5

          Another concept to look into is the “thin waist” of networking, of operating systems, and compilers. They are all least common denominators that solve O(M*N) problems:

          • TCP/IP is a thin waist. On one side you have low-level networks like Ethernet, Wireless, etc. On the other side you have application formats like HTTP, e-mail, IRC, etc.
            • It doesn’t make sense to have e-mail work over wired, then have to port it to wireless, etc. In the early days, it DID work like that. You lacked the abstraction of the thin waist. This was the whole point of the Internet – to bridge incompatible networks.
          • Unix is a thin waist. On one side you have heterogeneous hardware (supercomputers, PCs, embedded devices). On the other side you have applications.
          • LLVM is a thin waist. On the one side you have languages, and on the other you have ISAs.
          1. 3

            We should be careful to note that the cost is not O(1), as we might hope, but O(|M| + |N|). We have M parsers, N backends, and a single format that they all use for interchange. This cost shows up in tools like Pandoc and in the design of protocols like NDN which aim to explicitly have a thin-waist model.
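
            A small Python sketch of that M + N shape, assuming a list of dicts as the shared interchange form. The formats and function names are made up for illustration and are not how Pandoc or NDN are actually structured.

```python
import json

# Hypothetical formats chosen for illustration; the shared interchange
# form here is simply a list of dicts.


def read_csv(text):
    header, *rows = [line.split(",") for line in text.strip().splitlines()]
    return [dict(zip(header, row)) for row in rows]


def read_json(text):
    return json.loads(text)


def write_json(records):
    return json.dumps(records, sort_keys=True)


def write_tsv(records):
    keys = sorted(records[0])
    lines = ["\t".join(keys)]
    lines += ["\t".join(str(r[k]) for k in keys) for r in records]
    return "\n".join(lines)


READERS = {"csv": read_csv, "json": read_json}      # M parsers
WRITERS = {"json": write_json, "tsv": write_tsv}    # N backends


def convert(text, src, dst):
    """Any of the M * N conversions, built from M + N functions."""
    return WRITERS[dst](READERS[src](text))


print(convert("name,lang\nken,c\n", "csv", "tsv"))
```

            Adding a third input format means writing one more reader, and it immediately works with every existing writer.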

          2. 4

            or an RTF file

            bad example, RTF IS a text-based format. https://en.wikipedia.org/wiki/Rich_Text_Format

            And the text parts of pdf and Word documents frequently are actually stored as text too. The problem is grep doesn’t actually work on “text”… try like strings foo.doc | grep ..., decent chance it will actually work. Of course “text” is in scare quotes because you might have to convert from UTF-16 or something too.

            And grep only works if there’s reasonable line breaks which is frequently untrue, grep HTML just returns one gigantic line a lot, ditto on diff. You might have to put some format-specific formatter in the middle of the pipeline to make it actually usable by those other things too.

            The general principle of just sticking another pipe in the middle to convert applies to lots of formats. In my other comment I used disassemblers as an example and they work pretty well.

            Text and binary isn’t really that different.

            1. 1

              That’s all true, but it doesn’t invalidate the main point. Text is a compromise that works with a lot of different tools. You get more reuse.

              strings foo.doc | grep is a good example.

              There are tools that take an HTML or JSON tree structure and make it line-based so you can grep it, e.g.

              https://github.com/tomnomnom/gron

              These are features, not bugs.
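
              To show the idea of projecting a tree onto lines, here is a loose Python sketch in the spirit of gron. It does not reproduce gron's exact output format, and the `flatten` helper is a name invented for this example.

```python
import json


def flatten(value, path="json"):
    """Yield one 'path = value' line per leaf of a parsed JSON document."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield f"{path} = {json.dumps(value)}"


doc = json.loads('{"user": {"name": "ken", "langs": ["b", "c"]}}')
lines = list(flatten(doc))
print("\n".join(lines))

# A plain substring search now works line by line, much as grep would.
print([line for line in lines if "langs" in line])
```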

              1. 1

                Adding to your comment: PDF is also a text-based format. Newer Word documents too (OOXML).

              2. 2

                I think a good argument against your O(M * N) problem is that the onus should be on the binary format provider to supply some kind of binary -> text tooling. FlatBuffers, for instance, provides tooling to go to and from JSON.

                Now, an argument against that is that it’s herding cats to get everyone to actually provide said tooling. Which I guess is why we can’t have nice things.

                1. 1

                  You can also write your own text -> binary conversion, and lots of people have done that for specific use cases.

                  That is an argument FOR text, not against it (but also not against binary). It’s not either/or or absolute like I said in the original comment. Binary formats and text formats are both valid tools, but unstructured text is a useful “thin waist” that reduces engineering effort.

                  Related: https://lobste.rs/s/vl9o4z/case_against_text_protocols#c_jjkocy

                  I recommend reading “The Art of Unix Programming”, there is a section about textual formats for images.

                  Related: Use echo/printf to write images in 5 LoC with zero libraries or headers

                  How to create minimal music with code in any programming language

                  (i.e. use the Unix style. If you’re writing code in a niche language, you don’t need to wait for somebody to write a library.)

                2. 2

                  I have two recurring problems with text diffs:

                  One is when a colleague has changed something in a UI definition and it has been serialised into XML or JSON, and values in a dictionary have switched places, or a numeric value has been re-rounded, leading to a bigger diff than necessary.

                  The other is when I’ve refactored an if clause into a guard, re-indenting much of a function. The structure has changed, but the diff covers an entire function instead of the structure itself, which is just a few lines.

                  Text is just an approximation of structure.

                  1. 1

                    Yes, that’s valid and true. But it doesn’t contradict anything I said – byte streams are a compromise, a thin waist, just like TCP and LLVM are. TCP is very inefficient for certain applications; likewise LLVM is an awkward fit for many problems. But it’s uneconomical to rewrite the entire ecosystem for a specific application.

                    Though note you can also write your own GIT_EXTERNAL_DIFF for XML and JSON, and I’m sure someone has done so already. Unix is flexible and customizable.
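
                    A minimal sketch of what such a JSON-aware diff could do, assuming that normalizing key order and indentation before diffing is enough for the reordering case. The function names are invented here, and hooking it up to git is left out.

```python
import difflib
import json


def canonical(text):
    """Re-serialize JSON with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()


def structural_diff(old_text, new_text):
    """Unified diff of the canonical forms; key order no longer matters."""
    return list(difflib.unified_diff(canonical(old_text), canonical(new_text),
                                     "old", "new", lineterm=""))


old = '{"width": 10, "height": 20}'
reordered = '{"height": 20, "width": 10}'
changed = '{"height": 25, "width": 10}'

print(structural_diff(old, reordered))            # reordering alone: no diff
print("\n".join(structural_diff(old, changed)))   # only the changed value
```

                    This handles swapped dictionary entries; a re-rounded number would still show up unless the canonical form also normalized numeric precision.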

                    1. 1

                      Well, yes… Most programming languages and text formats are liberal when it comes to whitespace and newlines. This is a design choice with trade-offs like the one you mention. But it is not always the case. For example, HTTP headers do not let you squeeze in ad hoc whitespace or new lines.

                      Notice that the problem arises from using a specialized UI tool to output text from structure… Then it doesn’t play well with simple text based tools. Using a simple text editor gives you full control of every single char and will not have the problem you describe.

                      This is exactly the same problem with binary formats, because you pretty much have to use a specialized tool.

                    2. 2

                      I realize this isn’t the main thrust of your argument, but Word can definitely diff two Word documents, and act as a merge tool for them. It’s not as straightforward as regular diff, but it’s definitely capable there.

                      1. 1

                        If you have used uchex, how different is that approach from simply using a parser with automatic error correction (something like what ANTLR does)? (I find it strange that they do not mention any standard error recovery except for a passing mention in 2.1).

                        1. 2

                          I haven’t used it, but I think I should have called it “micro-grammars” rather than “uchex” (they use that name in the paper). It’s an approach for partial parsing and semantic analysis that works on multiple languages.

                          Looks like some people have implemented this:

                          https://github.com/atomist/microgrammar

                          I think error recovery is related, but philosophically different? There is no error recovery in microgrammars. They purposely don’t specify some information. They don’t specify ALL of the language and then “recover” from errors. And as I understand they also do semantic analysis. I’d be interested in details from anyone who has more experience.

                          1. 1

                            Wanted to mention that I chanced upon Island Grammars – “Generating Robust Parsers using Island Parsers”. It seems to be very closely related, and actively researched. The basic idea is to specify some parts of the program in detail, while ignoring other parts with very coarse parse rules.

                            1. 1

                              Thanks for the link! much appreciated.

                          2. 1

                            Thanks a bunch for this comment, I’ve personally learned a lot!

                            An interesting thin waist solution for binary formats would be having canonical text representation. WASM does this right I think.

                            1. 1

                              Right, WASM needs to be compact on the wire and quickly parsed. It is mostly generated by compilers, but occasionally you want to write it by hand, and read it. The economical solution is to project it onto text.

                              Sure, you can write a structured editor for WASM as the OP was suggesting, but does anyone really think that is a good solution ??? Do you also want to write a WASM version control system that knows how to diff and merge WASM functions perfectly?

                              It’s definitely possible, but I want to see someone pursue those projects with a straight face.

                              When you convert to text, you get a bunch of tools for “free”. You can use sed, grep, git diff, pull, merge, etc. on it. You are moving onto the thin waist; WASM binary is off of the thin waist.


                              Glad you got something out of this comment!

                              I often find myself in these “systems vs. code” arguments. Most people are trying to optimize the code (for local convenience) whereas I’m trying to optimize the system (for global properties). The whole point of Oil is to optimize systems, not code.

                              Ken Thompson understands this; he wrote about the lack of records in his “sermonette” on Unix design in the very first paper on Unix shell: https://lobste.rs/s/asr9ud/unix_command_language_ken_thompson_1976#c_1phbzz

                              (UTF-8 is also a Thompson design that lacks metadata unlike UTF-16, and that’s a feature.)

                              Systems vs. code is a pretty big mindset difference, and it often leads to opposite conclusions. A lot of people here claim they don’t understand Rich Hickey

                              https://news.ycombinator.com/item?id=23905198 (my reply as chubot)

                              which linked:

                              https://lobste.rs/s/zdvg9y/maybe_not_rich_hickey

                              The point here is that dynamic typing has advantages in systems (similar to unstructured text). (And I just got a raised eyebrow from a CS professor when I made this claim, so I know it’s not obvious.)

                              Field presence is checked at runtime, not compile time, because it enables graceful upgrade. That is a systems thing and not a code thing.

                              Static typing is also valid, but should be layered on top. Just like binary is valid, but should gracefully co-exist with tools for text. Haskell, OCaml and Rust are great, but they run on top of untyped Unix, which is working as intended. I consider them all domain-specific type systems. (This is a real argument: Unikernels are optimizing locally, not globally, which is a fundamentally limited design.)

                              I have some blog posts queued up on this, referencing Ken Thompson vs. Kubernetes, and system design, although it’s not clear when they will appear …

                        2. 14

                          Quoting from the article:

                          Argument: You can type out messages

                          Surely, it’s not like you will open a terminal on your cell-phone and then type in SIP like this to call your friend.

                          INVITE sip:1001@10.0.0.1:2780;transport=udp SIP/2.0

                          SNIP

                          While reading this, I had the distinct impression that it might be written by an application developer rather than a systems programmer, system administrator, or network engineer. I speak HTTP/1.1 pretty well. For that matter, I speak SMTP pretty well, too. If you speak SMTP, you can speak similar IETF protocols like NNTP and POP3 without much extra effort. They have a family resemblance, like Spanish and Portuguese.

                          That ability has helped me quite a few times as an admin, but it also proved invaluable while implementing application-layer protocols.

                          The article is thought-provoking. I remain on the fence about which is better: text or binary. Maybe the answer is both, and maybe I’d have a deeper appreciation of binary protocols if I had the right tooling.

                          1. 5

                            I’d have a deeper appreciation of binary protocols if I had the right tooling.

                            Yeah, I kinda like text protocols sometimes too (though they are a real pain to correctly parse. Like there’s usually special rules for edge cases that are easy to forget.), but consider the case of assembly language.

                            I can read a little bit of machine code off a hex dump…. but not much since there’s basically never any need since I can assemble/disassemble just about anything so easily.

                            A binary protocol that came with such a translation thing to/from some text format would probably be the best of both worlds. And then it can be validated before sending to help catch typos too.
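
                            A toy Python sketch of that assembler/disassembler idea. The three-opcode protocol, the record layout, and the names `assemble` and `disassemble` are invented for this example.

```python
import struct

# Hypothetical three-opcode protocol, invented for this sketch.
OPCODES = {"PING": 0x01, "SET": 0x02, "GET": 0x03}
NAMES = {code: name for name, code in OPCODES.items()}
RECORD = struct.Struct(">BHH")   # opcode, key, value


def assemble(line: str) -> bytes:
    """Text -> binary, rejecting unknown opcodes before anything is sent."""
    name, key, value = line.split()
    if name not in OPCODES:
        raise ValueError(f"unknown opcode {name!r}")
    return RECORD.pack(OPCODES[name], int(key), int(value))


def disassemble(raw: bytes) -> str:
    """Binary -> text, for reading with ordinary text tools."""
    if len(raw) != RECORD.size:
        raise ValueError("wrong record size")
    code, key, value = RECORD.unpack(raw)
    if code not in NAMES:
        raise ValueError(f"unknown opcode byte {code:#04x}")
    return f"{NAMES[code]} {key} {value}"


raw = assemble("SET 7 300")
print(raw.hex())           # 020007012c
print(disassemble(raw))    # SET 7 300
```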

                            1. 7

                              Binary protocols have to be parsed, too: Assuming you can just bamf them into a struct is assuming that no transmission errors occurred, no errors occurred in the sending process, and no errors occurred in the machine the sending process was running on. Even if you strictly discount such things as endianness differences and differences in word sizes, there’s still parsing to be done.

                              1. 4

                                Binary protocols have to be parsed, too

                                Indeed, my advice for people is often to do it byte by byte; then there is no more endianness problem, word size problem, etc. And things like variable-length members are easier to handle then too. But it still tends to be much easier to do correctly than text format parsing since there’s usually fewer weird cases to consider.
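
                                For what byte-by-byte decoding might look like, here is a short Python sketch. The field layout (a big-endian 32-bit integer followed by a length-prefixed string) and the helper names are assumptions made up for the example.

```python
def read_u32_be(data: bytes, offset: int) -> int:
    """Assemble a big-endian 32-bit value one byte at a time."""
    if offset < 0 or offset + 4 > len(data):
        raise ValueError("not enough bytes for a 32-bit field")
    value = 0
    for byte in data[offset:offset + 4]:
        value = (value << 8) | byte
    return value


def read_string(data: bytes, offset: int):
    """Read a 1-byte length followed by that many bytes of UTF-8."""
    if offset < 0 or offset >= len(data):
        raise ValueError("missing length byte")
    length = data[offset]
    end = offset + 1 + length
    if end > len(data):
        raise ValueError("string runs past end of input")
    return data[offset + 1:end].decode("utf-8"), end


packet = bytes([0x00, 0x00, 0x01, 0x02, 0x03]) + b"abc"
print(read_u32_be(packet, 0))    # 258
print(read_string(packet, 4))    # ('abc', 8)
```

                                The result is the same on any host because the byte order is spelled out in the shifts rather than inherited from the machine.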

                                1. 2

                                  Agreed assuming that one is working at a low enough layer that they actually need to consider this stuff. I’ve spent a good amount of my career working on avionics hardware & software, and a significant amount of that time was spent staring at a digital scope hooked up to a data bus trying to figure out what was going wrong before a payload even got to the point where I could try to slam it into a (properly packed and versioned) struct.

                                  However, there definitely are streamlined ways to perform quick error checking on binary payloads while they are “on the wire”. For example, parity bits and other checksum techniques. And, most of the time, this error checking is being performed at a much lower layer than the developer is actually working in.

                                  It is sometimes the case that nuanced data that would otherwise need to be parsed using costly string functions can be communicated more efficiently in a binary protocol. Bitwise (flag) lookup tables come to mind – for example, one can communicate 32 different flags in a single integer – and each of which can be checked extremely efficiently using bitwise operators.
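
                                  A small Python sketch of flags packed into one 32-bit word. The flag names are hypothetical; only four of the 32 possible bits are named here.

```python
import enum


class Flags(enum.IntFlag):
    # Hypothetical flag names; each one occupies a single bit.
    ACK = 1 << 0
    URGENT = 1 << 1
    ENCRYPTED = 1 << 2
    LAST = 1 << 31      # the 32nd and final bit of a 32-bit word


def encode(flags: Flags) -> bytes:
    """Pack the flag set into four big-endian bytes."""
    return int(flags).to_bytes(4, "big")


def decode(raw: bytes) -> Flags:
    """Recover the flag set from four big-endian bytes."""
    return Flags(int.from_bytes(raw, "big"))


wire = encode(Flags.ACK | Flags.ENCRYPTED)
received = decode(wire)
print(wire.hex())                         # 00000005
print(bool(received & Flags.ENCRYPTED))   # True
print(bool(received & Flags.URGENT))      # False
```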

                                  Also, from the experience of creating a couple of real-time systems (for enterprise, avionics, and multiplayer video games), binary protocols can often be a lot “nicer” to parse. If you can trust that the payload is not corrupt, then you can simply stream/scan and count – or, even better, blit and mask – rather than perform weird splits and finds while fighting off-by-one errors.

                                  1. 2

                                    Assuming you can just bamf them into a struct is assuming that no transmission errors occurred, no errors occurred in the sending process, and no errors occurred in the machine the sending process was running on.

                                    Adding a checksum to a binary protocol is trivial compared to adding it to a text-based one.
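
                                    For reference, the binary side of that comparison could look roughly like this in Python. This is a sketch assuming a CRC-32 trailer; `seal` and `unseal` are names invented for the example.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unseal(frame: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum fails."""
    if len(frame) < 4:
        raise ValueError("frame too short for a checksum")
    payload = frame[:-4]
    (expected,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload


frame = seal(b"\x01\x02\x03")
print(unseal(frame))

# Flip every bit of the first byte to simulate corruption in transit.
corrupted = bytes([frame[0] ^ 0xFF]) + frame[1:]
try:
    unseal(corrupted)
except ValueError as error:
    print("rejected:", error)
```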

                                  2. 4

                                    This is one of the reasons why I try to use standardized self-describing binary protocols like CBOR or messagepack when feasible. The more they get used the more incentive there is for people to make tools like jq that work with them.

                                    …actually, a hex-editor type program that could display and manipulate CBOR structures would be pretty neat– no no I have enough side projects brain please stop.

                                  3. 3

                                    I write web apps for a living, and I know how to do HTTP “by hand” and have used it for debugging. Same with SMTP. I also encourage people who want to learn backend web development to try out writing a very basic HTTP/1.1 server in their language of choice and communicating with it, to get a feel for what needs to go on.
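
                                    As a sketch of what doing HTTP “by hand” amounts to, the Python below spells out a minimal HTTP/1.1 GET request and sends it over a plain TCP socket. The helper names are invented for this example, and `fetch` is defined but not called here because it needs network access.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Spell out a minimal HTTP/1.1 GET request, line by line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",   # the blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over a plain TCP socket and return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


print(build_request("example.com").decode("ascii"))
```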

                                    1. 1

                                      But for how long, now that HTTP/2 and HTTP/3 are a thing [1]?

                                      [1] Thanks be to our new Internet Overlord, Google.

                                      1. 1

                                        A great tool for this is netcat (a.k.a. nc). Even better is that it comes as a standard CLI utility on macOS and many Linux distros.

                                    2. 5

                                      I’ve just implemented RTP/RTCP and SRTP (webrtc stack), and let me tell you, there’s nothing CBOR easy about it.

                                      Regardless of whether it’s binary or text based, we generally are stuck with a bunch of protocols invented in the 90s. SIP (and by proxy SDP) is text based and oh dear me it’s terrible. The only redeeming feature is that it’s text based and not some binary Tag-Length-Value.

                                      For 99.9% of programmers out there, the number of times we get to invent protocols can probably be counted on one hand. And given that internet standards are like they are, I don’t think there’s much point in arguing the relative merits of self describing binary formats.

                                      1. 3

                                        How do you debug a binary blob when one of your fields is corrupted so that your tag length is off for a single object?

                                        1. [Comment removed by author]

                                          1. 2

                                            This is not very helpful. What principles of unix do you think are violated by OP’s arguments and why is that a problem?

                                            1. [Comment removed by author]

                                              1. 3

                                                The only information your comment communicates is that ‘some random person on the internet believes the article is no good’. After which every reader shrugs and thinks: “yeah, whatever, there’s always haters; their argument is weak”. You might as well not have commented.

                                          2. 1

                                            MessagePack seems relevant to this conversation (see existing post on Lobsters). It claims some pretty impressive performance and payload size improvements. However, the cases where those would matter nowadays are probably few and far between.

                                            1. 1

                                              What is a text protocol? Does the poster mean line based protocols (like the example in the article), or are they referring to any protocol that is not TLV encoded (but not necessarily line based, like PDF or JSON)?

                                              1. 1

                                                Yeah, I also got confused by the wording. I think the article is talking about a ‘serialization format’, not a protocol per se.

                                                Protocol is usually a set of commands + data serialization format per each command + a state machine with well defined transition events

                                                I think the article is talking about just data serialization part of the protocol.

                                                – Another confusion for me (maybe this is more from the comments) is document storage formats (e.g. PDF, RTF, etc.) and whether they are related to ‘protocols’.

                                                Usually a document storage format consists of: schema (or metaschema) of object types, serialization format for each of the allowed object types, reference object types (that represent a form of foreign keys or references within a document). But document storage formats usually (at least in my experience), do not have ‘protocols’