1. 22
  1.  

  2. 10

    You guys are fast :) I published an announcement an hour ago which gives more context and design insights:

    https://drewdevault.com/2020/06/21/BARE-message-encoding.html

    1. 9

      Before you commit to reinventing Capn Proto, including copying the schema language, I want to let you know that the Capn Proto build tools do not mandate using a C++ API, nor using the libkj event loop. The process for writing new plugins is documented, but since you’re using existing languages, you could use capnpy for Python, capnproto2 for Go, or capnp for Rust, none of which use the C++ API. Instead, Capn Proto’s compiler is designed to come apart into subprocesses naturally, with each subprocess using Capn Proto itself to request schema compilation. The bootstrapping phase takes effort, but it is possible, and people have already done a lot of the hard work for many popular languages.

      I also would wonder whether you’re prepared to take on the security burden. Capn Proto’s upstream maintainers set an extremely high bar, and their vulnerability writeups have garnered serious praise for their quality and depth.

      Finally, although it is not mandatory, Capn Proto defines a custom compression algorithm, “packing”, intended specifically to address your concern that fixed-width and aligned data types are wasteful. The authors also recommend LZ4 or zlib for when packing isn’t sufficient.
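A minimal Go sketch of "packing" as the Cap’n Proto encoding spec describes it: each 8-byte word becomes a tag byte whose bits mark the non-zero bytes, followed by only those bytes; tag 0x00 is followed by a count of further all-zero words, and tag 0xFF by the raw word plus a count of uncompressed words to copy. `packWords` is a hypothetical name, and for simplicity this sketch always writes a count of 0 after 0xFF, which the spec allows but which packs dense data less tightly than the reference implementation.

```go
package main

import "fmt"

// packWords applies Cap'n Proto packing to data, which must be whole 8-byte words.
// Simplification: after a 0xFF tag it always writes a raw-word count of 0.
func packWords(data []byte) []byte {
	if len(data)%8 != 0 {
		panic("input must be a multiple of 8 bytes")
	}
	var out []byte
	for i := 0; i < len(data); {
		word := data[i : i+8]
		i += 8
		// Bit k of the tag is set when byte k of the word is non-zero.
		var tag byte
		for k, b := range word {
			if b != 0 {
				tag |= 1 << uint(k)
			}
		}
		out = append(out, tag)
		switch tag {
		case 0x00:
			// Count how many more all-zero words follow (at most 255).
			run := 0
			for run < 255 && i+8 <= len(data) && isZeroWord(data[i:i+8]) {
				run++
				i += 8
			}
			out = append(out, byte(run))
		case 0xFF:
			// Every byte is non-zero: copy the word, then a count of 0 raw words.
			out = append(out, word...)
			out = append(out, 0)
		default:
			// Only the non-zero bytes follow the tag, in order.
			for _, b := range word {
				if b != 0 {
					out = append(out, b)
				}
			}
		}
	}
	return out
}

func isZeroWord(w []byte) bool {
	for _, b := range w {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	in := []byte{
		0x08, 0, 0, 0, 0x03, 0, 0x02, 0, // sparse word
		0, 0, 0, 0, 0, 0, 0, 0, // zero word
		0, 0, 0, 0, 0, 0, 0, 0, // zero word
	}
	fmt.Printf("% x\n", packWords(in)) // 51 08 03 02 00 01
}
```

The sparse word shrinks from 8 bytes to 4, and the two zero words collapse into two bytes; a word with no zero bytes grows by two bytes, which is the case the 0xFF run count exists to amortize.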

      In summary, I think that your:

      $ go run git.sr.ht/~sircmpwn/go-bare/cmd/gen -p models schema.bare models/gen.go
      

      Could be:

      $ capnp compile -I$GOPATH/src/zombiezen.com/go/capnproto2/std -ogo schema.capnp > models/gen.go
      

      With the package information included in the Capn Proto schema file.

      1. 6

        Like I said in the article, I did evaluate Cap’n Proto, and concluded it was not the right fit.

        Before you commit to reinventing Capn Proto, including copying the schema language

        The schema language is not copied from Cap’n Proto, and frankly I don’t understand how you would make this assumption. The two schema languages are very different. Unless you mean the idea of having a schema language at all, in which case you are clearly ignorant of the alternatives and their history.

        I want to let you know that the Capn Proto build tools do not mandate using a C++ API

        C++ is, in fact, required:

        $ capnp compile -I$GOPATH/src/zombiezen.com/go/capnproto2/std -ogo schema.capnp > models/gen.go
          ^ here

        I don’t evaluate technology in a vacuum. The total complexity of the implementation, including code I didn’t have to write myself, is part of my evaluation. There is no attempt at a specification or any motions to support third-party implementations from Cap’n Proto.

        The bootstrapping phase takes effort

        It seems that the typical amount of effort required to make a complete BARE implementation is 3-5 days. I also looked at the C++ frontend for Cap’n Proto: 3,153 lines of support code for C++. The entire Go implementation of BARE is 3,629 lines of code, including marshaling & unmarshaling, parsing, code generation, exhaustive tests, example code, and generated code, and it has no external dependencies (well, getopt for the generator CLI, and an assertion framework for the tests).

        I also would wonder whether you’re prepared to take on the security burden. Capn Proto’s upstream maintainers set an extremely high bar, and their vulnerability writeups have garnered serious praise for their quality and depth.

        I find it ironic that the security vulnerability you use as an example is the result of C++ programmers doing C++ programmer things, namely over-use of templates (any use of templates being an over-use), and works in favor of my argument that Cap’n Proto’s programming culture is symptomatic of the broken culture that values tools like C++. An ounce of prevention is worth a pound of cure, and in BARE’s case both the spec and implementation are 10x simpler - and simpler always means more secure.

        Finally, although it is not mandatory, Capn Proto defines a custom compression algorithm

        This also argues against you. The message encoding should have nothing to do with compression; this just increases the complexity of both the specification and implementation (by the way, I keep referring to “the specification”, but in reality such a thing doesn’t exist, and this fact also works against Cap’n Proto). The specification cannot be “complete”, either: the compression techniques will become obsolete over time, and the specification will have to be revised to accommodate them.

        In general, BARE messages tend to have the same entropy as the underlying information they represent, because the BARE encoding is barely more than the literal data directly copied to the message. Unlike Cap’n Proto, which has alignment and fixed message size wasting space (this is done in exchange for performance, unless you’re decompressing the messages, in which case the performance improvement is totally lost), BARE messages stand to save very little from encoding-aware compression. For this and other reasons besides, compression is out of scope for BARE. Instead, you should feed BARE messages to your favorite compression algorithm yourself, such as zstd, and you will get results comparable to compressing the underlying information.
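To illustrate keeping compression outside the encoding, here is a Go sketch that hand-encodes a hypothetical BARE struct with fields `x: int`, `y: int` and `label: string`, then gzips the bytes. It assumes the draft spec's rules that `int` is a zig-zag varint and `string` is a `uint` length followed by UTF-8 bytes, which match Go's `binary.PutVarint` and `binary.PutUvarint`. gzip stands in for zstd only because zstd isn't in Go's standard library; `encodePoint`, `compress` and `decompress` are illustrative names, not the go-bare API.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"io"
)

// encodePoint writes each field in schema order, with no names, tags or padding.
func encodePoint(x, y int64, label string) []byte {
	var tmp [binary.MaxVarintLen64]byte
	out := append([]byte(nil), tmp[:binary.PutVarint(tmp[:], x)]...) // int: zig-zag varint
	out = append(out, tmp[:binary.PutVarint(tmp[:], y)]...)
	out = append(out, tmp[:binary.PutUvarint(tmp[:], uint64(len(label)))]...) // string length
	return append(out, label...)
}

// compress runs already-encoded bytes through gzip; the encoding knows nothing about it.
func compress(msg []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(msg); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func decompress(z []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(z))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// A batch of similar records, the case where a general-purpose compressor helps.
	var batch []byte
	for i := int64(0); i < 1000; i++ {
		batch = append(batch, encodePoint(i, -i, "sensor")...)
	}
	z, err := compress(batch)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(batch), "bytes encoded,", len(z), "bytes gzipped")
}
```

Swapping gzip for any other compressor touches only `compress`/`decompress`, which is the separation of concerns being argued for.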

        Not having compression in-scope simplifies the implementation and specification. Reckless lack of consideration for scope is a major problem which ruled out many of the alternatives I explored before working on BARE.

        1. 2

          Your evaluation can be quoted in full, as it is barely a paragraph, with only half of its fragments qualifying as sentences:

          Cap’n Proto: fixed width, alignment, and so on — good for performance, bad for message size. Too complex. RPC support is also undesirable for this use-case. I also passionately hate C++ and I cannot in good faith consider something which makes it their primary target.

          BARE is not simpler; feel free to define a metric before revisiting that point. RPC support is optional. We’ve covered everything else.

          Some of your statements are either wrong or questionable:

          The two schema languages are very different.

          Really? All of these declarative schema languages look very similar to me. Which one were you trying to reinvent?

          There is no attempt at a specification or any motions to support third-party implementations from Cap’n Proto.

          Baldly contradicted, and I will continue to link to evidence as needed.

          in BARE’s case both the spec and implementation are 10x simpler

          This needs a metric.

          I, too, passionately hate C++. And yet, somehow, nobody really cares about my feelings. Capn Proto does not make C++ the primary target, nor does it mandate the C++ API. You are pointing at a subprocess invoked via shell, which is a common lingua franca across many different languages. There is no C++ linkage, no C++ name mangling, and no C++ templates. The reference implementation makes only one request, and it is a polite one: For interoperability, use the same struct offsets every time, and preferably use exactly the same algorithm as the original.

          Capn Proto has a complete encoding specification. Only the struct offsets are computed, but they are not necessary for deserializing a message, only for interpreting it according to certain names and schemata. Indeed, when I bootstrapped Capn Proto support for Monte, I started by writing a deserializer for unnamed buffers and messages, and only later added in support for schemata.
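A rough Go sketch of what decoding an "unnamed" message involves, following the encoding spec's stream framing (a little-endian uint32 segment count minus one, then each segment's size in words, padded to a word boundary) and its struct pointer layout (2-bit tag 0, a signed 30-bit word offset, then 16-bit data and pointer section sizes). No schema is needed to locate the root struct's data section; the schema only matters when interpreting those bytes. Only the first segment is handled, and the helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

type structPtr struct {
	Offset    int32  // words from the end of the pointer to the data section
	DataWords uint16 // size of the data section, in words
	PtrWords  uint16 // size of the pointer section, in words
}

// firstSegment parses the stream framing header and returns segment 0.
func firstSegment(msg []byte) ([]byte, error) {
	if len(msg) < 8 {
		return nil, errors.New("message too short")
	}
	count := int(binary.LittleEndian.Uint32(msg[0:4])) + 1
	header := 4 + 4*count
	if header%8 != 0 {
		header += 4 // pad the segment table to a word boundary
	}
	size0 := int(binary.LittleEndian.Uint32(msg[4:8])) * 8
	if len(msg) < header+size0 {
		return nil, errors.New("truncated message")
	}
	return msg[header : header+size0], nil
}

// decodeStructPtr splits a pointer word into its A, B, C and D fields.
func decodeStructPtr(word uint64) (structPtr, error) {
	if word&3 != 0 {
		return structPtr{}, errors.New("not a struct pointer")
	}
	return structPtr{
		Offset:    int32(uint32(word)) >> 2, // arithmetic shift keeps B's sign
		DataWords: uint16(word >> 32),
		PtrWords:  uint16(word >> 48),
	}, nil
}

// rootData returns the root struct pointer and the bytes of its data section.
func rootData(msg []byte) (structPtr, []byte, error) {
	seg, err := firstSegment(msg)
	if err != nil {
		return structPtr{}, nil, err
	}
	if len(seg) < 8 {
		return structPtr{}, nil, errors.New("empty segment")
	}
	p, err := decodeStructPtr(binary.LittleEndian.Uint64(seg[:8]))
	if err != nil {
		return structPtr{}, nil, err
	}
	start := (1 + int(p.Offset)) * 8 // the root pointer itself occupies word 0
	end := start + int(p.DataWords)*8
	if start < 0 || end > len(seg) {
		return structPtr{}, nil, errors.New("struct out of bounds")
	}
	return p, seg[start:end], nil
}

func main() {
	msg := make([]byte, 8+3*8)
	binary.LittleEndian.PutUint32(msg[0:4], 0)             // one segment...
	binary.LittleEndian.PutUint32(msg[4:8], 3)             // ...of three words
	binary.LittleEndian.PutUint64(msg[8:16], 1<<32|1<<48)  // root: offset 0, 1 data word, 1 pointer
	binary.LittleEndian.PutUint64(msg[16:24], 42)          // data section
	// msg[24:32] stays zero: a null pointer in the pointer section.
	p, data, err := rootData(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.DataWords, p.PtrWords, binary.LittleEndian.Uint64(data)) // 1 1 42
}
```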

          On a more personal note, attacks and rudeness like:

          frankly I don’t understand how you would make this assumption

          in which case you are clearly ignorant of the alternatives and their history

          Are how you maintain your reputation for toxicity. I don’t believe in moderating people unkind, but I do think that you could stand to look at your words and how you escalated this conversation.

          1. 8

            Your evaluation can be quoted in full, as it is barely a paragraph, with only half of its fragments qualifying as sentences

            Yes, because it’s sharing a list of bullet points with 4 other alternatives in an article whose purpose isn’t to provide a detailed critique of Cap’n Proto.

            BARE is not simpler; feel free to define a metric before revisiting that point. RPC support is optional.

            Everything is optional, I could just choose not to use a subset of either system’s primitives and call it “less complicated”. Your argument doesn’t make any sense. A complete, conformant implementation of BARE is simpler than one of Cap’n Proto. Here’s one metric: the encoding spec you linked to is 3,908 words. The BARE spec is 1,430, and the BARE spec also includes a grammar for the schema DSL.

            Really? All of these declarative schema languages look very similar to me. Which one were you trying to reinvent?

            Give me a break. What they have in common appears to be the enum keyword, convention of PascalCase for user-defined types, the use of “:” to separate field names from types, and the use of braces “{ }” for structs. There are dozens of other languages that we share all of these traits with. In every other respect they are quite different:

            • BARE allows you to make any kind of user-defined type, not just structs
            • Cap’n Proto has explicit field offsets, BARE lacks them
            • Cap’n Proto has semicolons, BARE lacks them
            • BARE has a different syntax for arrays (lists in Cap’n Proto terms)
            • BARE has fixed arrays, maps, and first-class tagged union types, and syntax for each
            • Cap’n Proto has default values, and BARE’s DSL can’t describe values at all
            • Cap’n Proto has the concept of groups, which BARE lacks
            • BARE has no RPC and therefore no syntax for it

            I find your accusation of copying the schema completely unfounded and extremely rude.

            There is no attempt at a specification or any motions to support third-party implementations from Cap’n Proto.

            Baldly contradicted, and I will continue to link to evidence as needed.

            I will correct myself: there is a specification of the encoding. There is no specification of the DSL. There is documentation, but documentation is not a specification.

            They support third-party integrations, not implementations. This is an important difference.

            in BARE’s case both the spec and implementation are 10x simpler

            This needs a metric.

            See my first comment for the implementation size and this comment for the spec size.

            You are pointing at a subprocess invoked via shell

            A subprocess written in C++. This puts C++ in your dependency tree. Just because you don’t link to it with the C++ ABI doesn’t mean it’s not a dependency. There are multiple ways to skin^Wdepend on a cat.

            On a more personal note, attacks and rudeness like:

            frankly I don’t understand how you would make this assumption

            in which case you are clearly ignorant of the alternatives and their history

            Are how you maintain your reputation for toxicity. I don’t believe in moderating people unkind, but I do think that you could stand to look at your words and how you escalated this conversation.

            Using the word “frankly”, and attributing unfounded arguments to ignorance, is not rude. Telling me that I am maintaining a reputation for toxicity, however, is extremely rude. You are disparaging my work and my character with baseless arguments and false comparisons.

            We get it, I don’t like your sacred cow. It is entirely possible for me to have evaluated it and determined it did not meet my needs, and also valid for me to determine that I simply don’t like it, and also valid to conclude that the idea is poorly realized by its implementation. In this case I came to all of these conclusions. The fact of the matter is that your protocol is not perfectly suited to all use-cases and neither is your implementation, and you aren’t entitled to praise for it.

            1. 3

              You’re kind of being rude here. Also, if you know that the metric hasn’t been defined then how did you decide that BARE is not more simple?

              1. 7

                They both are, in my opinion. I’d really prefer not to see conversations in this tone on lobste.rs (speaking for myself only, of course).

                1. 4

                  Dunno. I think @ddevault is just defending their post… Doesn’t seem particularly rude to me

                2. 2

                  I’m trying. I am not a nice person and it’s been nearly a decade of trying to have conversations with the author. But sure, I do not mind receiving the unkind vote. This isn’t the first time that a basic technical discussion has been horribly derailed by their attitude, and not the first time that I’ve been told that I’m part of the problem because I refuse to go along with their framing.

                  I did not claim that Capn Proto is simpler than BARE. I instead rejected the idea that there is a natural and obvious metric by which to claim that one is simpler than the other. This is a frustrating ambiguity in English; the word “not” can mean three different things and there is no way to disambiguate other than to write an explanatory paragraph.

                  No suitable metric is given. One metric, lines of code, is used; however, lines of code between two different host and target languages are apples-to-oranges. To make it apples-to-apples, the comparison would have to be either between BARE and Capn Proto complete stacks in a single language (presumably C++), or between two Capn Proto implementations in two different languages. Either way, the appropriate comparison hasn’t been drawn.

            2. 1

              I think the appeal of BARE is that it has a low threshold of entry (like json, msgpack, etc.) for anyone willing to write a code-generator for it. On the one hand, capnproto’s schema language seems cleaner and more expressive; the problem is that the whole encoding format, albeit efficient, is very complex in the name of allowing zero-copy access. There should be room for a language like BARE, more or less in the same niche protobuf occupies currently (fast and binary but not insanely complicated).

              Sadly it looks like BARE will not get sum types/general tagged unions, and cannot represent them (except for option). I got fed up in this thread and failed to keep as civil a tone as I wished I had, but anti-intellectualism irritates me.

              1. -4

                BARE has sum types aka tagged unions. What it does not have is zero-bit types, which often change the way you use tagged unions. After an hour of back and forth it became apparent that c-cube lacks the “intellectual” capacity to understand this difference.

                1. 6

                  And, this is the main reason I actively avoid working on any of the projects he’s involved in. “Feedback is welcome”, the page says, apparently until it gets annoying or the author has to deal with someone stubborn who has a different way of looking at the world, at which point you get called a moron in public.

                  1. 4

                    The back and forth made me sad and annoyed too. I think the terminology is confusing and ambiguous, because we have different ideas on what “sum types” precisely entails and could not find the words to carry the difference over short messages.

                    The “anti-intellectualism” was also not civil, but neither are the straight insults towards functional programming and people who like it.

                    1. 0

                      Your “different” idea on what “sum types” precisely entails is not just different, but factually incorrect. You approached the discussion with a flawed understanding of the type system you espouse and made incorrect assumptions about the limitations of BARE’s type system. This is the main source of frustration. I appreciate that you were willing to evaluate the specification and share your feedback, but you do not fully understand the problem that you are trying to communicate about.

                      Regardless, after more thought to the potential use-cases and the best implementation approach, I have added a void type to the specification, which should enable the types you want to construct with BARE messages.

              2. 3

                The choice on schema upgrade is interesting (or strange). That means when upgrading to MessageV2, you need to write a manual translation between MessageV1 and MessageV2?

                What about incremental updates (schema evolution, i.e. adding and removing fields in a struct)? Protobuf and Flatbuffers both provide an evolution path for these, while it seems BARE and Cap’n’Proto elected (not exactly sure about BARE’s story other than your compatibility section, but definitely Cap’n’Proto) not to support this.

                1. 1

                  What about incremental updates (schema evolution, i.e. adding and removing fields in a struct)?

                  The length is not encoded into a BARE message, so there’s no way to determine when one message stops and the next begins without knowing the schema of the message in advance. If you append fields, you won’t be able to tell that they’re not part of the next message.

                  But, nothing is stopping you from adding your own length to messages, either through context (Content-Length) or explicitly adding a length prefix before reading out the struct (the Go implementation provides tools you can use to read primitives yourself for things like this, and I would hope that other languages would do so). So if you prefer that approach, then by all means. Another alternate approach could express a version out of band, like X-Message-Version.
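A minimal Go sketch of the explicit length-prefix option. The uvarint prefix (the same encoding BARE uses for `uint`) and the 1 MiB `maxFrame` limit are assumptions of this example, not part of the BARE spec, and `frame`/`splitFrames` are hypothetical helpers rather than functions from go-bare.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// maxFrame caps a single frame so a corrupt prefix can't force a huge allocation.
const maxFrame = 1 << 20

// frame concatenates messages, each preceded by its length as a uvarint.
func frame(msgs ...[]byte) []byte {
	var buf bytes.Buffer
	var tmp [binary.MaxVarintLen64]byte
	for _, m := range msgs {
		buf.Write(tmp[:binary.PutUvarint(tmp[:], uint64(len(m)))])
		buf.Write(m)
	}
	return buf.Bytes()
}

// splitFrames reads length-prefixed messages back out of a stream.
func splitFrames(stream []byte) ([][]byte, error) {
	r := bufio.NewReader(bytes.NewReader(stream))
	var msgs [][]byte
	for {
		n, err := binary.ReadUvarint(r)
		if err == io.EOF {
			return msgs, nil // clean end of stream, between frames
		}
		if err != nil {
			return nil, err
		}
		if n > maxFrame {
			return nil, errors.New("frame too large")
		}
		m := make([]byte, n)
		if _, err := io.ReadFull(r, m); err != nil {
			return nil, err
		}
		msgs = append(msgs, m)
	}
}

func main() {
	msgs, err := splitFrames(frame([]byte("first"), []byte{}, []byte("third")))
	fmt.Println(len(msgs), err)
}
```

Since the reader only needs the prefix, it can skip or forward a frame without knowing which schema it holds.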

                  The tagged union versioning approach works in all contexts in which BARE messages work, so it’s recommended as a universal solution. But if you prefer a different option, you’re welcome to establish the necessary context to support it out-of-band and go with that.
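A sketch of the union-based versioning in Go, which also shows the manual MessageV1 to MessageV2 translation asked about earlier in the thread: the decoder reads the tag and upgrades a V1 message by hand. It assumes the draft encoding in which a union is the member's index as a `uint` followed by that member's value, and a struct is its fields concatenated in order. The schema and all Go names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// Hypothetical schema:
//   type MessageV1 { name: string }
//   type MessageV2 { name: string  age: uint }
//   type Message (MessageV1 | MessageV2)
type MessageV2 struct {
	Name string
	Age  uint64
}

func putUvarint(buf *bytes.Buffer, v uint64) {
	var tmp [binary.MaxVarintLen64]byte
	buf.Write(tmp[:binary.PutUvarint(tmp[:], v)])
}

func putString(buf *bytes.Buffer, s string) {
	putUvarint(buf, uint64(len(s)))
	buf.WriteString(s)
}

func readString(r *bytes.Reader) (string, error) {
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return "", err
	}
	if n > uint64(r.Len()) {
		return "", io.ErrUnexpectedEOF
	}
	b := make([]byte, n)
	_, err = io.ReadFull(r, b)
	return string(b), err
}

// encodeV1 writes union tag 0 (MessageV1), then its single field.
func encodeV1(name string) []byte {
	var buf bytes.Buffer
	putUvarint(&buf, 0)
	putString(&buf, name)
	return buf.Bytes()
}

// encodeV2 writes union tag 1 (MessageV2), then its fields in order.
func encodeV2(m MessageV2) []byte {
	var buf bytes.Buffer
	putUvarint(&buf, 1)
	putString(&buf, m.Name)
	putUvarint(&buf, m.Age)
	return buf.Bytes()
}

// decode accepts either version and upgrades V1 to V2 by hand.
func decode(msg []byte) (MessageV2, error) {
	r := bytes.NewReader(msg)
	tag, err := binary.ReadUvarint(r)
	if err != nil {
		return MessageV2{}, err
	}
	var m MessageV2
	switch tag {
	case 0:
		m.Name, err = readString(r) // V1 has no age; this sketch defaults it to 0
	case 1:
		if m.Name, err = readString(r); err == nil {
			m.Age, err = binary.ReadUvarint(r)
		}
	default:
		err = errors.New("unknown union tag")
	}
	return m, err
}

func main() {
	fmt.Println(decode(encodeV1("alice")))
	fmt.Println(decode(encodeV2(MessageV2{Name: "bob", Age: 30})))
}
```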

                  1. 1

                    Thanks. Yeah, I read the doc a bit more and realized that a struct is just a concatenation of its underlying fields. Unlike Protobuf or Flatbuffers, it doesn’t carry any offsets; therefore, adding or removing fields requires either a brand-new type in the union, or some other means (depending on the language) to carry the old schema somewhere out-of-band.

                    It is interesting to observe how these are solved. I am more comfortable with the built-in approach by Protobuf / Flatbuffers but I think different solutions fit different usage cases and having additional redirection may be undesirable for many cases.

              3. 4

                So, to close the loop, should we expect a RPC system (wire format + interface declaration format) on top of that BARE encoding? :^)

                1. 3

                  Why BARE instead of competitors?

                  1. 1

                    So what does it offer that “pick one” doesn’t offer?

                    In particular, given the IETF Core-WG impetus behind CBOR and YANG, why didn’t you just go with that?

                    1. 3

                      I explained the CBOR comparison briefly in the article here:

                      CBOR: ruled out for similar reasons: too much extensibility, and the schema is encoded into the message. Has the advantage of a specification, but the disadvantage of that spec being 54 pages long.

                      It’s too complex, which leads to incomplete and poor quality implementations, and it has too much extensibility, leading to perpetual maintenance and churn. And because it encodes the schema into the message, it ends up with longer messages, and the priority of BARE is small message lengths.

                      I explain the rationale for not selecting a handful of alternatives in this article, it should give you enough of an idea of my thought process to explain the rationale for not selecting others as well.

                      1. 4

                        it encodes the schema into the message.

                        Hmm, not quite true. Self-describing, maybe: like JSON, you can take a blob of CBOR and convert it to something you can eyeball, but a schema is a slightly different beast.

                        One of the things that I find quite compelling about where Core-WG is going is that you can optionally encode a SID into your message, which allows you to look up the YANG schema in the appropriate registry, i.e. not only self-describing à la JSON, but self-documenting.

                        ps: Your xkcd reference is slightly out… you have taken 16 competing standards to 17!

                        Ah well, as they say in the classics… “Standards are such a Good thing. Everybody wants their own one!”

                        1. 1

                          Still - it’s encoding extra data into the message to describe its schema. BARE aims to make the message as small as reasonably possible while still being 8-bit aligned. That, combined with the other reasons I explained, rules out CBOR.
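To make the size argument concrete, a Go sketch that hand-encodes the same record, `{id: 5, name: "bob"}`, as a CBOR map with text keys (a common way to carry JSON-like data in CBOR) and as a BARE struct with fields `id: uint` and `name: string`. The CBOR helper only handles arguments below 24, so every head fits in one byte; all names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// cborHead builds a one-byte CBOR head: major type in the top 3 bits and an
// argument below 24 in the low 5 bits (larger arguments need extra bytes).
func cborHead(major, arg int) byte {
	if arg < 0 || arg >= 24 {
		panic("sketch only handles arguments below 24")
	}
	return byte(major<<5 | arg)
}

func cborText(s string) []byte {
	return append([]byte{cborHead(3, len(s))}, s...) // major type 3: text string
}

// cborRecord encodes {"id": id, "name": name} as a CBOR map with text keys.
func cborRecord(id int, name string) []byte {
	out := []byte{cborHead(5, 2)} // major type 5: map with 2 pairs
	out = append(out, cborText("id")...)
	out = append(out, cborHead(0, id)) // major type 0: unsigned integer
	out = append(out, cborText("name")...)
	return append(out, cborText(name)...)
}

// bareRecord encodes the same data as a BARE struct: uvarint id, then length-prefixed name.
func bareRecord(id uint64, name string) []byte {
	var tmp [binary.MaxVarintLen64]byte
	out := append([]byte(nil), tmp[:binary.PutUvarint(tmp[:], id)]...)
	out = append(out, tmp[:binary.PutUvarint(tmp[:], uint64(len(name)))]...)
	return append(out, name...)
}

func main() {
	fmt.Println(len(cborRecord(5, "bob")), len(bareRecord(5, "bob"))) // 14 5
}
```

Eight of the 14 CBOR bytes are the keys `"id"` and `"name"` with their heads. Encoding the record as a CBOR array would drop them, at the cost of relying on an out-of-band schema, which is essentially BARE's trade-off.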

                          1. 2

                            My only comment is….

                            When a disease has many treatments….

                            It has no cure.

                            1. 1

                              You could make an IDL for code-generation that relies on CBOR or msgpack as the concrete encoding. A msgpack array of values is probably as compact an encoding for a struct as BARE’s.
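A quick check of that claim in Go, hand-encoding the positional array `[5, "bob"]` with MessagePack's single-byte "fix" formats (so the sketch only accepts ids below 128 and names under 32 bytes). `msgpackRecord` is a hypothetical helper.

```go
package main

import "fmt"

// msgpackRecord encodes [id, name] as a fixarray holding a positive fixint and a fixstr.
func msgpackRecord(id uint8, name string) []byte {
	if id > 0x7f || len(name) > 31 {
		panic("sketch only covers positive fixint and fixstr")
	}
	out := []byte{0x92, id}                 // fixarray of 2 elements, then the fixint itself
	out = append(out, 0xa0|byte(len(name))) // fixstr header carries the length
	return append(out, name...)
}

func main() {
	fmt.Printf("% x\n", msgpackRecord(5, "bob")) // 92 05 a3 62 6f 62
}
```

That's 6 bytes; the same record as a BARE struct with `id: uint` and `name: string` is 5 (one uvarint byte for 5, one for the length 3, then "bob"), so for small records the difference is the single array header byte.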

                              Msgpack is arguably very simple; even Redis embeds it, and Antirez is not known for making complicated architectural choices. It just lacks a well supported IDL to help (de)serialize.

                      2. 1

                        Can you produce, say, an IPv6 packet as a fully valid BARE message?

                        1. 1

                          The idea is that if the specification is generic enough to fit it, then it might be possible to produce a lot of tooling for existing formats, like actual TLV packets, the 9P or SFTP protocols, TLS packets, BER-encoded X509 certificates…

                          Then out of the existing situation, adding another standard that describes the others would not produce a 14 std + 1 std == 15 std, but 14 std + 1 std => 8 std (1 that describe them all + 7 spare ones that did not fit the generic one added).

                          1. 2

                            Kaitai Struct is a DSL for describing arbitrary binary structures.
                            Here is the specification for an IPv6 packet.

                            1. 2

                              It’s not a DSL, it’s YAML. Dear god, please write a DSL for this, specifying arbitrary binary data types in YAML is a fate I wouldn’t wish on my worst enemies.

                              1. 1

                                The generated code looks quite readable in comparison…

                                    this.srcPort = _io.readU2be();
                                    this.dstPort = _io.readU2be();
                                    this.seqNum = _io.readU4be();
                                    this.ackNum = _io.readU4be();
                                
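For comparison, the same four reads in Go with `encoding/binary`. These TCP fields are big-endian, whereas BARE's fixed-width integers are little-endian, so a description of such formats couldn't simply reuse BARE's `u16`/`u32`. `tcpPrefix` and `parseTCPPrefix` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// tcpPrefix holds the first four fields of a TCP header.
type tcpPrefix struct {
	SrcPort, DstPort uint16
	SeqNum, AckNum   uint32
}

// parseTCPPrefix mirrors readU2be/readU4be: big-endian reads at fixed offsets.
func parseTCPPrefix(b []byte) (tcpPrefix, error) {
	if len(b) < 12 {
		return tcpPrefix{}, errors.New("need at least 12 bytes")
	}
	return tcpPrefix{
		SrcPort: binary.BigEndian.Uint16(b[0:2]),
		DstPort: binary.BigEndian.Uint16(b[2:4]),
		SeqNum:  binary.BigEndian.Uint32(b[4:8]),
		AckNum:  binary.BigEndian.Uint32(b[8:12]),
	}, nil
}

func main() {
	p, err := parseTCPPrefix([]byte{0x01, 0xbb, 0xc3, 0x50, 0, 0, 0, 1, 0, 0, 0, 2})
	fmt.Println(p.SrcPort, p.DstPort, p.SeqNum, p.AckNum, err) // 443 50000 1 2 <nil>
}
```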
                              2. 2

                                Name an idea, and surprise!, someone did it already. :)

                              3. 1

                                No, this use-case is not the intended usage of BARE. But it would be nice to have something which could universally represent any data structure, though such a task would be large indeed. I imagine that, in practice, I could come up with data structures which were unrepresentable at least as fast as you were able to come up with representations of them.

                                1. 1

                                  I could come up with data structures which were unrepresentable

                                  I trust you for that! A “throw everything from your bowels at me, and I will parse it” approach sounds like an over-complex metaformat. It feels better to aim for something like the most common 20%.

                            2. 1

                              Any specific reason for disallowing empty structs? They would allow for tagged unions to mix elements with and without a “payload”. The optional<T> type would be equivalent to (struct{} | T), for example.

                              1. 1

                                I just finished adding this in the form of a void type. I was hoping to avoid it because it adds a layer of meaning which mostly exists in the type system, and not in the encoded value. I’m still not entirely happy with the constraints I had to impose on its usage and the complexity it adds to the specification.
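A small Go sketch of the `optional<T>` ≈ `(void | T)` equivalence raised in this sub-thread, assuming the draft encodings where `optional<T>` is a `u8` of 0 or 1 followed by the value, a union is a `uint` tag followed by the member's value, and `void` encodes to nothing. Because uvarint tags 0 and 1 are the single bytes 0x00 and 0x01, the two schemas produce identical bytes. The function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func encodeString(s string) []byte {
	var tmp [binary.MaxVarintLen64]byte
	out := append([]byte(nil), tmp[:binary.PutUvarint(tmp[:], uint64(len(s)))]...)
	return append(out, s...)
}

// encodeOptional encodes optional<string>: a u8 0 or 1, then the value if present.
func encodeOptional(s *string) []byte {
	if s == nil {
		return []byte{0}
	}
	return append([]byte{1}, encodeString(*s)...)
}

// encodeVoidOrString encodes the union (void | string): a uint tag, then the
// chosen member's encoding, which is nothing at all for void.
func encodeVoidOrString(s *string) []byte {
	var tmp [binary.MaxVarintLen64]byte
	if s == nil {
		return append([]byte(nil), tmp[:binary.PutUvarint(tmp[:], 0)]...) // void: tag only
	}
	out := append([]byte(nil), tmp[:binary.PutUvarint(tmp[:], 1)]...)
	return append(out, encodeString(*s)...)
}

func main() {
	s := "hi"
	fmt.Printf("% x | % x\n", encodeOptional(&s), encodeVoidOrString(&s))   // 01 02 68 69 | 01 02 68 69
	fmt.Printf("% x | % x\n", encodeOptional(nil), encodeVoidOrString(nil)) // 00 | 00
}
```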

                                1. 1

                                  Cool. Yeah, void types are a little odd and really only make sense in unions afaict. Speaking of which, I noticed that the spec describes unions as a “set” of types. Does this mean that a type can only be used once in a union? If so, are type A void and type B void the “same” in that context?

                                  1. 1

                                    No, the spec also clarifies that user defined types make a new type which is distinct from its source type.
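A Go sketch of that rule: with two hypothetical void aliases `A` and `B` in a union with `string`, each member still gets its own tag (0, 1 and 2), so the tag alone distinguishes the two payload-free cases. It assumes the same draft union encoding (a `uint` tag, then the member's value) and uses empty Go structs to stand in for the void types.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical schema:
//   type A void
//   type B void
//   type Choice (A | B | string)
type A struct{}
type B struct{}

// decodeChoice returns A{}, B{} or a string, depending on the union tag.
func decodeChoice(msg []byte) (interface{}, error) {
	tag, n := binary.Uvarint(msg)
	if n <= 0 {
		return nil, errors.New("bad union tag")
	}
	rest := msg[n:]
	switch tag {
	case 0:
		return A{}, nil // void: nothing follows the tag
	case 1:
		return B{}, nil // a distinct type, so a distinct tag
	case 2:
		l, m := binary.Uvarint(rest)
		if m <= 0 || l > uint64(len(rest)-m) {
			return nil, errors.New("bad string")
		}
		return string(rest[m : m+int(l)]), nil
	}
	return nil, fmt.Errorf("unknown tag %d", tag)
}

func main() {
	for _, msg := range [][]byte{{0x00}, {0x01}, {0x02, 0x02, 'h', 'i'}} {
		v, err := decodeChoice(msg)
		fmt.Printf("%T %v %v\n", v, v, err)
	}
}
```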