1. 61
    1. 24

      “If you’re already using protobufs” in the last sentence is a key caveat… You probably don’t want to take on the dependency for just a config file.

      Also, a slight gotcha is that renaming fields is considered a compatible change in protobuf, because names aren’t sent over the wire. The normal format is binary.

      If you rename a field in the config schema, it will break reading the ASCII data (probably with a bad error). So you can put something like “// Don’t rename fields” in that schema, I guess
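
      To make that concrete, here’s a hypothetical schema (the message and field names are made up for illustration) carrying exactly that warning. Renaming retries to, say, attempts stays compatible for binary data, which only carries field numbers, but an existing text file containing “retries: 3” would no longer parse:

      ```proto
      // Don't rename fields: the text and JSON formats encode field *names*,
      // so a rename breaks existing ASCII config files even though the
      // binary wire format (which only uses the numbers) is unaffected.
      message ServerConfig {
        optional string host = 1;
        optional int32 retries = 2;
      }
      ```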

      1. 8

        You probably don’t want to take on the dependency for just a config file.

        Where I come from (C++), any configuration file format (JSON, YAML, TOML) is an extra dependency. I think what’s more important is that it’s a source code generator, and that requires support in your project’s build system.

        In fact, if you go the source code generator route, it may be possible to get rid of the (runtime) dependency if the generated code can function in standalone mode (i.e., without requiring a runtime library). We do something like this with cli, which is a command-line interface compiler that is also capable of reading arguments from a file. It doesn’t have a runtime library, so we just get a pair of generated source files and that’s it.

        1. 4

          Also, a slight gotcha is that renaming fields is considered a compatible change in protobuf, because names aren’t sent over the wire.

          This has always felt to me like a long-deprecated consideration born of optimism. Protobufs have canonical JSON and text representations that get used way too often.

        2. 9

          The text format is nice, and having a schema is great.

          The downside, of course, is that you’re using protobuf. Protobuf is amazing – it shows how much code you can squeeze into a relatively small amount of functionality. And it churns surprisingly quickly. Especially when you consider how little the wire format changes.

          1. 1

            Not necessarily defending protobuf here, but what other formats are not dependent on field names? That’s one of its biggest pluses imo

            1. 1

              I think you replied to a different comment than you meant to. Probably https://lobste.rs/s/f37hri/ascii_protocol_buffers_as_config_files#c_y7jsjt this one?

            2. 1

              To be fair, the wire format changing more slowly than the code churns is a good thing for a library that implements (de)serialisation. ;)

            3. 6

              I use protobufs for config files frequently. It’s nice to have built-in schema validation, and language support is widespread.

              One trick not mentioned in the post is that you can have the config parser accept other protobuf encodings (binary, JSON) without too much trouble by inspecting the filename: if .pb, read binary protobuf; if .pb_text, read text protobuf; if .json, read JSON in the protobuf canonical layout. (A sketch of the dispatch follows the list below.)

              1. This makes it easier to go from a prototype to production, since you probably want the compatibility properties of binary protobufs in a “real” deployment where the executable and config might drift during version updates.
              2. Ability to parse JSON simplifies integration with ad-hoc test harnesses; for example, you can have a zero-dependency Python script generate config files with json.dump().
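
              A minimal sketch of that dispatch in Python, assuming a generated message type Config from a config_pb2 module (both names are illustrative, not from the post), using only the standard google.protobuf helpers:

              ```python
              from google.protobuf import json_format, text_format

              from config_pb2 import Config  # hypothetical generated module


              def load_config(path: str) -> Config:
                  """Pick a protobuf encoding based on the filename."""
                  config = Config()
                  with open(path, "rb") as f:
                      data = f.read()
                  if path.endswith(".pb"):           # binary wire format
                      config.ParseFromString(data)
                  elif path.endswith(".pb_text"):    # ASCII/text protobuf
                      text_format.Parse(data.decode("utf-8"), config)
                  elif path.endswith(".json"):       # canonical JSON layout
                      json_format.Parse(data.decode("utf-8"), config)
                  else:
                      raise ValueError(f"unrecognized config extension: {path}")
                  return config
              ```

              For point 2, a test harness can then emit the .json variant with nothing beyond the standard library, e.g. json.dump({"host": "localhost"}, f).
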
              1. 4

                Envoy defines its configuration structures using protocol buffers. It supports loading configuration data via network interfaces or from static configuration files.

                1. 3

                  Can someone educate me more about the comment in the article “use optional and explicitly check for field existence in the code”? I always thought required fields enforce stronger validation.

                  1. 6

                    More reading than you probably want :)

                    A recent thread that starts with the snippet hwayne linked - https://news.ycombinator.com/item?id=36909894

                    Debate between the author of protobuf v2 and capnproto (kentonv) and another Googler who likes required: https://news.ycombinator.com/item?id=36910030

                    I agree with the advice to avoid required, and frame it more abstractly - https://news.ycombinator.com/item?id=36911033

                    Linking to Rich Hickey’s Maybe Not - https://lobste.rs/s/zdvg9y/maybe_not_rich_hickey

                    To summarize a bit:

                    • You can use required if you want, for simple client/server services. But when your systems evolve into bigger systems, with non-trivial topologies like a search engine (not just client/server), you’ll see the downsides.

                    • It’s natural to think of protobufs as “structs”, because they generate struct-like classes. BUT if you look more closely at the API, they’re more like Clojure-like maps: a big bag of key-value pairs where the names are static, but field presence is dynamic. (Especially in proto3, where I think you can’t even use required – it’s outlawed.)

                      • That is, you use dynamic message.has_field() checks, not static properties, to be robust in the face of upgrades and downgrades (a small sketch follows this list).

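                    A small sketch of that presence-checking style in Python (the Config type and timeout_ms field are made-up examples; the Python API spells it HasField, and in proto3 it only works for message fields and explicitly optional scalars):

                    ```python
                    config = Config()                   # hypothetical generated message
                    config.ParseFromString(wire_bytes)  # bytes received off the wire

                    # Presence is checked dynamically, so an old reader keeps working
                    # when a newer writer stops setting the field, and vice versa.
                    if config.HasField("timeout_ms"):
                        timeout = config.timeout_ms
                    else:
                        timeout = 5000                  # tolerate absence; don't crash
                    ```
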
                    I think the disconnect is that the first application of protobufs was the Google search engine – protobuf was written by the people who wrote the indexing pipeline, the posting list server, etc.

                    Most people aren’t necessarily using it like that, but that’s where it excels.

                    The way protobufs are used in a search engine is actually closer to Clojure-like maps than to the Maybe<T> style of, say, Haskell/Rust (again, the Hickey talk sheds light on this style of robust / immutable / distributed programming).

                    This is inherent in distributed systems – it has to do with the lack of atomic upgrade.

                    I also frame it as interior vs. exterior – static types are interior to a process, while data in distributed systems is exterior. Static types are global properties, but such global properties are LIES when you have a long-lived system with independent deployment of parts.


                    Even though I mentioned search engines, the distributed upgrade problem occurs in almost all web apps. I think most web app deployments are incorrect – think about what happens when someone has an old cached version of your JS bundle, but they’re running against a new Rails back end.

                    Or vice versa. Are your JSON messages BOTH forward and backward compatible? Even if you update your static files and your Rails back end at the “exact same time”, that doesn’t guarantee users won’t receive inconsistent versions.

                    Some web apps actually handle that case, but I think it’s rare and actually quite hard. I think mostly people just rely on the user to hit F5 or something.

                    I have occasionally seen explicit version checks – there will be a notification to hit F5, instead of silent errors.

                    But the much harder option is for the system to keep working seamlessly with inconsistent versions of JS and backend, in both directions (think about a rollback)

                    Big distributed systems require this harder kind of correctness.


                    tl;dr — arguably the main value of protobufs is that they solve the distributed upgrade problem, and required fields break that

                    The caveat is that you can use them “wrong”, and it still kinda works, especially in the short term, and at small scales. You will have bugs, but maybe not more bugs than the rest of your software has …

                    1. 2

                      Maybe take a look at typical; it solves the required problem nicely.

                      1. 2

                        It was mentioned in the same thread

                        https://news.ycombinator.com/item?id=36911151

                        I’m not sure the “asymmetric” mechanism actually works, to be honest – I’d want to see someone else use it in production.

                        Yeah, the reply rightly points out that there is a distinction between messages from servers and persisted data. Old versions of servers go away, but you can’t be sure that old versions of data aren’t saved on disk anywhere. Protobufs are used for both.


                        Though it doesn’t really matter to me, since I’m not writing that kind of code anymore

                        As mentioned, you can use required if you want – it just has downsides that only become apparent later

                        1. 1

                          Also, I don’t think it really works for rollbacks … but yeah someone else should try it :)

                          The problem I see is that asymmetric is still a static enforcement, while server versions are inherently dynamic.

                        2. 1

                          ooooh, didn’t know about typical, that looks quite interesting.

                      2. 4

                        No personal experience, but this seems to be a good overview of the problems: https://capnproto.org/faq.html#how-do-i-make-a-field-required-like-in-protocol-buffers

                        1. 3

                          This seems like a case where something very instructive happened, and people learned exactly the wrong lesson from it.

                          Making a required thing optional is an incompatible change whether that’s reflected in your protocol definitions or not, and incidents like the ones being described here would be a great opportunity to introduce some sort of automated system that can tell you how to make such changes without breaking production.

                        2. 4

                          The problem is that the future is uncertain. Just because a field is required in every conceivable circumstance today does not mean that a new use-case or constraint won’t appear tomorrow. And when the problem-space changes, it’s easier to adjust the system (the complete set of intercommunicating processes across space and time) if each individual piece is robust against changes in its environment — it gives you a chance to update each piece one at a time, instead of having to update an entire subsystem (or worse, the entire system) in one go.

                        3. 2

                          A similar trick is used at Facebook, where their Configerator system distributes config files as Thrift serialized to JSON. The big difference I see is that Facebook engineers typically didn’t write Thrift directly, but instead a restricted subset of Python which generated the config in question.

                          Rachel worked at both companies (Google, then Facebook, iirc), so she would be very familiar with both variants of the system. And likely has strong opinions of the pitfalls thereof. ;-)

                          1. 1

                            Huh, to my surprise I don’t hate this. Kinda wish they had a more detailed example though.