No, this guy is wrong. Protocol buffers are endlessly pragmatic and many of the “bad decisions” he points out have concrete reasons.
For instance - he suggests all of the fields should be required. required fields existed in at least proto2 (and I assume proto1), but were discovered to be terrible for forwards compatibility. I agree with his footnote that there’s a debate, but one side of it decisively won. If a field is required in one release of your code, that code can never talk to serializations from any future release that stops setting that field - it just blows up. The most frequent internal advice I saw was “avoid required. required is forever.” As a result, most feedback encouraged everything to be optional or repeated, which was made official in proto3.
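To see why, here’s a minimal sketch of the failure mode (a toy model, not the protobuf runtime; the message and field names are made up):

```python
# Toy model of "required is forever": an old reader whose schema declared a
# required field rejects messages from any newer writer that stops setting it.

def parse_user_v1(fields: dict) -> dict:
    """Old binary: the v1 schema said `required string email = 2;`."""
    if "email" not in fields:
        # The required-check is baked into every deployed copy of the old code.
        raise ValueError("missing required field: email")
    return {"id": fields["id"], "email": fields["email"]}

# A v2 writer that no longer populates email - an otherwise legal evolution.
v2_message = {"id": 42}

parse_user_v1(v2_message)  # raises: the old reader can never accept v2 traffic
```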
Second, here’s how he wants to implement repeated:
This just reeks of complete ignorance of a couple of things -
How is this going to look for serialization/deserialization? Sure, we’ve embedded a list into a data structure, but what matters is being fast. Protocol buffers pragmatically describe useful data structures that are also very close to their native wire format. This is not that, but he says:
“the actual serialization logic is allowed to do something smarter than pushing linked-lists across the network—after all, implementations and semantics don’t need to align one-to-one.” The protocol buffer implementation must be simple, straightforward, bug-free, and implemented in every language anyone wants to use. Static analysis to detect these patterns could work, but good luck maintaining that logic in every language of your lingua-franca interoperability system.
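For contrast, here’s a sketch of how a packed repeated int32 actually hits the wire - a direct image of the in-memory array, which is exactly the property the linked-list proposal gives up (hand-rolled for illustration; real runtimes do the same thing with more care):

```python
# Encode a packed `repeated int32` field per the protobuf wire format:
# a tag byte, a byte length, then the varint-encoded values.

def varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def packed_repeated(field_number: int, values: list[int]) -> bytes:
    payload = b"".join(varint(v) for v in values)
    tag = varint((field_number << 3) | 2)  # wire type 2 = length-delimited
    return tag + varint(len(payload)) + payload

# Field 1 holding [1, 2, 3]: tag 0x0A, length 3, then the values themselves.
assert packed_repeated(1, [1, 2, 3]) == b"\x0a\x03\x01\x02\x03"
```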
Third, as an example of the designers of protobufs being amateurs, he says:
It’s impossible to differentiate a field that was missing in a protobuffer from one that was assigned to the default value.
headdesk proto2 definitely supported this functionality. It was stripped out in proto3 after literally decades of experience from thousands of engineers said that, on balance, the tradeoff wasn’t worth it. You can’t claim that a hard look at the tradeoffs is a result of being amateurs.
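For the record, the proto2 behavior looked like this (a sketch; user_pb2 is a hypothetical protoc-generated module whose schema declares optional string email = 2;):

```python
import user_pb2  # hypothetical module generated by protoc from a proto2 schema

msg = user_pb2.User()
assert not msg.HasField("email")  # never set: distinct from the default ""

msg.email = ""                    # explicitly assign the default value
assert msg.HasField("email")      # proto2 presence tracking still says "set"
```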
Fourth:
With the one exception of routing software, nothing wants to inspect only some bits of a message and then forward it on unchanged.
This is pretty much the predominant programming pattern at Google, and in many other places too. Protocol buffers sound… perfectly designed for their use case!

What a frustrating read.
Thanks for this critique, you’re right on. I do agree with one part though - you need to make application-specific non-proto data structures that often mirror the protos themselves, which isn’t exactly DRY.
Here’s an example that I’m struggling to find a “nice” solution for. A locally running application has a SQLite database, managed via an ORM, that it collects structured log entries into. Periodically, it bundles those log entries up into a proto, removes them from the local database, and sends them (or an aggregated version of them) up to a collection server.
The data structures are the exact same between the protos and the database, yet I need to define the data structures twice.
Hmm, yeah, that’s a tough one. One thing that the protobuf compiler supports, though, is extensible plugins (e.g., check out all of the stuff gogoproto adds as extensions to the compiler: https://github.com/gogo/protobuf/blob/master/extensions.md).
Perhaps the right thing in ORM situations at a certain scale (maybe you’re not at this scale yet) is to write a generator that generates the ORM models from the protobuf definitions?
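Something along these lines, perhaps - a minimal sketch that walks the proto descriptors and emits SQLAlchemy model source, so the schema lives only in the .proto file (logs_pb2 and the type mapping are hypothetical; adjust to taste):

```python
from google.protobuf.descriptor import FieldDescriptor

# Map the proto field types we happen to use onto SQLAlchemy column types.
SQL_TYPES = {
    FieldDescriptor.TYPE_INT32:  "Integer",
    FieldDescriptor.TYPE_INT64:  "Integer",
    FieldDescriptor.TYPE_BOOL:   "Boolean",
    FieldDescriptor.TYPE_DOUBLE: "Float",
    FieldDescriptor.TYPE_STRING: "String",
}

def emit_model(message_class) -> str:
    """Generate SQLAlchemy model source from a protoc-generated message class."""
    desc = message_class.DESCRIPTOR
    lines = [
        f"class {desc.name}(Base):",
        f"    __tablename__ = {desc.name.lower()!r}",
        "    id = Column(Integer, primary_key=True)",
    ]
    for field in desc.fields:
        lines.append(f"    {field.name} = Column({SQL_TYPES[field.type]})")
    return "\n".join(lines)

# import logs_pb2                       # hypothetical generated module
# print(emit_model(logs_pb2.LogEntry))  # write the output into models.py
```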
Yeah, that would seem like the right solution in this case. In any case, what I described isn’t even a problem with the proto way of thinking, it’s just a tooling issue.
Nice critique, better than I could have articulated. I worked with the author at a company that chose protocol buffers, and assume in part that the article is directed at those of us (myself included) who chose to use protocol buffers as our wire serialization format. It was the most viable option given the constraints and the problems a prior system was exhibiting. That said, the author has predilections, and they show in the article’s tone and focus.

Were you replacing XML?

This is the best critique of this rant I’ve read, and you didn’t even go into the author’s attitude. Kudos and thank you.
Category-theoretic thinking of products/sums is a good logical model, but I think it’s awful if your physical memory layout is the same thing as your logical model.
For example, let’s take the list [1,2,3]. In product/sum design your representation for this structure is: 1:(2:(3:nil)).
Imagine it costs you 1 byte to store a number, and 2 bytes to store a structure. If you take the literal in-memory interpretation for this structure, it is formed from pairs of references (total cost=10): 01 *> 02 *> 03 *> 00
If you’re dealing with packed representations, terminated by the empty structure, you end up with: 01 02 03 00.
But if you didn’t treat the physical memory layout as the logical layout, you could instead use a representation where the sequence is prefixed with its length: 03 01 02 03.
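Here’s a small sketch of those three layouts under the same cost model (1 byte per number, 2 bytes per structure reference - my encoding of the description above, purely illustrative):

```python
def encode_cons(xs: list[int]) -> bytes:
    """1:(2:(3:nil)) - each element followed by a 2-byte reference to the next."""
    out = bytearray()
    for x in xs:
        out += bytes([x]) + b"\xaa\xaa"  # value, then a placeholder 2-byte pointer
    out += b"\x00"                       # nil
    return bytes(out)

def encode_packed(xs: list[int]) -> bytes:
    """Packed, terminated by the empty structure: 01 02 03 00."""
    return bytes(xs) + b"\x00"

def encode_length_prefixed(xs: list[int]) -> bytes:
    """Length first, as a wire format would have it: 03 01 02 03."""
    return bytes([len(xs)]) + bytes(xs)

assert len(encode_cons([1, 2, 3])) == 10  # the total cost=10 quoted above
assert encode_packed([1, 2, 3]) == b"\x01\x02\x03\x00"
assert encode_length_prefixed([1, 2, 3]) == b"\x03\x01\x02\x03"
```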
I think protobuffer sucks because the schemas are directly compiled into the user language. They would have a much better system if they first converted protobuffer schemas into protobuffer files, then had an interpreter for such files in each dynamic language, and compilers from the schema for the compiled languages.
I also think that the post illustrates a common error people tend to make: not recognizing that implementation details come and go. You really should not let your language be influenced by them, and if you force implementation details into your language then you open the barn door to that.
I think protobuffer sucks because the schemas are directly compiled into the user language. They would have a much better system if they first converted protobuffer schemas into protobuffer files, then had an interpreter for such files in each dynamic language, and compilers from the schema for the compiled languages.
Just from a pragmatism perspective, that sounds like significantly more work for every language that wants to have a protobuf library. As it stands, having a straightforward translation from the object in memory to the wire format greatly assists implementation across all of the potential languages that need implementing. I think this is the key reason Lua, for example, has seen such broad adoption as a scripting language. It’s easy to embed because it has a very natural layout for interoperability (all calls just push and pop stuff on a stack). It’s very easy to write a Lua FFI.
It’d be a bit more work in each dynamically typed language that you need to support. You’d need a wire format decoder and a script that decodes the schema file and uses it to translate between wire format objects and their legible counterparts in the client language. But that’d be nice to use when you need to read from or write into a protobuffer file, because you could just do the pip install protobuf equivalent in your scripting language and then start rolling.
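In fact, recent versions of the Python protobuf runtime already carry most of this machinery. A sketch, assuming protoc --descriptor_set_out=schema.desc schema.proto has been run and that the schema defines a mypkg.LogEntry message (both names hypothetical):

```python
from google.protobuf import descriptor_pb2, descriptor_pool, message_factory

# A FileDescriptorSet is itself a protobuf: the schema, serialized.
fds = descriptor_pb2.FileDescriptorSet()
with open("schema.desc", "rb") as f:
    fds.ParseFromString(f.read())

pool = descriptor_pool.DescriptorPool()
for file_proto in fds.file:
    pool.Add(file_proto)

# No generated code anywhere: the message class is built at runtime
# by interpreting the schema.
LogEntry = message_factory.GetMessageClass(
    pool.FindMessageTypeByName("mypkg.LogEntry"))

entry = LogEntry()
entry.ParseFromString(b"")  # ready to decode wire-format bytes
```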
It’s quite involved to get the .proto3 compiler to work. It’s almost like compiling a C project in complexity. It produces plain code that reserves its own directory in your project.
I think protobuffer sucks because the schemas are directly compiled into the user language.
IMO, this is an example of a tooling problem being perceived as a problem with protobuf because the prevailing implementations do it that way. If you want an interpreter-style proto library for C, check out nanopb. protoc will produce data and interfaces (struct definitions) instead of a full C implementation.
The article makes several good points, but it appears as though the author has a particular bone to pick with protocol buffers (mostly the map type, it seems) and comes off as more rant-y than constructive.
While only anecdotal, I’ve been working with protobufs for communicating with an embedded device with very restricted memory space, and after having evaluated the usual players (Avro, Capnproto, Thrift), protobufs was the only serialization protocol that was feasible to use. I would have preferred to use capnproto, but the lack of C-based support was a deal breaker.
In our particular use case, protobufs made all the right trade-offs, even though I do sometimes curse at some of the imposed constraints.

Did you consider ASN.1/DER?
Briefly, but I have 1) had bad experiences with some ASN.1 encoders/decoders in the past (some of them are terrible, and make seemingly unnecessary allocations), and 2) we are already pushing the boundaries of what the embedded chipset is capable of in terms of throughput, and serialization/deserialization speed and resulting payload size are legitimate concerns given how much data we are pushing per unit of time.
Also, the documentation for ASN.1 is… lacking, at times. If you’ve ever tried to figure out how to implement tags to support some kind of forwards/backwards compatible type definitions, you know what I mean.
The author mentions that Java has a bad type system because it’s “too stifling without giving you any of the things you actually want in a type-system.”
So what’s wrong with Java’s type system? I’ve worked with multiple languages that have type systems and I don’t really see a difference between Java and the others.
Little type inference (getting better though with var). Nullable by default. No support for defining tagged union types with pattern matching and exhaustiveness checking. Verbose way of defining record types. Clunky lambda types. I could go into more things, but fixing those would at least get you closer to Elm’s level.

@pushcx, this is a duplicate of https://lobste.rs/s/5wp65s/protobuffers_are_wrong , I missed it when I posted this one. Could you please merge it?

(sorry for the duplicate…)

✓

Not sure if it was intentional or if something else happened, but the merged story has fewer upvotes than the unmerged one did.

The count left of the headline is always the votes on the original story, but the merged story and its comments are taken into account in the ranking (“hotness”) by Story#calculated_hotness.

oh neat