1. 36
  1.  

    1. 9

      I wish someone would do a history of proto’s design decisions in public. In particular: what led to proto3, why some of the core principles were partially walked back (e.g., optional was added), and how the 2023 edition almost ended up with proto2 again. I can only guess what led there, but it’s not clearly communicated, which makes it very hard to understand what is going on.

      Using protobuf with Rust is a pretty frustrating experience, since in proto3 there is a disconnect between primitives and third-party messages when it comes to optional fields. That is really a result of the specification itself and how it maps to languages that do not have nullability built in.
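
      The same disconnect is visible in Go’s generated code for plain proto3 (no explicit optional): scalars get no presence, messages do. A rough sketch, with hypothetical field names:

      // proto3, without explicit "optional":
      //   message Profile {
      //     string name = 1;     // scalar
      //     Address address = 2; // message
      //   }
      //
      // Generated Go (open API), roughly:
      type Address struct{ City string }

      type Profile struct {
        Name    string   // zero value "" is indistinguishable from "never set"
        Address *Address // nil means unset, so presence is observable
      }

      IIRC, prost-style Rust codegen has the same split: Option<T> for message fields but bare defaults for scalars, unless the field is marked optional.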

      Here are some issues related to optional in Rust that just don’t really exist in languages with nullability everywhere:

      1. 4

        AFAIK, the long and short of proto3 making all fields optional is that required fields triggered some bad outages at Google. Say you have two servers communicating using protos. You add a required field. Server 1 rolls out before Server 2. Server 2 sends a message to Server 1; Server 1 says the message is missing the required field and rejects it. Now Server 2 is totally borked, can’t make progress, servers start puking, and you get an outage. Removing required stops that failure mode and forces the maintainers of Server 1 to think about how to handle the field not being there.
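
        As a toy sketch of that failure mode (hypothetical field names; the real check happens inside the generated proto2 parser, which errors out when a required field is absent):

        package main

        import (
          "errors"
          "fmt"
        )

        // Stand-in for a generated proto2 type after the schema change.
        type Request struct {
          UserId  *string // required since day one
          TraceId *string // newly added as required
        }

        // proto2 semantics: parsing fails if any required field is missing.
        func checkRequired(r *Request) error {
          if r.UserId == nil || r.TraceId == nil {
            return errors.New("missing required field")
          }
          return nil
        }

        func main() {
          uid := "u123"
          old := &Request{UserId: &uid}   // sent by the not-yet-updated Server 2
          fmt.Println(checkRequired(old)) // Server 1 rejects it -> cascading failure
        }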

        I have no idea about optional’s return or the editions. I didn’t even know optional came back.

        1. 2

          I think the new optional just means you can distinguish whether a field was explicitly set.
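
          With proto3 optional, the generated code tracks presence explicitly; a rough Go sketch (field name hypothetical):

          // proto3: optional string nickname = 1;
          // Generated Go (open API), roughly:
          type User struct {
            Nickname *string // nil means the field was never set
          }

          func (u *User) GetNickname() string {
            if u != nil && u.Nickname != nil {
              return *u.Nickname
            }
            return "" // default when unset
          }

          // Callers can now tell "" (explicitly set to empty) apart from unset:
          //   if u.Nickname != nil { ... }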

          1. 1

            Surely an integration test would have caught this early? Or an early canary (partial roll-out)?

            1. 3

              Rollout co-ordination is hard :shruggie:

              We have an automated canarying system now which should catch these before they roll out widely, but ideally solving it at the technical level would be the right call. The automated canarying system does seem to work very well in my experience: it monitors RPC calls and error rates, and will reject a rollout if error rates become abnormally elevated. Such a cascading failure should be hard to trigger now.

          2. 1

            The optional framing confused me a bit – I think there were 2 main issues that are different but related:

            • whether fields have a has_foo() method generated
            • whether it makes sense to have “required” fields – i.e. if the field is not present, then deserialization fails IIRC

            There was a long thread I participated in here about similar issues - https://news.ycombinator.com/item?id=36909894

            You can see there is pretty intense disagreement on these design points, e.g. https://capnproto.org/faq.html#how-do-i-make-a-field-required-like-in-protocol-buffers

            (kentonv implemented proto2, based on proto1, and he also implemented capnproto largely based on the experience of proto2. A different team implemented proto3. FWIW I reviewed the design doc at Google for proto3 Python, since I had voiced some objections to the proto2 API. But I don’t recall ever using proto3 Python, since I had left by the time it was rolled out. I also wrote my own toy protobuf implementation that used Python reflection rather than code gen.)

            I think that is essentially where the confusion comes in – because different people worked on it at different times, and didn’t agree.


            I’ll also repeat the link to Rich Hickey’s “Maybe Not” - https://lobste.rs/s/zdvg9y/maybe_not_rich_hickey

            A big issue is that some people think of protobufs as structs, and some people think of them as Clojure-like maps. (The wire encoding is literally a map of integers to typed values; the code gen to “extend your type system over the network” is perhaps an illusion.)
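
            You can see the “map” view directly with google.golang.org/protobuf/encoding/protowire; a minimal sketch that walks any serialized message as raw (field number, wire type, value) entries:

            package main

            import (
              "fmt"

              "google.golang.org/protobuf/encoding/protowire"
            )

            // Walk a serialized message as what it is on the wire:
            // a sequence of (field number, wire type, value) entries.
            func dump(b []byte) {
              for len(b) > 0 {
                num, typ, n := protowire.ConsumeTag(b)
                if n < 0 {
                  return // malformed tag
                }
                b = b[n:]
                m := protowire.ConsumeFieldValue(num, typ, b)
                if m < 0 {
                  return // malformed value
                }
                fmt.Printf("field %d (wire type %d): %x\n", num, typ, b[:m])
                b = b[m:]
              }
            }

            func main() {
              // Hand-encode "field 1 = varint 150" with no schema at all.
              b := protowire.AppendTag(nil, 1, protowire.VarintType)
              b = protowire.AppendVarint(b, 150)
              dump(b)
            }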

            It depends on what you’re working on! Some schemas at Google have 300+ fields in a single record, and they are best thought of as maps.

            Most people start off using them like structs. And then they throw away the code because their project gets cancelled, and they never graduate to the “map” part – where you need to maintain a service over 10 or 20 years.

            That’s how web search works – the schemas often outlive the code.


            So there are different requirements from different groups, which may lead to a “confused” design:

            • people who improve a single component in a system, vs. people who make new systems and work across the boundaries
              • as a system matures, more people work on a single component – they want things to be more “static”
            • systems that are client/server, vs. systems with non-trivial server topologies, like in web search (kentonv has some comments on this)
            • data that’s persisted, vs. data that’s ephemeral between servers (both have upgrade issues)
            • people who use protobufs for language X <-> X communication, vs X -> Y (C++ was/is a very popular X)
          3. 4

            Why are the methods generated as GetFoo() instead of the idiomatic Foo()?

            1. 23

              If we named the accessor method Foo, that would make the Hybrid API impossible: this intermediate API level exposes both accessor methods and fields, which would then both need to be named Foo. The Hybrid API is an important tool when doing a medium- to large-scale migration (smaller repositories don’t need to bother with intermediate stages).
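
              Concretely, Go does not allow a type to have a field and a method with the same name, which is exactly what the Hybrid API would need (sketch with a hypothetical message):

              type Msg struct {
                Foo string
              }

              // Compile error: type Msg has both field and method named Foo.
              // func (m *Msg) Foo() string { return m.Foo }

              // So the accessor takes the Get prefix instead:
              func (m *Msg) GetFoo() string {
                if m == nil {
                  return ""
                }
                return m.Foo
              }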

              1. 2

                I would imagine it’s to match up with the HasFoo() method.

                Vaguely reminds me of SOAP generated clients from years back.

              2. 1

                I’m confused by the comment about bit fields. Probably because I haven’t used protobufs in a long time and have forgotten concepts. I imagine this has something to do with the comment about “uncouple(ing) the Generated Code API from the underlying in-memory representation”.

                In any case, I’m still confused by the generated code shown after the section mentioning the use of bit fields:

                package logpb
                
                type LogEntry struct {
                  xxx_hidden_BackendServer *string // no longer exported
                  xxx_hidden_RequestSize   uint32  // no longer exported
                  xxx_hidden_IPAddress     *string // no longer exported
                  // …internal fields elided…
                }
                
                func (l *LogEntry) GetBackendServer() string { … }
                func (l *LogEntry) HasBackendServer() bool   { … }
                func (l *LogEntry) SetBackendServer(string)  { … }
                func (l *LogEntry) ClearBackendServer()      { … }
                // …
                

                The only change I see here from previous generated code is the lack of exported fields. The types are standard Go types. Are “bit fields” referring to how presence is modeled in the wire format? I’m sure I’m being dense about something, but after a mention of bit fields for memory savings, I was expecting to see how they’re implemented in generated code, since code was being shown for everything else. Does the section “Opaque structs use less memory” refer only to memory savings in the wire format? It appears evident that those savings are not in generated code, since the above struct is a simple Go struct with elementary Go types and unexported fields.

                1. 1

                  I think the part that made the bitfield make sense to me was elided in the blog post. In the current implementation there’s an XXX_presence [1]uint32 that’s used as a bitfield. The generated Has methods check this bitfield to see whether the field was set on the received message, and the Set and Clear methods mutate it. This allows the struct to store the field directly (e.g., the request_size field of the message) and use the type’s zero value instead of a nil pointer.
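
                  A simplified sketch of that mechanism (layout and bit index hypothetical, not the real generated code):

                  type LogEntry struct {
                    xxx_presence           [1]uint32 // bit i set => field i was explicitly set
                    xxx_hidden_RequestSize uint32    // stored by value, no pointer needed
                  }

                  const requestSizeBit = uint32(1) << 0 // hypothetical bit for request_size

                  func (l *LogEntry) HasRequestSize() bool {
                    return l.xxx_presence[0]&requestSizeBit != 0
                  }

                  func (l *LogEntry) SetRequestSize(v uint32) {
                    l.xxx_hidden_RequestSize = v
                    l.xxx_presence[0] |= requestSizeBit
                  }

                  func (l *LogEntry) ClearRequestSize() {
                    l.xxx_hidden_RequestSize = 0
                    l.xxx_presence[0] &^= requestSizeBit
                  }

                  func (l *LogEntry) GetRequestSize() uint32 {
                    if l.HasRequestSize() {
                      return l.xxx_hidden_RequestSize
                    }
                    return 0 // proto default
                  }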

                  1. 1

                    Yes, thank you!

                  2. 1

                    “The only change I see here from previous generated code is the lack of exported fields”

                    The change you are missing is from RequestSize *uint32 (pointer to uint32) to xxx_hidden_RequestSize uint32 (no longer a pointer). Hope this helps make sense of that section of the announcement :)
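
                    In other words (a rough before/after, not the exact generated code):

                    // Before (open API): presence tracked via pointer.
                    type LogEntryOld struct {
                      RequestSize *uint32 // nil == unset; set values live behind a pointer
                    }

                    // After (opaque API): stored by value, presence in an internal bitfield.
                    type LogEntryNew struct {
                      xxx_hidden_RequestSize uint32 // zero value reused; a presence bit says if set
                    }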