A solution to this problem - more specifically, a serialization mechanism with good support for algebraic (sum) types - is the missing piece for making some really good systems. But I worry that the safety guarantee here is pretty weak:
It is then easy to build all dependent services as downstream jobs during the build phase, which gives engineers early warnings about compatibility issues with their service APIs.
So that sounds like you can only test whether you’ve broken specific downstream clients, and only their current versions. Whereas what you really want to know is whether the old, deployed versions of your clients, and perhaps even clients developed by third parties without your knowledge, are going to work with your updated server. The guarantees that Thrift gives you on this point are good enough to justify all the code generation and overhead, at least for many use cases. I’d love a better (by which I mean mainly more ADT-friendly) serialization format that still offered the same level of guarantees, but it sounds like this isn’t that.
(I’m one of the remotely authors)
Yeah so we are actually attacking things from the opposite angle, which is that when a client connects, it determines whether the server is one it wants to talk to. It does this by requesting from the server the set of functions that the server exports. The client can then check this set against the set of functions that were present at the time the client was compiled, and make sure that the set of server functions is a superset of the functions the client is expecting. This is an evolving story, it all could change, we are still figuring out what we want to do ourselves. :)
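A minimal sketch of the connect-time check described above, written in Python purely for illustration. The names `Signature`, `missing_functions`, and `can_talk_to` are hypothetical and are not part of the library's API, and representing a function as a name plus a type string is an assumption.

```python
from typing import FrozenSet, NamedTuple


class Signature(NamedTuple):
    """One exported function: its name plus a textual type (hypothetical encoding)."""
    name: str
    type_repr: str


def missing_functions(expected: FrozenSet[Signature],
                      exported: FrozenSet[Signature]) -> FrozenSet[Signature]:
    """Functions the client was compiled against that the server does not export."""
    return expected - exported


def can_talk_to(expected: FrozenSet[Signature],
                exported: FrozenSet[Signature]) -> bool:
    """True when the server's exports are a superset of the client's expectations."""
    return not missing_functions(expected, exported)


# Set recorded when the client was compiled (example data, not from any real service).
client_expected = frozenset({
    Signature("sum", "[Double] -> Double"),
    Signature("factorial", "Int -> Int"),
})

# Set returned by the server at connection time; extra functions are harmless.
server_exported = client_expected | {Signature("fib", "Int -> Int")}
```

With these example sets, `can_talk_to(client_expected, server_exported)` holds, while swapping the arguments fails because the "client" would then expect `fib`.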
So my use case is a fairly ordinary I guess “SOA”: I work on a system made of a bunch of components, each of which exports a service (or a small set of related services) that will be called by several others. What I like about Thrift is that I can know whether any given change to the IDL is going to be compatible; if it isn’t, I have to make a migration plan (which is a pain because there will be several other components that call this service), but if it’s compatible I don’t even touch the clients. I can just restart that service (clustered behind a load balancer) with the new version, and I know statically that the clients will continue to work.
It seems like by the time the client figured out things were wrong it would already be too late; I need to know at the time I upgrade the server. Maybe that’s not a use case you’re interested in supporting, but it feels like a common one (I’ve worked on similar systems at several companies).
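One way to picture the upgrade-time version of that check, sketched in Python. It assumes that the function set each deployed client expects has been recorded somewhere the server's deploy step can read. The `deployed_clients` registry, the `breaking_clients` helper, and the service and function names are all hypothetical; this is not how Thrift or remotely is claimed to work.

```python
from typing import Dict, Set


def breaking_clients(new_exports: Dict[str, str],
                     deployed_clients: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each client that would break to the functions it can no longer call.

    A function counts as broken if it is gone or its type string changed.
    """
    broken: Dict[str, Set[str]] = {}
    for client, expected in deployed_clients.items():
        bad = {name for name, type_repr in expected.items()
               if new_exports.get(name) != type_repr}
        if bad:
            broken[client] = bad
    return broken


# Hypothetical record of what each deployed client version was compiled against.
deployed_clients = {
    "billing-v1": {"getUser": "UserId -> User"},
    "search-v3": {"getUser": "UserId -> User", "listUsers": "() -> [User]"},
}

# Hypothetical exports of the server version about to be deployed.
new_exports = {"getUser": "UserId -> User", "listUsersPaged": "Page -> [User]"}

print(breaking_clients(new_exports, deployed_clients))
```

Here only `search-v3` is flagged, because `listUsers` is no longer exported. A check like this covers only clients whose expectations were recorded, so it says nothing about third-party clients the server owner does not know about.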