This is accurate. At Google, large-scale refactorings were staged exactly as the article describes rather than landing as a single big-bang commit. The big win wasn’t the atomicity of the change; it was having a single place to track the refactor’s progress. A monorepo makes that tracking easier, since there’s a single commit history, and also makes it easier to find and modify all of the affected locations. This gives you a lot of confidence as you make progress. But it does presume (for really large monorepos) that you have the tooling to leverage this consistent view of your codebase at that scale.
I am wondering if it is common to do both caller and callee changes in a single atomic commit, when the number of callers is no higher than say 3.
It’s quite common in LLVM to do this. It annoys me because encouraging this workflow means that you’re encouraging developers to not think about out-of-tree consumers. Just because we can update an LLVM API and the clang / lld / whatever users in a single commit doesn’t mean that it’s a good idea: when we do, we’ve just broken every single out-of-tree user of that API.

As a direct result, we encourage downstream consumers to move between LLVM releases infrequently (sometimes only when a release branch is created, sometimes not even until it’s released). As a direct result of that, we get less testing for the trunk branch and suddenly find that a load of things are broken when we do branch a release.

Many of these would have been caught quickly if Rust/Pony/Swift/ToyLang47/whatever could run a CI system that did nightly checks with the latest LLVM top of tree and had a good expectation that things wouldn’t break. If APIs that were in a release were deprecated during the period until the next release but not removed until that release had branched, we’d keep source compatibility for most downstream consumers between two releases and they’d all do free testing for us.
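Roughly what I have in mind, as a minimal sketch with made-up names (this is not a real LLVM API, just an illustration of the policy): the old entry point stays behind as a deprecated forwarding shim until the next release has branched, so out-of-tree users tracking trunk get a warning rather than a hard break.

    #include <string>

    namespace llvm_like {  // invented namespace, not the actual LLVM API

    // New API added during the current development cycle.
    inline void createTargetMachine(const std::string &triple, unsigned optLevel) {
        // ... real work would happen here ...
        (void)triple;
        (void)optLevel;
    }

    // Old API kept as a forwarding shim. It is marked deprecated as soon as
    // the replacement lands, but only deleted after the next release has
    // branched, so downstream projects building against trunk keep compiling
    // for a full release cycle.
    [[deprecated("use createTargetMachine(triple, optLevel) instead")]]
    inline void createTargetMachine(const std::string &triple) {
        createTargetMachine(triple, /*optLevel=*/2);
    }

    }  // namespace llvm_like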
Ahaaa. That goes a long way to explaining why LLVM has such a reputation for API breaks across versions. Thank you!
edit: hm, I wonder if you could make this happen in a single step by running an extra copy of that thing the Rust people use to build every single publicly-available Cargo package (crater), with LLVM nightly linked in?
I’m among those people who repeatedly claim that atomic commits are the one advantage of monorepos. This article tells me that isn’t a strong reason, because the ability is rarely, if ever, actually used. Maybe so. Then again, I’m not a big fan of monorepos anyway.
I agree with the author that incremental changes should be preferred for risk mitigation. However, what about changes which are not backwards-compatible? If you only change the API provider, then all users are broken. You cannot do this incrementally.
Of course, changes should be backwards compatible. Do Google, Facebook, and Microsoft achieve this? Always backwards compatible?
You’d rewrite that single non-backwards-compatible change as a series of backwards-compatible ones, followed by a final non-backwards-compatible change once nobody is depending on the original behavior any more. I’d expect it to be possible to structure pretty much any change in that manner. Do you have a specific counter-example in mind?
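For example (names invented, not from any particular codebase), the intermediate state of a staged signature change might look like this: the new and old entry points coexist, the old one forwards to the new one, and the only truly breaking commit at the end is deleting the shim.

    #include <memory>

    struct Renderer { /* ... */ };  // hypothetical type for illustration

    // Commit 1 (backwards compatible): add the new API alongside the old one.
    inline std::unique_ptr<Renderer> makeRenderer() {
        return std::make_unique<Renderer>();
    }

    // The old API becomes a thin forwarding shim, so every existing caller
    // keeps compiling; the deprecation warning nudges them to migrate.
    [[deprecated("use makeRenderer(), which returns an owning unique_ptr")]]
    inline Renderer *createRenderer() {
        return makeRenderer().release();
    }

    // Commits 2..N (backwards compatible): migrate callers from
    // createRenderer() to makeRenderer(), one consumer at a time.
    //
    // Final commit (the only non-backwards-compatible one): delete
    // createRenderer() once a code search shows no callers remain.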
We used to have an internal rendering tool in a separate repo from the app (rendering tests were slow).
Now the rendering tool ships with the app! There’s no version drift or anything.
When it was a separate repo, you’d have one PR with the changes to the renderer and another to the app. You had to cross-reference both (it’s a lot easier to check changes when you can also see the usage changes by consumers), then merge on one side, then update the version on the other side, and only then did you end up with a nice end-to-end change.
It’s important to know how to make basically any change backwards compatible, but the cost of doing that, compared to just making the change directly, is extremely high and the process is error-prone IMO. Especially when you have access to all the potential consumers.
That approach definitely works, but it doesn’t come for free. On top of the cost of having to roll out all the intermediate changes in sequence and keep track of when it’s safe to move on, one cost that I see people overlook pretty often is that the temporary backward compatibility code you write to make the gradual transition happen can have bugs that aren’t present in either the starting or ending versions of the code. Worse, people are often disinclined to spend tons of effort writing thorough automated tests for code that’s designed to be thrown away almost immediately.
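As a concrete (made-up) illustration of that bug surface: during a unit migration you might keep both representations alive and sync them in a setter. That sync logic exists only during the transition, and any write path that bypasses it silently desyncs the two fields, a bug that neither the before nor the after version of the code can have.

    // Hypothetical transition state while migrating from mph to km/h.
    struct VehicleState {
        double speedMph;  // old field, still read by legacy consumers
        double speedKph;  // new field, read by migrated consumers

        // Temporary dual-write shim: keeps both fields consistent until the
        // old one is removed. This code only exists during the transition.
        void setSpeedKph(double kph) {
            speedKph = kph;
            speedMph = kph / 1.609344;
        }
    };
    // Risk: anything that still assigns speedMph or speedKph directly
    // bypasses the shim and leaves the two fields disagreeing, a bug that
    // exists in neither the starting nor the ending version of the code.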
You don’t have to, at least if you use submodules. You can commit a breaking change to a library, push it, run CI on it (have it build on all supported platforms and run its test suite, and so on). Then you push a commit to each of the projects that consumes the library that atomically updates the submodule and updates all callers. This also reduces the CI load because you can test the library changes and then the library-consumer changes independently, rather than requiring CI to completely pass all tests at once.
I’m working in an embedded field where microcontrollers imply tight resource constraints. That often limits how many abstractions you can introduce for backwards-compatibility.
A simple change could be a type that holds “miles” and then also needs “kilometers”. If you extend the type (backwards compatible), it becomes larger. Multiplied by many uses all across the system, that can easily blow up to a few kilobytes and cross some limits.
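Roughly what the footprint concern looks like (names, field widths, and sizes are all invented for the sketch):

    #include <cstdint>

    // Original type: distance stored in a single 16-bit field.
    struct DistanceV1 {
        uint16_t kilometers;
    };
    static_assert(sizeof(DistanceV1) == 2, "original footprint");

    // Backwards-compatible extension: the old field stays for existing
    // readers, plus a unit tag and a wider value for the new ones.
    enum class Unit : uint8_t { Kilometers, Miles };

    struct DistanceV2 {
        uint16_t kilometers;  // kept so old readers still work
        Unit unit;            // new: which unit `value` is expressed in
        uint32_t value;       // new: the value in that unit
    };
    // Typically 8 bytes after padding, i.e. 4x the original. Multiplied by a
    // few thousand instances in statically allocated tables, that is the
    // "few kilobytes" that can push a microcontroller over its RAM budget.
    static_assert(sizeof(DistanceV2) >= 2 * sizeof(DistanceV1), "the type grew");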
Another example: A type change meant that an adapter had to be introduced between two components where one used the old and the other the new type. Copying a kilobyte of data can already cross a runtime limit.
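A sketch of that second case (layouts and sizes invented): the adapter has to materialize a whole new record just to bridge the two layouts, and the copy alone can blow the cycle budget.

    #include <cstdint>
    #include <cstring>

    // Hypothetical layouts: component A still emits the old record while
    // component B already consumes the new one.
    struct OldRecord { uint8_t payload[1024]; };
    struct NewRecord { uint32_t version; uint8_t payload[1024]; };

    // Temporary adapter for the transition. On a microcontroller the cost is
    // twofold: ~1 KB of extra RAM for the second buffer, and the memcpy time
    // on every hand-off, which can already cross a tight runtime limit.
    inline NewRecord adaptToNew(const OldRecord &in) {
        NewRecord out{};
        out.version = 1;
        std::memcpy(out.payload, in.payload, sizeof(in.payload));
        return out;
    }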
I do admit that microcontrollers are kinda special here, and that in other domains the cost of abstractions for backwards compatibility is usually negligible.