What the author doesn’t acknowledge is that these massive repos have a lot of tooling behind them to make them work. Google and Facebook have whole teams devoted to these tasks, afaik, and it’s non-trivial to get everything working smoothly. If you can’t afford to build all the tooling needed, a monolithic repo might be more problematic than separate repos.
Well, I guess I have to point out something.
Any org of that size needs a build tools team. That’s a cost of doing business for any org with more than a few handfuls of engineers, and it’s something most companies underspend on by a great deal.
So, you might respond, those teams are made more necessary by building inside of and for a monorepo. Okay, well, let’s talk about that some.
Building common tools for code review, dependency building, distributed testing, and so on is important. It provides a base on which all of your engineers can communicate and move freely from project to project without having to constantly grapple with whatever strange build is broken at the moment.
If anything, small (or large!) companies using a monorepo will have an easier time creating good build tools. Those tools only have to be created, distributed, and maintained in one repo with one common way of doing things, instead of being integrated across many repos with varying levels of support for the common code review, dependency building, and generation tools. That means less overhead on communication, on integration, on convincing folks to please migrate to the new, better version of a tool. Instead, you add it to the repo and can easily, gradually move folks to it because the tooling is consistent. Communication gets hard so much faster than most any software scaling problem, and the tools we build have to reflect that.
Building tools for a monorepo instead of a graph of repos is also easier, because you cut down the state space you have to operate in. Instead of a distributed systems problem of repos in varying states of their histories, you get one repo with one totally ordered history. A lot of the folks who build tools to operate on many repos (dependency awareness, cross-repo check-ins, subtrees) are usually just re-building a monorepo, but with even more work required to traverse a graph not just of files on disk but of repos in various states.
Now, of course, the author and I both agree that the current state of distributed version control systems makes it really hard to scale out a big monorepo, but that’s not where most of a build tool team’s time is spent! In fact, Google just used Perforce (a non-DVCS) for a heck of a long time. For small orgs, sticking with git or mercurial works pretty well for a really long dang time because, like lots of problems at small scale, you can do pretty much anything that isn’t the dumbest thing and it works. But it is a total bummer to have to think about at all, and I look forward to narrow clones being more of a thing!
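(For the curious: newer versions of git can approximate a narrow clone by combining a partial clone with a sparse checkout. The URL, branch, and directory names below are made up for illustration.)

```shell
# Clone without blobs or a working tree, then only materialize the
# directories you actually work in. URL, branch, and paths are illustrative.
git clone --filter=blob:none --no-checkout https://example.com/monorepo.git
cd monorepo
git sparse-checkout set services/frontend tools/build
git checkout main
```

The `--filter=blob:none` part defers downloading file contents until checkout, and `sparse-checkout` limits the working tree to the listed directories, so you never pay for the rest of the monorepo on disk.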
So, yeah, orgs have build tool teams, and that’s because you need them no matter what! And building tools in and for a monorepo means a huge reduction in problem space, which simplifies both the actual creation of the tools and their integration into the org. That means you can spend proportionally less time on those build tool teams by setting them up for success, and more time making great products.
For the most part I agree with what you said, but you are responding to a point I didn’t make. The author’s point is that monorepos are good. The end. My point is that they can be good but you will have to invest in them. The author is missing an important part.
It’s easier to build stuff for a monorepo because you cut down the state space you have to operate in. Instead of a distributed systems problem of repos in varying states of their histories
I disagree with this. With multirepos you can have a repository that pins specific versions of other repos, which makes the state space just its own history. In a monorepo, things can change out from under your feet, and it takes a fair amount of tooling to give guarantees. I’ve worked at places with monorepos that didn’t have that tooling and it was painful. My team moved to a multirepo solution with pinned dependencies and no extra tooling, and it made our issues tractable. Nothing was magically solved, but you could at least trust that the world wasn’t going to fall out from under you at any moment.
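(For concreteness, here’s roughly what that kind of pinning looks like with plain git submodules; the URL, path, and commit hash are made up.)

```shell
# In the consuming repo: add the dependency as a submodule, then pin it
# to an exact commit. The superproject's own history now fully determines
# the dependency's state. URL, path, and SHA are illustrative.
git submodule add https://example.com/libfoo.git third_party/libfoo
git -C third_party/libfoo checkout 4f2a9c1b
git add third_party/libfoo
git commit -m "Pin libfoo at 4f2a9c1b"
```

git records the pinned commit id in the superproject’s tree, so checking out any commit of the consuming repo reproduces exactly one state of the dependency.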
I agree it’s quite complex, and I believe the author would also agree: he now works on Mozilla’s Developer Services team, which includes scaling Mercurial, improving code review, etc.
Other posts on his blog describe some of the work being done, much of which is not really specific to Mozilla, so hopefully it could be reused by other organizations.
I kinda hate monolithic repositories, because a few things start to happen (in no particular order of annoyance):
People start checking in binary artifacts next to things that sorta/kinda need them, which can cause annoying versioning problems. The logic is usually “but but but this project only needs that dll, why do I want to do a full build?”
It can make it very difficult to section off access to only certain parts of the codebase, say for external SDK development, intern work, or third-party development work. Suddenly, to let people work on one little thing, they have to have the whole thing.
It encourages people to write sprawling applications with goofy interdependencies. Say, for example, a pseudo-json description of something that is precompiled by another tool and then generates source code for three other projects. This would have been rethought had everything not just been living together, but since it already was, why not?
It encourages people to share code between projects in ways that can prove nasty. Most people are not smart about what code to share, and instead build up spiderlike webs of interdependency that are a pain in the ass. Had the repository been broken up, it would’ve been more annoying to do so.
One big repo for, say, a game or a single desktop application, might be okay. Unfortunately, a single repo with code spanning several projects and systems and services is probably going to end in tears.
A lot of the issues you mention I’ve seen happen with smaller, separate repositories as well, especially the problems with interdependencies and code sharing. Those problems are better served by code review.
As for sectioning things off, Subversion and Perforce are very good at that. I’ve worked in various situations where I would check out a directory structure and some of it is just not obtained because I don’t have access to it (it was usually proprietary 3rd party drivers).
I’ve done the Subversion partitioning thing too…it was handy for keeping art assets from killing new code checkouts. :)
I agree that those problems are also solvable by code review, but adding a certain amount of friction at the tooling layer helps.
There’s also a common practice, especially in academic environments or among academics-pretending-to-be-startup-folks, of saying “Well, it’s simpler to just have it all in one place” — and these same people are…difficult…to do code reviews with.