  2. 29

    On top of @ahal’s excellent rebuttal of the VCS side of this, I also disagree with the author’s other main premise: that monorepos encourage entanglement.

    In practice, if you have a really complicated system, and you don’t pay attention, things are gonna get entangled. That’s just how it is. In polyrepo places I’ve been that fall into this hole, everything pins very specific versions, and you end up having to coordinate tons of service upgrades at once. In contrast, at places I’ve been that use monorepos well, it’s easy to let services upgrade as they need to, because you’ve got well-defined service boundaries that allow for some API skew. This isn’t related to monorepo v. polyrepo; it’s related to whether your engineers have experience handling API migrations at scale when you don’t control all the consumers.

    Having devs who don’t know how to manage complexity at that scale will hurt you whether you’re using polyrepos or monorepos; devs who do know how to work at that scale can usually handle either approach.

    1. 6

      everything pins very specific versions, and you end up having to coordinate tons of service upgrades at once.

      This is Very True.

      1. 4

        I really feel as if this author either hasn’t worked with monorepos, or has only done so when the people he’s working with don’t know how to handle a code base that size. But having devs who don’t know how to manage complexity at that scale go to polyrepos won’t fix the problem; the problem is just the scaling.

        That comes across as a bit of a “no true Scotsman” to me. I think your argument is sounder without the addition of that paragraph.

        1. 5

          I definitely didn’t intend it to be a no true Scotsman; I reworded my last sentence to hopefully be clearer about what I’m trying to convey. My intent with that was to say that I’ve seen monorepos work and fail, and polyrepos work and fail, and the deciding factor was the devs, not the number of repositories; hopefully the new phrasing makes that clearer.

        2. 3

          Apparently you’ve only worked at companies with a level of discipline that isn’t necessarily going to exist everywhere. The issues that come with multiple repositories definitely exist, but the architecture that the monorepo-embracing companies I’ve been at end up with is a pile of garbage, stemming from a poor separation of concerns. Monorepos are to blame for that, and moving away from monorepos has been successful at preventing this in my experience.

          1. 12

            Monorepos are not to blame.

            Polyrepos are sprinkled around like micro-services, as some sort of magic salve that will solve all your problems – the “new” magic bullet. They are not a substitute for discipline: if you can’t work in a monorepo (or monolith), you aren’t going to magically be able to work with polyrepos and micro-services.

            This is why so often the “conversion to micro-services” is simply a more politically palatable excuse for “a complete rewrite, but this time we are going to do everything RIGHT” – and these efforts suffer the same fate as most complete rewrites, for all the same reasons: throwing away knowledge, simplifying beyond reality, trying to solve management failures with tech, etc.

            IMHO, polyrepos require exponentially more discipline, and at more confusing, troubling and costly junctures – from making an atomic interconnected change, to managing versions and (often invisible) dependencies, to testing all combinations of dependencies for integration, to simply understanding the state of the world. Not to mention the immaturity of tooling around them versus monorepos. You are trading well-solved problems for nebulous unsolved ones.

            1. -1

              Monorepos can be “not to blame” all they want, but moving to what you’re calling “polyrepos” has solved the problem in every case when I’ve tried it.

              There’s nothing “new” about polyrepos, either. “Polyrepos” were how everything worked before monorepos came along, and they solved the problem just fine. The only arguments for monorepos are laziness and bad tooling.

              Microservices can live in “polyrepos” or monorepos, so the argument that was used to bring them into the conversation is a complete strawman.

          2. 1

            I also disagree with the author’s other main premise: that monorepos encourage entanglement.

            This is related to the myth that monoliths encourage entanglement as well.

          3. 14

            Having previously worked on a large monorepo (Twitter, 10s of GiB), I acknowledge the VCS side of this, for all that @ahal has provided an excellent rebuttal.

            For me monorepos are about coordination and ergonomics. It’s not that you can’t build out tooling to make polyrepos work. You can. But what does it buy you? Is coordinating merges actually a good idea? Is bumping versions across many repos, cutting new CI/CD releases all the time and churning bump PRs really a win over admitting that you have many coupled artifacts and managing them in a single monorepo?

            Monorepos help provide a technical solution to organizational problems of code sharing, and make it easy to build out new projects using shared test and deploy pipelines. In my experience that can be a huge win.

            Joe Armstrong and others have had strong words recently about premature code sharing and library-first coding styles, which lead to widespread coupling and make refactoring more difficult. But there really is no substitute for monorepo-style, large-scale, change-impact-based (integration) testing when you do have pervasive code sharing.

            1. 2

              Don’t you need to build out tooling to make monorepo work as well?

              For example, you need something that analyses which tests to run for a pull request. If you just run all of them all the time your CI is quickly overloaded, isn’t it? In a polyrepo environment, you can simply run all tests for each change in a single repo.

              1. 7

                In a polyrepo environment, you can simply run all tests for each change in a single repo.

                The breakages I run into time and time again are at the interfaces between the modules, which wouldn’t be tested by this strategy. So now you need to build tests that pull in all of the repo’s service dependencies and run your integration tests across all of them.

                1. 6

                  Yeah, it is frustrating to see answers that appear to ignore reality like that; by such logic, when you check in a change to a dependency you just run its unit tests and, bam, done!

                2. 5

                  For example, you need something that analyses which tests to run for a pull request. If you just run all of them all the time your CI is quickly overloaded, isn’t it?

                  Your CI can and should scale to your complexity. In my experience this isn’t a major issue, and moreover it isn’t dependent on polyrepo or monorepo – either way you need to make sure the system as a whole is still working, not just some corner of it.

                  In a polyrepo environment, you can simply run all tests for each change in a single repo.

                  This assumes, I guess, 100% test coverage on the edges of each polyrepo, which is an astonishing amount of work – I would argue orders of magnitude more work than scaling your CI infrastructure.


                  Additionally, to a degree it is about where the tooling and annoyance live: where you park the complexity. Parking it on the developer’s machine (large checkouts, indexing tools, yada) is well-worn and understood territory. Changes are atomic and logical.

                  Moving that complexity into polyrepos requires tools that touch each repo and try to do a lot of fanciness, which is far more dangerous IMHO – and when they go bad, they are catastrophic to roll back.

                  1. 4

                    You do - but if you use Pants or Bazel or Buck, that’s off-the-shelf tooling and something you set up once. In polyrepos impact analysis remains impossible, and while yes, you can “just” re-run all your tests when you bump versions, that’s far less than ideal. Is re-running all your tests because you bumped a version, because unreached code was added to a dependency, actually something you want to spend time waiting for?

                    My particular issue with this article is the author’s claim that the version-bumping tooling is any more “off the shelf”. I don’t think it is at all, and for a claim like that the author really needed to bring receipts, not handwave at the existence of multi-repo refactoring tools, especially when the available monorepo build tools are so good and fairly easy to deploy.

                    Furthermore the core algorithms are pretty simple. I wrote katamari in my nights and weekends because I wanted to take a stab at a monorepo tool for Clojure, and the core build implementation is a mere 300 lines including the impact detector you need for incremental testing.
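
                    To give a flavour of what that amounts to, here is a toy sketch in Go (not katamari itself, and the build graph in it is made up): the impact detector is essentially a reverse-dependency walk outward from whatever changed.

                    ```go
                    package main

                    import "fmt"

                    // impacted returns every target that transitively depends on one of the
                    // changed targets, i.e. everything that needs rebuilding and retesting.
                    func impacted(deps map[string][]string, changed []string) map[string]bool {
                        // Invert the edges: for each target, which targets depend on it directly?
                        rdeps := map[string][]string{}
                        for target, ds := range deps {
                            for _, d := range ds {
                                rdeps[d] = append(rdeps[d], target)
                            }
                        }
                        // Breadth-first walk over the reverse edges, starting from the changes.
                        affected := map[string]bool{}
                        queue := append([]string{}, changed...)
                        for len(queue) > 0 {
                            t := queue[0]
                            queue = queue[1:]
                            if affected[t] {
                                continue
                            }
                            affected[t] = true
                            queue = append(queue, rdeps[t]...)
                        }
                        return affected
                    }

                    func main() {
                        // Hypothetical build graph: //api and //worker both use //lib/db.
                        deps := map[string][]string{
                            "//api":    {"//lib/db", "//lib/log"},
                            "//worker": {"//lib/db"},
                            "//web":    {"//lib/log"},
                        }
                        // A commit touched //lib/db, so //api and //worker need retesting; //web does not.
                        fmt.Println(impacted(deps, []string{"//lib/db"}))
                    }
                    ```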

                    Where would you rather spend your time and complexity budgets? I think that paying ongoing costs (coordinating merges, long release times, a review-heavy workflow) is a misprioritization when the complexity of stronger build tooling is fairly manageable.

                    1. 9

                      In polyrepos impact analysis remains impossible

                      I have worked in tools-oriented positions pretty much my whole career: this is absolutely true. Polyrepos are… not good… in a corporate setting.

                    2. 2

                      Absolutely - the important consideration is how hard each set of tooling is to build / setup.

                      For small organisations, you’ll typically be adopting existing small-org tooling which largely expects polyrepos.

                      Once you have thousands of engineers, you typically want in-house tooling for security reasons, and it’s much easier to build new tools against a monorepo.

                  2. 20

                    I do agree with the theme of this post: at scale software is complex. Whether you use a monorepo or polyrepos, you’ll need a lot of complicated tooling to make things manageable for your developers.

                    But I want to draw attention to sfink’s excellent rebuttal (full disclosure, he is a colleague of mine).

                    Additionally, I’d like to address the VCS scalability downside. The author’s monorepo experience seems to be with Git. Companies like Google, Facebook (who have two of the largest monorepos in the world) and, to a lesser extent, Mozilla all use Mercurial for a reason: it scales much better. While I’m not suggesting the path to get there was easy, the work is largely finished and contributed back upstream. So when the author points to Twitter’s perf issues or Microsoft’s need for a VFS, I think it is more a problem of using the wrong tool for the job than something inherently wrong with monorepos.

                    1. 5

                      I was under the impression (possibly mistaken) that Google still used Perforce predominantly (or some Piper wrapper thing), with a few teams using Mercurial or Git for various externally visible codebases (Android, Chrome, etc.).

                      1. 10

                        Perforce has been gone for quite a while. Internal devs predominantly use Piper, though an increasing group is using Mercurial to interact with Piper instead of the native Piper tooling. The Mercurial install is a few minor internal things (e.g. custom auth), evolve, and core Mercurial. We’ve been very wary of using things outside of that set, and are working hard to keep our workflow in line with the OSS Mercurial workflow. An example of something we’ve worked to send upstream is hg fix, which helps you use a source code formatter (gofmt or clang-format) as you go; another is the narrow extension, which lets you clone only part of a repo instead of the whole thing.

                        Non-internal devs (Chrome, Android, Kubernetes, etc.) that work outside of Piper are almost exclusively on Git, but in a variety of workflows. Chrome, AIUI, is one giant Git repo of doom (it’s huge), Android is some number of hundreds (over 700 last I knew?) of Git repos, and most other tools are doing more orthodox polyrepo setups, some with Gerrit for review, some with GitHub pull requests, etc.

                        1. 3

                          Thanks for the clarification; it sounds like Piper is (and will continue to be) the source of truth, while the “rollout” Greg mentioned refers to client-side tooling. To my original point, Google still seems to have ended up with the right tool for the job in Piper (given the timeline and alternatives when they needed it).

                          1. 2

                            But how does Mercurial interact with Piper? Is Mercurial a “layer” above Piper? Do you have a Mercurial extension that integrates with Piper?

                            1. 3

                              We have a custom server that speaks hg’s wire protocol. Pushing to Piper exports to the code review system (to an approximation); pulling brings down the new changes that are relevant to your client.

                              (Handwaving because I’m assuming you don’t want gory narrow-hg details.)

                              1. 2

                                It’s a layer, yeah. My understanding is that when you send out a change, it makes Piper clients for you. It’s just a UX thing on top of Piper, not a technical thing built into it.

                            2. 2

                              I’m fuzzy on the details, but my understanding is that they’re in the middle of some sort of phased Mercurial rollout. So it’s possible only a sample population of their developers are using the Mercurial backend. What I do know is that they are still actively contributing to Mercurial and seem to be moving in that direction for the future.

                              1. 1

                                I wonder if they are using some custom Mercurial backend to their internal thing (basically a VFS layer as the author outlined)? It would be interesting to get some first- or second-hand information on what is actually being used, as people tend to specifically call out Google and Facebook as paragons of monorepos.

                                My feeling is that Google/Facebook are both huge organizations with lots of custom tooling and systems. /Most/ companies are not Google/Facebook, nor do they have Google/Facebook problems.

                                1. 6

                                  This is largely my source (in addition to offline conversations): https://groups.google.com/forum/#!topic/mozilla.dev.version-control/hh8-l0I2b-0

                                  The relevant part is:

                                  Speaking of Google, their Mercurial rollout on the massive Google monorepo continues. Apparently their users are very pleased with Mercurial - so much so that they initially thought their user sentiment numbers were wrong because they were so high! Google’s contribution approach with Mercurial is to upstream as many of their modifications and custom extensions as possible: they seem to want to run a vanilla Mercurial out-of-the-box as possible. Their feature contributions so far have been very well received upstream and they’ve been contributing a number of performance improvements as well. Their contributions should translate to a better Mercurial experience for all.

                                  So at the very least it seems they endeavour to avoid as much custom tooling on top of Mercurial as possible. But like you said, they have Google problems so I imagine they will have at least some.

                                  1. 6

                                    Whoa. This could be the point where Mercurial comes back after falling behind git for years.

                                    Monorepos sound sexy because Facebook and Google use them. If both use Mercurial and open-source their modifications, then Mercurial suddenly becomes very attractive.

                                    In Git, neither submodules nor LFS are well integrated, and both generate pain for lots of developers. If Mercurial promises to fix that, many will consider switching.

                                    Sprinkling some Rust into the code base probably helps to seduce some developers as well.

                                    1. 10

                                      Narrow cloning (authored by Google) has been OSS from the very start, and now ships in the hg tarball. If you’ve got need of it, it’s still maturing (and formats can change etc) but it’s already in use by at least 3 companies. I’d be happy to talk to anyone that might want to deploy hg at their company, and can offer at least some help on narrow functionality if that’s needed.

                                    2. 1

                                      Thanks for digging!
                                      Pretty interesting for sure.

                                2. 0

                                  I’m getting verification from someone at Google, but the quick version as I understood it:

                                  Google hasn’t actually used Perforce for a long time. What they had was a Perforce workalike that was largely their own thing. They are now using normal Mercurial.

                                  1. 12

                                    This isn’t true; Google uses Piper (their Perforce clone) internally. Devs have the option of using Mercurial or Git for their personal coding environments, but commits get converted to Piper before they land in the canonical monorepo.

                                    1. 2

                                      I’ll ping @durin42; I don’t think I’m misremembering the discussion, but I may have misunderstood either the current state or implementation details.

                                3. 3

                                  What is it about Git that makes it a poor choice for very large repos?

                                  What do Mercurial and Perforce do differently?

                                  1. 2

                                    In addition to the article @arp242 linked, this post goes into a bit more technical detail. Tl;dr, it’s largely due to how data is stored in each. Ease of contribution is another reason (scaling Git shouldn’t be impossible, but for one reason or another no one has attempted it yet).

                                    1. 1

                                      Microsoft has a 300GB git repo. They built a virtual file system to make it work.

                                      1. 1

                                        True, but in the scalability section of the article the author argues that the need for a VFS is proof that monorepos don’t scale. So I think most of this thread is centered around proving that monorepos can scale without the need for a VFS.

                                        I agree that a VFS is a perfectly valid solution if at the end of the day the developers using the system can’t tell the difference.

                                    2. 2

                                      Facebook wrote about Scaling Mercurial at Facebook back in 2014:

                                      After much deliberation, we concluded that Git’s internals would be difficult to work with for an ambitious scaling project. [..] Importantly, it [mercurial] is written mostly in clean, modular Python (with some native code for hot paths), making it deeply extensible.

                                      It’s a great example of how applications in a slower language can be made to perform better than applications in a faster language, just because it’s so much easier to understand and optimize.

                                  2. 18

                                    What a curious article. Let’s start with the style, such as calling some of the (perceived) advantages of a monorepo a “lie”. Welp, guess I’m a liar 🤷‍ Good way to have a conversation, buddy. Based on this article I’d say that working at Lyft will be as much fun as working at Uber.

                                    Anyway, we take a deep breath and continue, and it seems that everything is just handwaved away.

                                    Our organisation has about 25 Go applications, supported by about 20 common dependency packages. For example, we have packages log, database, cache, etc. Rolling out updates to a dependency organisation-wide is hard, even for compatible changes. I need to update 25 apps, make PRs for 25 apps. It’s doable, but a lot of work. I expect that we’ll have 50 Go applications before the year is out.

                                    Monorepos exist exactly to solve problems like this. These problems are real, and can’t just be handwaved away. Yes, I can write (and have written) tools to deal with this to some extent, but it’s hard to get right, and in the end I’ve still got 25 PRs to juggle. The author is correct that tooling for monorepos also needs to be written, but it seems to me that that tooling will be a lot simpler and easier to maintain (Go already does good caching of builds and tests out of the box, so we just have to deal with deploys). In particular, I find it very difficult to maintain any sense of “overview” because everything is scattered over 25 PRs.
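
                                    Tooling like this is what I mean: roughly the sketch below, in Go, with placeholder repo names, module path and version, and assuming the GitHub CLI for opening the PRs. The script is the easy part; the 25 PRs it leaves behind are not.

                                    ```go
                                    // A rough sketch of a cross-repo dependency bump: clone each service repo,
                                    // bump the shared module, push a branch and open a PR. The repo list, org
                                    // URL, module path and version are all placeholders.
                                    package main

                                    import (
                                        "log"
                                        "os"
                                        "os/exec"
                                        "path/filepath"
                                    )

                                    var repos = []string{"billing", "search", "ingest"} // ...and the other 22

                                    const (
                                        org     = "git@github.com:example-org/"
                                        module  = "github.com/example-org/database"
                                        version = "v1.7.0"
                                        branch  = "bump-database-v1.7.0"
                                    )

                                    func run(dir, name string, args ...string) {
                                        cmd := exec.Command(name, args...)
                                        cmd.Dir = dir
                                        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
                                        if err := cmd.Run(); err != nil {
                                            log.Fatalf("%s %v in %s: %v", name, args, dir, err)
                                        }
                                    }

                                    func main() {
                                        work, err := os.MkdirTemp("", "bump")
                                        if err != nil {
                                            log.Fatal(err)
                                        }
                                        msg := "Bump " + module + " to " + version
                                        for _, repo := range repos {
                                            dir := filepath.Join(work, repo)
                                            run(".", "git", "clone", org+repo+".git", dir)
                                            run(dir, "git", "checkout", "-b", branch)
                                            run(dir, "go", "get", module+"@"+version)
                                            run(dir, "go", "mod", "tidy")
                                            run(dir, "git", "commit", "-am", msg)
                                            run(dir, "git", "push", "-u", "origin", branch)
                                            // Assumes the GitHub CLI; swap in whatever your forge provides.
                                            run(dir, "gh", "pr", "create", "--title", msg, "--body", "Automated dependency bump.")
                                        }
                                    }
                                    ```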

                                    Note that the total size of our codebase isn’t even that large. It’s just distributed over dozens of repos.

                                    It’s still a difficult problem, and there is no “one size fits all” solution. If our organisation still had just one product in Go (as when we started out three years ago), then the current polyrepo approach would continue to suffice. It still worked mostly okay when we expanded to two and three products. But now that we’ve got five products (and probably more on the way in the future) it’s getting harder and harder to manage things. I can write increasingly more advanced tooling, but that’s not really something I’m looking forward to.

                                    I’m not sure how to solve it yet; for us, I think the best solution will be to consolidate our 20 dependency packages into a single one and to consolidate each application’s services into its own repo, so we’ll end up with 6 repos.

                                    Either way, the problems are real, and people who look towards monorepos aren’t all stupid or liars.

                                    1. 4

                                      I would imagine that if all you use is Go, and nothing much else, then you are in the monorepo “sweet spot” (especially if your repo size isn’t enormous). From what I understand, Go was more or less designed around the Google-internal monorepo workflow, at least until Go 1.10/1.11 or so (six years after Go 1.0?).

                                      It makes me wonder…

                                      • Are there other languages that seem to make monorepo style repos easier?
                                      • Are monorepos harder/worse if you have many apps written in multiple disparate languages?
                                      1. 7

                                        Main issue with monorepos (IMO) is that lots of existing tools assume you are not using them (e.g. GitHub webhooks, CI providers, VCS support for partial worktrees, etc.). Not an issue at Google scale, where such tools are managed (or built) in-house.

                                        1. 3

                                          This point isn’t made enough in the monorepo debate. The cost of a monorepo isn’t just the size of the checkout; it’s also all of the tooling you lose by using something non-standard. TFA mentioned some of it, but even things like git log become problematic.

                                          1. 2

                                            Is there a middle ground that scopes the tooling better? What I mean is: keep your web app and related backend services in their own monorepo, assuming they aren’t built on drastically different platforms and you desire standardisation and alignment. Then keep your mobile apps in separate repos, unless you are using some cross-platform framework which permits a mobile monorepo. You get the benefits of the monorepo for what is possibly a growing set of services that need to be refactored together, while not cluttering git log et al. with completely unrelated changes.

                                            1. 2

                                              Sort of. What really matters is whether you end up with a set of tools that work effectively. For small organizations, that means polyrepos, since you don’t often have to deal with cross-cutting concerns and you don’t want to build / self-host tools.

                                              Once you grow to be a large organization, you start frequently making changes which require release coordination, and you have budget to setup tools to meet your needs.

                                        2. 4

                                          Interesting, Go in my experience is one of the places I have seen the most extreme polyrepo/microservice setups. I helped a small shop of 2 devs with 50+ repos. One of the devs was a new hire…

                                        3. 0

                                          Rolling out updates to a dependency organisation-wide is hard, even for compatible changes. I need to update 25 apps, make PRs for 25 apps.

                                          What exactly is the concern here? Project ownership within an org? I fail to see how monorepo is different from having commit access to all the repos for everyone. PRs to upstream externally? Doesn’t make a difference either.

                                          1. 3

                                          The concern is that it’s time-consuming and clumsy to push updates. If I update e.g. the database package, I will need to update that in 25 individual apps, and then create and merge 25 individual PRs.

                                            1. 3

                                            The monorepo helps with this issue, but it can also be a bit insidious. The dependency is a real one, and any updates to it need to be tested. It’s easier to push the update to all 25 apps in a monorepo, but it can also tend to allow developers to make updates without making sure the changes are safe everywhere.

                                              Explicit dependencies with a single line update to each module file can be a forcing function for testing.

                                              1. 2

                                                but it also can tend to allow developers to make updates without making sure the changes are safe everywhere

                                              The Google solution is to push the checking of the safety of a change onto the team consuming it, not the one creating it.

                                              Changes are created using Rosie, and small commits are created with a review from a best guess as to who owns the code. Some Rosie changes wait for all people to accept. Some don’t, and in general I’ve been seeing more of that. Rosie changes generally assume that if your tests pass, the change is safe. If a change is made and something got broken in your product, your unit tests needed to be better. If that break made it to staging, your integration tests needed to be better. If something got to production, you really have bigger problems.

                                                I generally like this solution. I have a very strong belief that during a refactor, it is not the responsibility of the refactor author to prove to you that it works for you. It’s up to you to prove that it doesn’t via your own testing. I think this applies equally to tiny changes in your own team up to gigantic monorepo changes.

                                              2. 1

                                                Assuming the update doesn’t contain breaking changes, shouldn’t this just happen in your CI/CD pipeline? And if it does introduce breaking changes, aren’t you going to need to update 25 individual apps anyway?

                                                1. 4

                                                  aren’t you going to need to update 25 individual apps anyway?

                                                  The breaking change could be a rename, or the addition of a parameter, or something small that doesn’t require careful modifications to 25 different applications. It might even be scriptable. Compare the effort of making said changes in one repo vs 25 repos and making a PR for each such change.

                                                  Now, maybe this just changes the threshold at which you make breaking changes, since the cost of fixing downstream is high. But there are trade offs there too.

                                                  I truthfully don’t understand why we’re trying to wave away the difference in the effort required to make 25 PRs vs 1 PR. Frankly, in the way I conceptualize it, you’d be lucky if you even knew that 25 PRs were all you needed. Unless you have good tooling to tell you who all your downstream consumers are, that might not be the case at all!

                                                  1. 1

                                                  Here’s the thing: I shouldn’t need to know that there are 25 PRs that have to be sent, or even 25 apps that need to be updated. That’s a dependency management problem, and that lives in my CI/CD pipeline. Each dependent should know which version(s) it can accept. If I make any breaking changes, I should make sure I alter the versioning in such a way that older dependents don’t try to use the new version. If I need them to use my new version, then I have to explicitly deprecate the old one.

                                                    I’ve worked in monorepos with multiple dependents all linking back to a single dependency, and marshalling the requirements of each of those dependents with the lifecycle of the dependency was just hell on Earth. If I’m working on the dependency, I don’t want to be responsible for the dependents at the same time. I should be able to mutate each on totally independent cycles. Changes in one shouldn’t ever require changes in the other, unless I’m explicitly deprecating the version of the dependency one dependent needs.

                                                    I don’t think VCS is the right place to do dependency management.

                                                    1. 3

                                                      Round and round we go. You’ve just traded one problem for another. Instead of 25 repos needing to be updated, you now might have 25 repos using completely different versions of your internal libraries.

                                                      I don’t want to be responsible for the dependents at the same time.

                                                      I mean, this is exactly the benefit of monorepos. If that doesn’t help your workflow, then monorepos ain’t gunna fly. One example where I know this doesn’t work is in a very decentralized ecosystem, like FOSS.

                                                      If you aren’t responsible for your dependents, then someone else will be. Five breaking changes and six months later, I feel bad for the poor sap that needs to go through the code migration to address each of the five breaking changes that you’ve now completely forgotten about just to add a new feature to that dependency. I mean sure, if that’s what your organization requires (like FOSS does), then you have to suck it up and do it. Otherwise, no, I don’t actually want to apply dependency management to every little thing.

                                                      Your complaints about conflating VCS and dependency management ring hollow to me.

                                                      1. 1

                                                        I mean, again, this arises from personal experience: I’ve worked on a codebase where a dependency was linked via source control. It was an absolute nightmare, and based on that experience, I reached this conclusion: dependencies are their own product.

                                                        I don’t think this is adding “dependency management to every little thing”, because dependency management is like CI: it’s a thing you should be doing all the time! It’s not part of the individual products, it’s part of the process. Running a self-hosted dependency resolver is like running a self-hosted build server.

                                                        And yes, different products might be using different versions of your libraries. Ideally, nobody pinned to a specific minor release. That’s an anti-pattern. Ideally, you carefully version known breaking changes. Ideally, your CI suite is robust enough that regressions never make it into production. I just don’t see how different versions of your library being in use is a problem. Why on Earth would I want to go to every product that uses the library and update it, excepting show-stopping, production-critical bugs? If it’s just features and performance, there’s no point. Let them use the old version.

                                                        1. 2

                                                          You didn’t really respond to this point:

                                                          Five breaking changes and six months later, I feel bad for the poor sap that needs to go through the code migration to address each of the five breaking changes that you’ve now completely forgotten about just to add a new feature to that dependency.

                                                          You ask why it’s a problem to have a bunch of different copies of your internal libraries everywhere? Because it’s legacy code. At some point, someone will have to migrate its dependents when you add a new feature. But the point at which that happens can be delayed indefinitely until the very moment at which it is required to happen. But at that point, the library may have already gone through 3 refactorings and several breaking changes. Instead of front-loading the migration of dependents as that happens by the person making the changes, you now effectively have dependents using legacy code. Subsequent updates to those dependents now potentially fall on the shoulders of someone else, and it introduces surprise yak shaves. That someone else then needs to go through and apply a migration to their code if they want to use an updated version of the library that has seen several breaking changes. That person then needs to understand the breaking changes and apply them to their dependent. If all goes well, maybe this is a painless process. But what if the migration in the library resulted in reduced functionality? Or if the API made something impossible that you were relying on? It’s a classic example of someone not understanding all of the use cases of their library and accidentally removing functionality from users of their library. Happens all the time. Now that person who is trying to use your new code needs to go and talk to you to figure out whether the library can be modified to support original functionality. You stare at them blankly for several seconds as you try to recall what it is you did 6 months ago and what motivated it. But all of that would have been avoided if you were forced to go fix the dependent in the first place.

                                                          Like I said, your situation might require one to do this. As I said above, which you seem to have completely ignored, FOSS is one such example of this. It’s decentralized, so you can’t realistically fix all dependents. It’s not feasible. But in a closed ecosystem inside a monorepo, your build doesn’t pass unless all dependents are fixed. Everything moves forward, code migrations are front loaded and nobody needs to spend any time being surprised by a necessary code migration.

                                                          I experience both of these approaches to development. With a monorepo at work and lots of participation in FOSS. In the FOSS world, the above happens all the time exactly because we have a decentralized system of libraries that are each individually versioned, all supported by semver. It’s a great thing, but it’s super costly, yet necessary.

                                                          Dependency management with explicit versioning is a wonderful tool, but it is costly to assign versions to things. Sometimes it’s required. If so, then great, do it. But it is most certainly not something that you “just do” like you do CI. Versioning requires some judgment about the proper granularity at which you apply it. Do you apply it to every single module? Every package? Just third party dependencies? You must have varying answers to these and there must be some process you follow that says when something should be independently versioned. All I’m saying is that if you can get away with it, it’s cheaper to make that granularity as coarse as possible.

                                          2. 7

                                            This really reads to me as more of a condemnation of SOA and excessively large teams than any real shortcoming of monorepos.

                                            I’m also kinda curious how many teams that write software really are that large.

                                            1. 1

                                              Yeah, I just stay far away from organizations that have 100+ full time coders churning out lines of code high on energy drinks and psytrance until there’s so much code Git can’t even handle it.

                                            2. 7

                                              At Airbnb, there were two camps: the Ruby people liked polyrepos, and the Java people liked monorepos. Every Ruby service had its own git repo, and there was one huge monorepo for Java projects called “Treehouse.”

                                              We spent an inordinate amount of time supporting Treehouse, and built tooling similar to what’s described in the post — for example, I built a tool that analyzes Gradle dependencies and automatically manages git sparse checkouts, so that IntelliJ didn’t explode when it tried to index all the projects (we had an ex-Jetbrains engineer who tried to get IntelliJ to reliably only index the projects you were working on, but managing IntelliJ instead of the filesystem was pretty flaky and never Just Worked at our scale). We built all sorts of special casing into our CI to make operating on Treehouse faster, but still couldn’t reliably hit our performance targets. And because code sharing via direct source linking was standard — that’s one of the benefits of monorepos — dozens of projects were tangled together, and touching a single file in any of them meant potentially rebuilding all of them (sometimes it didn’t, but you needed to know a lot to know whether it would or it wouldn’t).

                                              We spent basically zero time supporting the Ruby polyrepos, and heard fewer complaints than from Treehouse devs — generally zero.

                                              There were benefits from Treehouse, but personally I would never advocate a startup doing a monorepo. You have more important problems to work on. YMMV depending on how much of a startup you are.

                                              1. 8

                                                I would never advocate a startup doing a polyrepo. For exactly the same reasons. “You have more important problems to work on”. I have seen the horror shows first hand of two man startups with 50+ repos trying to roll out a simple change. At a certain scale you might be able to manage the non-negotiable costs of a polyrepo, but at least with a monorepo, they are negotiable. Small app + db scaffs + libs you wrote in one monorepo with a handful of devs is an order of magnitude simpler and more flexible than polyrepo in terms of stuff YOU DO NOT HAVE TO DO.

                                                1. 1

                                                  I think this is more of a microservices vs monolith argument: I agree (early) startups should stay away from microservices, where you’d deploy 50 things. One repo, one monolith, a database and maybe a cache: much simpler. Don’t bother with handling multiple of anything.

                                                  At the point where you’re actually trying to split off separate services, though, you’re faced with the choice of monorepo-of-multiple-services versus repo per service. Just like monoliths are better for early startups but would be hellish at say, Google scale, repo-per-service is a lot simpler to manage for a small company than a monorepo of many services, despite monorepos being better at Google scale. There are benefits to monorepos, but you need to staff a team dedicated to your monorepo, and it takes a large engineering team for that kind of productivity investment to be worth it (and if you do the monorepo but don’t staff the team appropriately, you actually lose significant productivity as your open-source or commercially-available tools choke on the repo size). The worst case for repo-per-service is commits to upgrade shared libs are painful; the worst case for monorepos is every commit is painful.

                                                  Once you have enough engineers where it makes sense to pay people to work on VCS tooling because the productivity improvements will pay for themselves even when compared against the opportunity cost of having those engineers fix bugs, improve performance, and ship features for whatever your business gets paid for, hire engineers to work on monorepo tooling. Until then, don’t; polyrepos are sometimes annoying but workable, and bad monorepos are worse than bad polyrepos.

                                                  1. 1

                                                    In recent years narrow clones have become more viable. So I suspect what you say is still true to a degree, but I think monorepos are now a much better choice for a microservice architecture than they were a few years ago.

                                                    1. 1

                                                      Git doesn’t have narrow clones. There are two features you might be referring to:

                                                      • “Shallow clones.” This clones the entire repo’s content, but leaves out (a configurable amount of) its history. This can lead to faster from-scratch clones due to downloading less data and space savings on disk in terms of history data, but doesn’t help you with tools like IntelliJ that choke on the size of the repo contents. It’s often useful in CI, though (and we used it for that).
                                                      • “Sparse checkouts.” This still requires keeping the entire repo contents on disk, but they’re hidden inside the .git directory. You manually configure which files are unpacked to their intended locations on disk, but you still need to keep their data stored; it’s just hidden. This doesn’t provide space savings, but can help tools like IntelliJ. There’s a pretty huge caveat here, though: git only cares about files, and has no idea what files you need available in order for e.g. a build to pass. In a monorepo you’ll often have direct source links all over the place, meaning you’ll need to build tooling to figure out which files are necessary for which projects and which files can be hidden (and I built such a tool at Airbnb). It’s not impossible, but there are a lot of gotchas and corner cases (for example, Gradle can tell you what source files it needs to build a project… But not which .gradle files it needs), and it has to be done per-language/build tool/package manager. Also, remember that you have to update the list of files every time you pull down new commits, because someone could’ve added new dependencies and you need to walk the graph again (hope that’s fast)…

                                                      Having worked on this pretty recently: it’s unfortunately not free, and less of the tooling exists in the open source or commercially available world than you’d hope.
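
                                                      For what it’s worth, the core of the “which files can be hidden” computation is just a transitive walk of the project graph; the hard, per-build-tool part is extracting that graph (plus caveats like the .gradle files mentioned above). A toy sketch in Go, with hypothetical project names and paths:

                                                      ```go
                                                      package main

                                                      import (
                                                          "fmt"
                                                          "sort"
                                                      )

                                                      // checkoutPaths returns the directory prefixes a sparse checkout needs for
                                                      // one project: the project's own directory plus those of its transitive
                                                      // dependencies. The dependency map is assumed to come from your build tool.
                                                      func checkoutPaths(deps map[string][]string, dirs map[string]string, root string) []string {
                                                          seen := map[string]bool{}
                                                          var visit func(p string)
                                                          visit = func(p string) {
                                                              if seen[p] {
                                                                  return
                                                              }
                                                              seen[p] = true
                                                              for _, d := range deps[p] {
                                                                  visit(d)
                                                              }
                                                          }
                                                          visit(root)

                                                          var paths []string
                                                          for p := range seen {
                                                              paths = append(paths, dirs[p])
                                                          }
                                                          sort.Strings(paths)
                                                          return paths
                                                      }

                                                      func main() {
                                                          // Made-up project graph and directory layout.
                                                          deps := map[string][]string{
                                                              "payments-service": {"core-models", "http-client"},
                                                              "http-client":      {"core-models"},
                                                          }
                                                          dirs := map[string]string{
                                                              "payments-service": "services/payments/",
                                                              "core-models":      "libs/core-models/",
                                                              "http-client":      "libs/http-client/",
                                                          }
                                                          // Working on payments-service? Only these prefixes need to be
                                                          // materialized in the working tree.
                                                          fmt.Println(checkoutPaths(deps, dirs, "payments-service"))
                                                      }
                                                      ```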

                                                      1. 1

                                                        Git does technically support narrow clones (which they call partial clones) as of this year. However only the client pieces are implemented, so it’ll only work when cloning from another local repo: https://github.com/git/git/blob/master/Documentation/technical/partial-clone.txt

                                                        I don’t believe any of the hosting providers support it yet either. Narrow clones are ready for use in Mercurial though (which is the better option for a large monorepo anyway).

                                                        1. 1

                                                          Yeah, Mercurial is the best option if you’re going with a monorepo. Although that comes along with its own downsides, since hg isn’t particularly widely adopted compared to git. For example: you can rule out using Github or Gitlab, since neither provide Mercurial server support (and while you can use a Mercurial client to talk to a git server, the git server still won’t support narrow clones, so monorepos are still painful).

                                                          From an off the shelf tooling perspective, polyrepos are simpler. If you’re willing/able to invest in bespoke VCS tooling, monorepos are nice.

                                              2. 5

                                                I’m not a big fan of clear ownership on code - that dramatically reduces your ‘bus factor’, and I’ve seen projects get neglected in the polyrepo linux world purely because they had one maintainer and he got tired of it.

                                                1. 5

                                                  VCS is really something I’ve been involved with my whole career - multiple companies, multiple VCS, SCM in general.

                                                  Monorepos are the right way for a system that is shared in a company. Not the easy way per se. But, the right way. Everything becomes dramatically easier when there is a single Version of your system. Refactors are dramatically easier. Documentation is in a single place. Searchability is dramatically easier.

                                                  Git doesn’t work as well here though. It’s not designed to - it’s not even designed as a corporate VCS. Something nominally similar to SVN is what most developers tend to want in their corporate workflow (without SVN annoyances).

                                                  It’s not incorrect to say that the tooling isn’t there for monorepos right now- that’s true. Open source promotes the idea of polyrepos because of all the small projects and libraries. So the tooling flows towards what’s easier.

                                                  But Better Is Better - Worse is not Better, Worse just happens to be easier to get started with.

                                                  1. 7

                                                    Question for the polyrepo people: how many repos should the OpenBSD src tree be split into?

                                                    1. 8

                                                      I think it’s fine? To first order repos should contain code that needs to be deployed as a unit. If a piece of code needs to be shared between multiple deployable units that have independent deploy cycles, extract it into its own repo and use submodules. By these fairly obvious rules of thumb I just came up with, it makes sense for an OS kernel to be in a single repo. It also makes sense for the userland to be separate. I assume this is why the OpenBSD ports tree is in a separate repo ;)

                                                      (I worked at Google in the past. I’m not super anti-monorepo but never really saw much benefit from it. I think Google in particular has a monorepo because it predates git. They just turned it into a massive exercise in rationalization. No idea what Facebook and Twitter’s excuses are. Key ex-googler in the right place at the right time, maybe?)

                                                      1. 1

                                                        It also makes sense for the userland to be separate. I assume this is why the OpenBSD ports tree is in a separate repo ;)

                                                        OpenBSD’s userspace is in the src repo, not in the ports repo.

                                                      2. 4

                                                        Isn’t CVS kind of the equivalent of every file being its own repo, with some built in wrapping tooling?

                                                        1. 1

                                                        I have zero problems with monorepos. The problem is a badly organized code project.

                                                        OpenBSD, along with other operating systems, keeps code in a well-defined location (binaries-related code in bin, kernel code in sys…), so you do not spend so much time searching for the code, or wondering where to put new code (if more code is to be added at all, rather than just bugfixes).

                                                        2. 3

                                                          Worth citing the most comprehensive arguments of the opposition. Maybe Dan Luu?

                                                          1. 1

                                                            I’ve never had a chance to apply this at scale, but IMO the ideal solution to dependency management at scale is using Nix to inject dependencies into packages. In this scheme, the packages don’t try to find their dependencies by whatever means; they receive them from the outside in their default.nix. Then you have a single place where you wire all the dependencies to each other. Note that I haven’t mentioned mono- vs. polyrepo at all, because trying to organize your dependencies using VCS is a dead end however you approach it.

                                                            All of that said, I find it convenient to have that single place where you do all the wiring be a Git mother-repo with only .nix files and submodules in it, gathering all the code you have. The individual submodules may or may not contain multiple Nix packages in them, BTW. Then your wiring .nix code finds everything immediately under the repo root, and if you need to work on multiple projects at the same time you have everything in one place. “Feature branches” in this combo-repo approach become branches of the mother repo, bringing together a different set of commits of the child repos.

                                                            EDIT: What I described is not novel at all, and I haven’t applied it at scale, but NixOS uses the model in the first paragraph (i.e. no submodule stuff) to manage an entire Linux distro better than anyone else.

                                                            1. 1

                                                              Agreeing to most of it, I still have to say that not all VCS are built equal:

                                                              https://www.plasticscm.com/games/performance/performance-results-of-plastic-scm