At this point, I’m convinced that the monorepo vs multi-repo world views simply highlight a shortcoming of our existing VCS tools. We shouldn’t have to decide on our global API boundary versioning strategy up front, and if we do, we should be able to change our minds easily. This goes hand-in-hand with DVCS' shortcomings for asset-heavy use cases, like game development.
This goes hand-in-hand with DVCS' shortcomings for asset-heavy use cases, like game development.
I don’t understand this part.
Graphic designers and musicians don’t have a “merge” concept in their tooling. What does that even mean?
You could look at git-annex or git-lfs, or hg largefiles or the hg snap extension.
I’m familiar with all of those git/hg extensions, but even with those, they don’t solve the problem you point out: What does merge mean for a 3D model? Or a level? That’s why game studios tend to prefer centralized version control systems with locking, like Perforce.
I think you misunderstood me. It’s my experience that “3D models” and “levels” work just fine with git annex. I imagine the hg alternative is just fine as well.
Meanwhile, Perforce is completely unusable from an airplane, and I don’t see the point of locking. That it’s more popular than git annex is interesting, but to say it’s indicative of a shortcoming of DVCS seems an awfully big stretch.
¯\_(ツ)_/¯ Different folks have different use cases.
For one thing, when you’ve got a few terabytes of source art assets to work with, an airplane is out of the question anyway.
For another, locking is frequently quite useful as a communication tool in some situations. “Oh, you’re working on that? OK - I’ll work on something else so we don’t step on each other’s toes.” Assuming you can lock at a fine enough grain and that locks are “advisory”.
I completely agree. VCSes should be able to account for the workflows of both groups and make it easy to switch between them.
I think almost everyone will benefit from organising their code this way.
No. Sweeping generalizations like this don’t help anyone.
It all depends on your development process, and which VCS you choose to support this development process, and what this VCS was designed to handle.
For example, at the ASF we have a big monorepo (svn.apache.org) which works because it’s an SVN repository so each project only needs one top-level directory, beneath which the project models its development process using SVN-style branches and tags (i.e. copies).
And there are ASF projects which use git, and these live in distinct repositories (git.apache.org, which also mirrors projects from SVN). Because of how branches are modeled in git that’s the only reasonable option.
No. Sweeping generalizations like this don’t help anyone.
“Sweeping generalisation” is an interesting term for the conclusion to an argument that starts from widely applicable problems to reach a generally applicable solution.
And by interesting I mean “wrong”.
There may be valid reasons someone can’t do it, or that it would be too costly to do it, but I’m pretty confident in saying that almost everyone who can do it would benefit from it.
Also, Apache is pretty far outside the scope of the scenario I was talking about, as might have been hinted at by the title including the phrase “All your company’s projects”.
I’ll try to put this as politely as possible: I’m a little tired of people with just enough experience to be dangerous making regal declarations about what the rest of us must or must not do.
So what happens if you have two projects, A and B, that both depend on a third project, C? If C is separate and versioned, you can pin A to C 1.0.3 while B moves on to C 1.1.0, and so on, managing the dependency just as you would any other dependency. If you are working in a monorepo, it seems to me that there’d be no easy way to manage this case, meaning that one of a few things is going to happen:
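For concreteness, here is a minimal sketch of that versioned setup with plain git; the repo names, tags, and layout are all hypothetical stand-ins:

```shell
# Toy demo: C lives in its own repo with tagged releases; A and B each
# pin the tag they want. All names here are hypothetical.
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Stand-in for project C, with two released versions tagged.
git init -q C
git -C C -c user.email=c@example.com -c user.name=c commit -q --allow-empty -m "v1.0.3"
git -C C tag 1.0.3
git -C C -c user.email=c@example.com -c user.name=c commit -q --allow-empty -m "v1.1.0"
git -C C tag 1.1.0

# A pegs C at 1.0.3; B tracks 1.1.0. The two checkouts are independent,
# so B upgrading does not force A to move.
git clone -q --branch 1.0.3 C A-deps/C
git clone -q --branch 1.1.0 C B-deps/C

git -C A-deps/C log -1 --format=%s   # prints: v1.0.3
git -C B-deps/C log -1 --format=%s   # prints: v1.1.0
```

In practice the pinning would live in a package manifest or lockfile rather than raw clones, but the shape is the same: C’s releases are addressable, and each consumer chooses when to move.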
At work, we (years ago) started out with this philosophy. Sadly, it didn’t pan out. In practice, the chore of upgrading many separate repositories when you made a change far outweighed the theoretical benefits of being able to run different versions. I say “theoretical” because I don’t know that we ever needed that in any serious way. Perhaps if you absolutely need the ability to run different versions of your packages, you would want to pay the human cost of churning through repos to update dependencies. In practice for us, it wasn’t worth it.
We’re in the process of migrating all our Scala code to a monorepo for this reason.
I think the idea is that if there are any changes to C that imply changes to A and B, then those changes need to happen all at once. Preferably in a single commit. CI for all packages runs on every commit, to catch and prevent cross-package breakage.
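As a sketch of what that atomic change looks like (the package layout here is made up), the breaking change to C and the fixes to its dependents land in one commit, so CI never sees a half-migrated tree:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .

# One monorepo, three packages (hypothetical layout).
mkdir -p libs/C services/A services/B
echo 'greet()'       > libs/C/api.txt
echo 'calls greet()' > services/A/main.txt
echo 'calls greet()' > services/B/main.txt

# The change to C and the updates to both dependents are one commit.
git add -A
git -c user.email=dev@example.com -c user.name=dev \
    commit -q -m "C: rename hello to greet; update A and B callers"

# CI running against HEAD sees a consistent tree across all packages.
git show --name-only --format= HEAD   # lists the three changed files
```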
That’s case 3, and it works fine in theory; however, what if project B has constraints that make deployment difficult, or if no resources are currently allocated to updating it with any changes that are necessary? “Just do it all at once” isn’t necessarily a viable strategy.
If the maintainer of A needs changes made to C, C is unmaintained, and those changes break B, then it becomes A’s maintainer’s responsibility to fix both C and B. That might mean working directly with the teams involved in the breakage, or just going in and fixing it on their own. Hopefully you have tests that will help mitigate regressions from this refactor.
Ever worked in a web agency? It is quite common to have tens of old projects around with unknown future prospects. From experience you know some of them will see future work, but you don’t know which ones, and margins are nowhere near big enough to invest time in possibly dead projects.
Depending on different project versions across the company creates a lot more trouble long term than time saved in the short term. I guarantee if some other team at my company complained they depended on an old version of my library I would tell them to quit their tomfoolery and develop like adults. New commits are a small hassle, but they are also bug fixes, security fixes, or something else important enough that I bothered to commit it!
It depends on the company. I think this article makes an unstated assumption about what kind of work the company does. For example, in web agencies most projects are fairly short, and once they end, work on them may be limited to security updates. It almost never happens that a client wants to pay for an upgrade to a newer version of a framework until they want to make a more substantial change to the website.
In an ideal world they would keep all the projects up to date with newer versions. In the real world that would be prohibitively expensive, as you never know which client will actually return with more work, and the number of projects runs into the tens, if not over a hundred.
I also have doubts about whether adding a new feature to a library at the same time as to client code is a good idea, but that is just a minor issue with the article, as there are doubtless occasions where you would like to make changes to multiple projects at the same time.
It doesn’t necessarily have to be long term; it should just fit into the cadence of both primary projects. If B can file a ticket to “Upgrade C to 1.1.0,” then it can be prioritized and fit into their cadence. In situations where the company is working with real clients (i.e. not a social network or advertising giant), this means that client features can be finished, then the upgrade can be taken care of at its proper time. In a monorepo, if B doesn’t have time to upgrade C, but A does, then it simply will not happen if B has higher priorities. A will end up writing their changes into their own repo, and the shared “libraries” will atrophy and die.
As an adult myself, I often don’t want to take any random update to a dependency just because. Software doesn’t actually rot just because it’s old, and changes made to software often result in new bugs. Everything is a trade-off, and what makes sense to you doesn’t have to be the best course of action for everybody.
This might work fine if the company only has repositories for their own projects. This stops working if:
Also, the comparison with the Linux kernel is somewhat false. As far as I know, the Linux kernel only includes source code files while a web application or mobile application will always have various binary assets like images that will significantly increase the repository size.
This stops working if:
You need to browse the repository history for any reason
Why? With git: git log [-p] <mysubpath>
You need to provide third-party access to an individual project (a possible client or outsourcing company)
With git yes, this gets complicated. I suppose a synced git subtree is the only way, which would make things a bit icky, but it would retain the advantages for those who are able to use the whole repo. How icky the sync is depends on how often you get merge conflicts, how often the people in the extracted subtree break things for the monorepo people, what process you have for merging back changes from the extracted repo, etc.
Like DRMacIver says in the article, if you want to liberate parts of your code and maintain externally, that runs into the same thing.
I wonder how often monorepo people vendor external dependencies and how well that works. That’s sort of the same thing in reverse. That gets easier though, if you don’t patch the dependency – just import with git subtree and off you go.
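For what it’s worth, the subtree import step might look like this; a local toy repo stands in for the real dependency, and note that git subtree lives in git’s contrib directory, so it may need to be installed separately:

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Stand-in for the external dependency you want to vendor.
git init -q dep
echo 'hello' > dep/README
git -C dep add README
git -C dep -c user.email=d@example.com -c user.name=d commit -q -m "dep v1"
branch=$(git -C dep symbolic-ref --short HEAD)   # master or main, per git defaults

# The monorepo imports it under vendor/dep, squashing its history.
git init -q mono && cd mono
git -c user.email=m@example.com -c user.name=m commit -q --allow-empty -m "initial"
git -c user.email=m@example.com -c user.name=m \
    subtree add --prefix=vendor/dep ../dep "$branch" --squash

ls vendor/dep   # prints: README
```

Later, `git subtree pull` with the same prefix picks up upstream changes, which is exactly the unpatched-dependency case that stays easy.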
You use version control for things like the site’s Wordpress theme
You probably don’t want to pull the entire site source code into your Wordpress instance, but then you probably shouldn’t have just git pull as your deployment method anyway.
This assumes that all developers are comfortable with the command line version of Git. There are a lot of developers that would rather use GUI tools like Tower or SourceTree. I’m not sure how easy it is to do this with those tools.
I’ve rarely seen Wordpress developers have some sort of deployment method other than copy the theme files over. Regardless, this was also a matter of access rights as you probably wouldn’t want to allow a developer of a Wordpress theme to have access to your entire source code. This also applies for contractors that you’d hire to work on a specific project. Yes, you can use git subtree but I’ve seen a lot of resistance to using git subtree probably due to some bad experiences with git submodule.
I’ve rarely seen Wordpress developers have some sort of deployment method other than copy the theme files over.
I’m thinking that if you’re working in an organization that is considering going mono, there’s probably a tooling team rather than your designer doing deploys. If you are five people in a room, you probably don’t have problems coordinating non-backward-compatible changes in multiple repos.
The access thing is a problem when going mono repo with git.
I’ve seen a lot of resistance to using git subtree probably due to some bad experiences with git submodule.
Yeah, I’ve seen some confusion over this. People need to understand that the two have entirely complementary sets of nasty problems. :-)
And it’s the difficulties with subtree and submodule, in particular tying build systems to VCS implementations, that make me instinctively want to reach for some artifact dependency solution instead. I’m just on an emotional level afraid that a mono repo will become a ClearCase monstrosity once people start wanting to do things at the edges.
I’m not sure how easy it is to do this with [GUI tools like Tower…]
As a Tower user, I can confirm that Tower v2.5.1 doesn’t support filtering the commit log to commits that touch a specific directory, for example “tests/”. The closest Tower gets is filtering to commits that touch a specific file, for example “tests/core/Makefile”.
On the other hand, if people started going monorepo, there would be a use case for getting subdirectory filtering into GUI tools. :-)
I work on LLVM/Clang/Swift stuff right now and I can tell you that I would much prefer it if, at the very least, LLVM/Clang was a monorepo.
In principle, they aren’t supposed to be tightly coupled, but in practice, they are. They pretty much move in lockstep. Change codegen in Clang? Better be sure you have the LLVM lib that can handle it. Can’t get 128-bit data types handled properly in LLVM? Now you have to change Clang to split it into two 64-bit types (because writing significant amounts of LLVM IR is not how you want to spend your day). Ideally, compilation is a functional pipeline, but it never quite works out that way.
Swift uses the libs of both and needs to build with specific versions (with some custom patches). So our team needs to keep track of branches in all 3 (actually, 5+) repos to make sure we’ve got the correct versions of things. Sure, it’s doable, but it would be easier if we didn’t have to keep track of it all.
Lots of small repos is like microservices: it can work but it introduces friction at the boundaries that requires a lot of discipline and effort to do correctly. And the payoff is often not worth it.
Microservices are a great analogy. I guess I’m attracted to small repos and to microservices for the same reasons: The promise of stable APIs and code exchangeability.
Stable APIs are also frequently stagnant APIs, which has its own set of problems. When it gets prohibitively expensive to change things, your codebase will ossify.
I’m torn. I think what you and others are saying makes a lot of sense, but I can’t help feeling that a little more bondage and discipline (which comes with multiple repos and dependency management) helps people not be sloppy about APIs between components. You say “don’t do that” about making everything a nebulous megaproject, but I think the risk of that is huge.
You can have dependency management and partial builds with binary deliverables even between folders in a monolith repo, but I think the megarepo encourages megabuildsystem and megaproject tendencies. Maybe hardline managers and a system like Bazel can make it work better.
Anecdotally, every time I’ve seen the bondage-and-discipline approach it has produced much worse results, and migrating from it to a monorepo has almost always improved matters in ways that would have been very hard to achieve while keeping things in separate repos.
I mean you certainly can have clean code with multiple repos. Many people do. But I don’t feel it actively encourages it. Both multiple and mono repos can have bad incentives (though I personally haven’t seen it get too bad with monorepos, I totally believe it can), but the difference is that the monorepo makes it easier rather than harder to fix the problems, by making refactoring in the direction of better modularity easier.
Anecdotally, every time I’ve seen the bondage-and-discipline approach it has produced much worse results…
Seconded. Yes, it’s possible, but it turns out to be very difficult. It requires work that tools generally don’t make frictionless, so it doesn’t get done. (And I’m not convinced yet that building new tools to address it will actually fix it.)
A megabuildsystem sounds a lot like build consistency across projects to me. I’ve never seen a multi-repo system where the build systems were consistent; there’s always some oddball thing you have to do differently for every repo.
Even worse, during cross team collaboration it becomes a total nightmare to synchronize everyone’s pet repos as you work together on a change. Two teams is manageable but when three, four, five teams have to collaborate across 10+ repos you might as well just scrub the feature and tell the customer it’s impossible.
I have never been in a multi repo environment and not been forced to ask each of those questions at least once a month.
Where I work we have a mixture of opensource and non-opensource projects, so a monorepo is not possible. For some things we use git submodules, but we have a lot of projects that are moving pretty independently so there’s no reason for a developer on project A to even need to pull down the code for projects B, C, or D.
A monorepo is still possible. For example, you could treat your open source projects as if they were third party dependencies in distinct repos. Then you vendor them in your “monorepo.”
(I guess you do still wind up with multiple repositories, but I see the open source repos as vestigial, so you still wind up with the benefits of a monorepo, I think.)
I work in a consulting company and I would probably get behind this per client, but not for EVERYTHING we do. I really wish the RSpec repos were set up like this, because tooling actually had to be written to deal with cross-repo issues, which seems outside the scope of the project - it helps to have, but a lot of it would not be necessary in a monorepo.