1. 12

  2. 32

    Each side of this debate classifies the other as zealous extremists (as only developers can!), but both of them miss the crux of the matter: Git and its accompanying ecosystem are not yet fit for the task of developing modern cloud-native applications.

    So, let me get this straight: Because your source control system doesn’t have innate knowledge of the linkages between your software components, that means it’s not up to the task of developing modern “cloud native” (God that term makes me want to cringe) applications?

    I think not. Git is an ugly duckling, its UX is horrible but the arguments the author makes are awfully weak.

    IMO expecting your VCS to manage dependencies is a recipe for disaster. Use a language that understands some kind of module and manage your dependencies there using the infrastructure that language provides.

    1. 11

      Well said. I dislike Git too, but for a different reason - the source code is somewhat of a mess

      a hodgepodge of C, Python, Perl and Shell scripts

      Git is the perfect project for a rewrite in a modern language like Go, Rust, Nim or Julia. A single Git binary similar to Fossil would make adoption and deployment much better.

      1. 18

        I think at this point Git demonstrates that skipping on the single binary rhetoric doesn’t actually hamper adoption at all.

        1. 8

          Bitkeeper was a single binary - and it had a coherent command set. I miss it.

          1. 6

            It still exists and is licensed under Apache 2.0: https://www.bitkeeper.org/

            The only issue you have is no public host other than bkbits supporting bk.

            1. 1

              Also no support in most IDEs.

              I know many will point to the command line but having integrated blame/praise, diff, history etc is awesome.

              1. 2

                Honestly, the bigger beef I have with it is that it rarely comes with an installer and is difficult to package.

            2. 5

              I think fossil has that…

            3. 0

              I thought you are not supposed to use python because that would mean more dependencies… :P

            4. 3

              The arguments are indeed weak, as what is “cloud native”? However, I think he’s onto something – maybe the problem is not just Git, but everything around it as well? I mean, one could create a big giant monorepo in Git, but the rest of the tooling (CI especially) will still do the full checkout and won’t understand that there are different components. Monorepos make a lot of sense; however, it seems to me that we’re trying to use popular tools to tackle problems they were not meant to solve (that is, Git being a full replacement for SVN/SVK/Perforce and handling monorepos).

              1. 3

                I don’t personally think monorepos make a lot of sense, and I think multi-repos are the way to go. If each separate piece is its own project and you let the language’s packaging / dependency management system handle the rest, I don’t see the problem.

                Examples I can think of where my point applies are Python, Ruby, Perl or Java. Unless maybe you’re using a language with no notion of packages and dependencies (C/C++ perhaps?), I don’t see the issue.

                1. 8

                  The friction in coordinating branches and PRs across multiple repos has been an issue on every team I’ve worked on. Converting to a monorepo has been a massive improvement every time I’ve done it. Either you’ve used hugely different processes or you’ve never tried using a monorepo.

                  1. 1

                    The friction in coordinating branches and PRs across multiple repos

                    That’s a symptom that the project is not split across the correct boundaries. This is not different from the monolith-vs-services issue.

                    Amazon is a good example of splitting a complex architecture. Each team runs one or very few services each with their repos. Services have versioned APIs and PRs across teams are not needed.

                    1. 1

                      If you have a mature enough project such that every repo has a team and every team can stay in its own fiefdom then I imagine you don’t experience these issues as much.

                      But even so, the task of establishing and maintaining a coherent split between repos over the lifetime of a project is non-trivial in most cases. The multi-repo paradigm increases the friction of trying new arrangements and therefore any choices will tend to calcify, regardless of how good they are.

                      I’m speaking from the perspective of working on small to mid-sized teams, but large engineering organizations (like Amazon, although I don’t know about them specifically) are the ones who seem to gain the most benefit from monorepos. Uber’s recent SubmitQueue paper has a brief discussion of this with references.

                    2. 1

                      That’s interesting. Every team I’ve ever worked on had its architecture segmented into services such that cross branches and PRs weren’t an issue since each service was kept separate.

                    3. 6

                      The advantage of a monorepo is that a package can see all the packages depending on it. That means you can test with all users and even fix them in a single atomic commit.

                      The alternative in a large organisation is that you have release versions and you have to support/maintain older versions for quite some time because someone is still using them. Users have the integration effort whenever they update. In a monorepo this integration effort can be shifted to developer who changes the interface.

                      I don’t see how you could do continuous integration in a larger organization with multiple-repos. Continuous integration makes the company adapt faster (more agile with a lowercase a).
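
                      To make the "test with all users" point concrete, here is a minimal Python sketch. It assumes a hypothetical in-repo dependency map (`DEPS`, with package names invented for illustration) and is not how any particular build tool does it; it only shows that a change can be traced to everything that needs retesting when all packages live in one place.

                      ```python
                      from collections import deque

                      # Hypothetical dependency map: package -> packages it depends on.
                      DEPS = {
                          "billing": ["core", "http"],
                          "search": ["core"],
                          "http": ["core"],
                          "core": [],
                          "docs": [],
                      }


                      def affected_packages(changed, deps):
                          """Return the changed package plus all direct and transitive dependents."""
                          # Invert the map: package -> packages that depend on it.
                          dependents = {name: set() for name in deps}
                          for pkg, requirements in deps.items():
                              for req in requirements:
                                  dependents.setdefault(req, set()).add(pkg)

                          # Breadth-first walk over the inverted graph.
                          seen = {changed}
                          queue = deque([changed])
                          while queue:
                              current = queue.popleft()
                              for user in dependents.get(current, ()):
                                  if user not in seen:
                                      seen.add(user)
                                      queue.append(user)
                          return sorted(seen)


                      print(affected_packages("http", DEPS))  # ['billing', 'http']
                      ```

                      With multiple repos the same question requires a registry of who consumes what, which is exactly the information a monorepo keeps next to the code.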

                      1. 4

                        Even if you use a language that has good (or some) package support, breaking a project into packages is not always easy. Do it too soon, and it will be at the wrong abstraction boundary and get in the way of refactoring, and to correct it you’ll have to either lose history or deal with importing/exporting, which ain’t fun.

                        But if all your packages/components are in a single repo, you still might get the boundaries wrong, but the source control won’t get much in the way of fixing it.

                      2. 1

                        100% on the surrounding tooling. CI tooling being based around Git means that a lot of it is super inflexible. We’ve ended up splitting repos just to get CI to do what we need it to do, and adding friction in surrounding processes.

                        A rethink of the ecosystem would be very interesting

                        1. -1

                          Came here to CTRL+F perforce, was not disappointed

                      3. 12

                        I can’t help but laugh at almost every sentence in this article. It makes a lot of assumptions that companies have many interdependent, fast-moving repositories where this could be a real issue, mixed with buzzwords and strange statements such as: “It’s time for a new generation of source control that wasn’t purely designed for open-source projects, C and the Linux”

                        This blog sounds like the company likes to invent new problems for developers/companies and wants to engineer like “google”. https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb

                        1. 7

                          I agree with the premise of the post that Git doesn’t do a good job supporting monorepos. Assuming the scaling problem of large repositories will go away with time, there is still the issue of how clients should interact with a monorepo. e.g. clients often don’t need every file at a particular commit or want the full history of the repo or the files being accessed. The feature support and UI for facilitating partial repo access is still horribly lacking.

                          Git has the concept of a “sparse checkout” where only a subset of files in a commit are manifested in the working directory. This is a powerful feature for monorepos, as it allows clients to only interact with files relevant to the given operation. Unfortunately, the UI for sparse checkouts in Git is horrible: it requires writing out file patterns to the .git/info/sparse-checkout file and running a sequence of commands in just the right order for it to work. Practically nobody knows how to do this off the top of their head and anyone using sparse checkouts probably has the process abstracted away via a script. In contrast, I will point out that Mercurial allows you to store a file in the repository containing the patterns that constitute the “sparse profile” and when you do a clone or update, you can specify the path to the file containing the “sparse profile” and Mercurial takes care of fetching the file with sparse file patterns and expanding it to rules to populate the repository history and working directory. This is vastly more user intuitive than what Git provides for managing sparse checkouts. Not perfect, but much, much better. I encourage Git to steal this feature.
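
                          For reference, the sequence I mean looks roughly like the Python sketch below. The function name and the injectable `run` parameter are my own invention for illustration; the three steps (set `core.sparseCheckout`, write patterns to `.git/info/sparse-checkout`, then `git read-tree -mu HEAD`) are the legacy procedure, and I am assuming an existing clone with a normal `.git` directory.

                          ```python
                          import subprocess
                          from pathlib import Path


                          def enable_sparse_checkout(repo_dir, patterns, run=subprocess.run):
                              """Restrict the working tree of an existing clone to the given patterns."""
                              repo = Path(repo_dir)
                              # 1. Turn the feature on for this repository.
                              run(["git", "config", "core.sparseCheckout", "true"], cwd=repo, check=True)
                              # 2. Write one gitignore-style pattern per line.
                              pattern_file = repo / ".git" / "info" / "sparse-checkout"
                              pattern_file.parent.mkdir(parents=True, exist_ok=True)
                              pattern_file.write_text("\n".join(patterns) + "\n")
                              # 3. Re-read the index so the working tree matches the patterns.
                              run(["git", "read-tree", "-mu", "HEAD"], cwd=repo, check=True)
                              return pattern_file
                          ```

                          Something like `enable_sparse_checkout("path/to/monorepo", ["services/api/"])` (path and pattern hypothetical) is the kind of wrapper script people end up writing, which is rather the point.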

                          Another monorepo feature that is yet unexplored in both Git and Mercurial is partial repository branches and tags. Branches and tags are global to the entire repository. But for monorepos comprised of multiple projects, global branches and tags may not be appropriate. People may want branches and tags that only apply to a subset of the repo. If nothing else this can cut down on “symbol pollution.” This isn’t a radical idea, as per-project branches and tags are supported by version control systems like Subversion and CVS.

                          1. 5

                            I agree with you, git famously was not designed for monorepos.

                            Also agreed, sub-tree checkouts and sub-tree history would be essential for monorepos. Nobody wants to see every file from every obscure project in their repo clones, it would eat up your attention.

                            I would also like to store giant asset files in the repo (without the git-lfs hack), have more consistent commands, some sort of API where compilers and build systems can integrate into revision control, etc. Right now, it seems we have more and more tooling on top of Git to make it work in all these conditions, while git was designed to manage a single text-file-based repo, namely the Linux kernel.

                          2. 4

                            A build system. You’ve invented a build system.

                            Or possibly a package manager.

                            1. 4

                              None of the distributed VCSs (git, hg, etc) is a really good fit for Monorepos. Git begins to stumble once the repo is a few gigabytes. You start using sparse checkout, shallow clone, LFS and it makes the whole process complicated.

                              Subversion is the only open source VCS where a terabyte repository might be possible. Does anybody know how large the Apache repo is?

                              I believe thanks to the monorepo hype there is now an opportunity for a Subversion revival or replacement.

                              1. 3

                                And since when does having multiple repos imply using git submodules to handle them? In my experience, proper packaging and semantic versioning are what make it easy to work with multiple repositories.
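
                                As a rough illustration of what semantic versioning buys you, here is a small Python sketch. The helper names are mine, and it deliberately ignores pre-release tags, build metadata and the special status of 0.x versions; real package managers handle all of those.

                                ```python
                                def parse(version):
                                    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
                                    major, minor, patch = (int(part) for part in version.split("."))
                                    return major, minor, patch


                                def is_compatible(required, available):
                                    """True if `available` can stand in for `required`:
                                    same major version, and not older than what was required."""
                                    req, avail = parse(required), parse(available)
                                    return avail[0] == req[0] and avail >= req


                                print(is_compatible("1.4.0", "1.6.2"))  # True
                                print(is_compatible("1.4.0", "2.0.0"))  # False
                                ```

                                A consumer in another repo can pick up 1.6.2 without coordination, while a 2.0.0 release signals that someone has to do integration work first.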

                                Of course that comes with additional bureaucracy, but it also fosters better separation of software components.

                                Sure, the mono-repository approach allows for a fast “single source of truth” lookup, but it comes at a high price as soon as people realize that they can also cut corners. Eventually it becomes a pile of spaghetti.

                                (For the record, just in case you could not tell, I’ve got a strong bias towards the multi-repo, due to everyday mono-repository frustration.)

                                1. 3

                                  The flip side is with multi-repo you will amplify the Conway’s law mechanism where people tend to introduce new functionality in the lowest friction way possible. If it would be easier to do it all in one project that’s what will happen, even if it would be more appropriate to split the additions across multiple projects.

                                  Introducing friction into your process won’t magically improve the skills and judgement of your team.

                                  1. 1

                                    I once proposed an alternative to git-subtree that splits commits between projects at commit-time: http://www.mos6581.org/git_subtree_alternative. This should help with handling tightly-coupled repositories, but requires client changes.

                                    1. 1

                                      Why not just use a monorepo and make no client changes?

                                      1. 1

                                        Because you want to share libraries with other projects.

                                    2. 1

                                      Yes, there’s wisdom in what you say.

                                  2. 3

                                    tl;dr; No, we don’t need another one.

                                    Source code management is a hard task. Don’t expect to have a tool that fixes everything without you doing anything. And if you do, I hope you enjoy unexpected behaviour.

                                    One that embraces code dependencies and helps the engineering team define and manage them, rather than scaring them away.

                                    Dependency management: a) Depends on the environment, technology, etc. b) Isn’t the source version control’s task.

                                    There are certain components you will need to build and deploy hundreds of times a day. At the same time, there are other more delicate and mission-critical components. These require human supervision and extra precaution. The problem with mono-repository is that it mixes all of these components into one. More surprising is the fact that today’s vast Git CI ecosystem, with its impressive offerings in both the hosted and the SaaS space, doesn’t even try to tackle the issue.

                                    Even Azure DevOps has a mechanism to trigger specific CI processes when a certain path within the repository is modified.
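
                                    The idea itself is simple enough to sketch in a few lines of Python. This is not Azure DevOps’ configuration format; the `TRIGGERS` table, the paths and the pipeline names are all hypothetical, and the matching is plain shell-style globbing.

                                    ```python
                                    from fnmatch import fnmatchcase

                                    # Hypothetical mapping from path patterns to pipeline names.
                                    TRIGGERS = {
                                        "services/payments/*": "payments-pipeline",
                                        "services/search/*": "search-pipeline",
                                        "docs/*": "docs-pipeline",
                                    }


                                    def pipelines_for(changed_files, triggers):
                                        """Return the sorted pipelines whose pattern matches any changed file."""
                                        selected = set()
                                        for path in changed_files:
                                            for pattern, pipeline in triggers.items():
                                                # fnmatchcase's '*' also matches '/', so nested files count.
                                                if fnmatchcase(path, pattern):
                                                    selected.add(pipeline)
                                        return sorted(selected)


                                    print(pipelines_for(["services/payments/api/handler.py", "README.md"], TRIGGERS))
                                    # ['payments-pipeline']
                                    ```

                                    A delicate component can then get its own pipeline with manual approval while the rest deploys hundreds of times a day, all from the same repository.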

                                    And don’t even say “Cloud Native” again.