1. 30
  1. 9

    One of the speaker's antipatterns was the monorepo.

    I think monorepos provide a lot of benefits. However, none of the open source infrastructure available works well with them, especially GitHub. In fact, if anyone wants to compete with GitHub, providing tools for working with monorepos might be one way to differentiate.

    1. 2

      We’re working on it! https://vfsforgit.com/

      1. 2

        Yeah, both git and hg have been doing a lot of great work to make monorepos workable. Like I mentioned elsewhere the missing piece isn’t really the VCS. It’s all the rest of the tooling around it.

      2. 1

        What is the issue with GitHub monorepos? We have our entire company in one repo at work and I’m not seeing many pain points (ok, we recently made marketing go and use a CDN instead of checking in hundreds of megabytes of images, but nothing other than that).

        1. 2

          Interesting! Could you share how many languages you have in your codebase and how many different projects? What do you use for your build pipeline? I'm asking because I had pretty negative experiences with a large GitHub monorepo featuring some 30 different, independent projects, tested via Jenkins: we were constantly running over the GitHub API quota, and that was only the tip of the iceberg. You mentioned another pain point, storing large artifacts: we all wanted to do vendoring, but with git it was not easy, especially if there were binary artifacts involved.

          1. 1

            I don't have real experience with a monorepo, but some tooling already falls flat when you want it to build more than one project per repo; one really broken tool didn't even work when trying to use a subdirectory at all. Sorry for the lack of specifics; this was at my last job in 2017–. I just don't think this will be universally supported all the time by everything.

            So one definite point is the nice integration into GitHub, with webhooks and so on. If you rely on a third-party tool, it must already support this very basic step.

            1. 1

              GitHub Issues don't scale very well with a GitHub monorepo. Most GitHub CI/CD systems also don't scale very well with a monorepo.

              It's not GitHub-specific, but many people running a monorepo on GitHub also aren't really using a build tool that works well with monorepos. You really want a tool like Bazel, Buck, or Pants with a build cache to get the best result.

            2. 1

              There are plenty of competitors.

              1. 3

                I’m not aware of any that really support the monorepo model though.

                1. 1

                  If you are talking about a VCS capable of handling huge monorepos, I can mention https://www.plasticscm.com/ for one.

                  1. 4

                    I feel you need to integrate with a build system to support monorepos better. An inherent part of building a monorepo efficiently is understanding which parts need to rebuild as an effect of a change. Google and Facebook have build systems capable of this (Bazel is one, I think), but I have never tried any yet.
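
                    A minimal sketch of what that looks like in Bazel (hypothetical target names, not any real project's build files): every package declares its dependencies explicitly, so the build system can compute exactly which targets a change affects.

```starlark
# libs/auth/BUILD (illustrative; a real file also needs the rules_go load() lines)
go_library(
    name = "auth",
    srcs = ["auth.go"],
    deps = ["//libs/crypto"],  # an explicit edge in the build graph
)
```

                    With the graph declared, something like `bazel query "rdeps(//..., //libs/crypto)"` lists every target affected by a change to //libs/crypto, and a shared build cache means unaffected targets never rebuild.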

                    1. 3

                      AFAIK Bazel is just one part of the complete build / source control system, which I believe is called Piper. But I can tell you that Plastic has enough functionality to hold repos as big as Google's and, as you say, handle complex build processes with modules and so forth.

                      1. 2

                        The VCS isn’t the missing piece here. I can have a Monorepo in Hg, Git, and a few other tools if I want to. The missing piece is all of the other tooling that needs to work with a Monorepo.

                    2. 2

                      Actually no, I’m talking about Issue trackers, Build systems, Continuous Integration products, all of the other tooling that a Monorepo needs.

              2. 11

                I work on a Kubernetes-adjacent project. We consume K8S libraries, but we aren’t building something that is K8S-native (for those that care, we are using the K8S API server).

                The monorepo is easily the worst part of K8S for me. The Go code that does a great deal of work erasing types (hi unstructured.Unstructured) is second. The Java-looking Go code is third.

                The problem with the monorepo is that it makes dependency management for them really easy, but for consumers of the library incredibly hard. The sheer number of dependencies that the monorepo has means you spend a great deal of time in dependency hell (I did write “if you’re not careful” first, but that isn’t true, there’s nothing you can really do). The monorepo encourages code reuse, so the repo is strongly bound to itself. It’s very hard to tease out the threads of functionality with the packages you actually need without bringing in many others that then themselves bring in many others and so on and so on.

                I am in the process of converting a code base to remove dependencies on the monorepo and depend on the new smaller ones. This makes life a great deal easier, but not all the functionality has been pulled out into the smaller ones yet, so that's been a bit frustrating. It'll get there. The only problem now is that each library is versioned together (e.g. tagged “kubernetes-1.12”), and you should really ensure that you are using the same version of each library, so you have to tell dep explicitly, for each K8S library you use, to pick the right one. That's a pain. I'm not sure how to get around it, but it sure is ugly.
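
                For what it's worth, the pinning ends up looking roughly like this (a hypothetical Gopkg.toml fragment; the import paths are the real ones, but treat the exact tag as an example): every K8S library constrained to the same release tag so dep cannot mix versions.

```toml
# Gopkg.toml (illustrative): pin every K8S library to one release tag
[[constraint]]
  name = "k8s.io/api"
  version = "kubernetes-1.12.0"

[[constraint]]
  name = "k8s.io/apimachinery"
  version = "kubernetes-1.12.0"

[[constraint]]
  name = "k8s.io/client-go"
  version = "kubernetes-1.12.0"
```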

                I have very mixed feelings on K8S. I think it is a strong technical achievement which has enabled cloud providers to build some very useful tools for the enterprises that need it. I don't think it was ever really intended to become this cornerstone of the cloud such that K8S YAML is now some sort of lingua franca, and I think its popularity is driving more people into their Go libraries rather than consuming K8S as a tool. You don't ever get the feeling that the K8S monorepo is trying to provide an API; it's just trying to work. The smaller libraries are much easier to work with as an API.

                I am a big fan of Go and it’s all I write professionally, but I don’t think K8S should have been written in it, for fault not really of their own. I think they started with Go too early in the Go lifecycle, before maintainable and clean Go was really understood, and they put a lot of what we’d now consider non-idiomatic code in there (huge overuse of interfaces being the worst, which makes navigating the code very hard). I think they would have had a better outcome if they’d stuck with Java, but hindsight is 20:20. Hopefully the refactors help.

                1. 3

                  I am a big fan of Go and it’s all I write professionally, but I don’t think K8S should have been written in it, for fault not really of their own. I think they started with Go too early in the Go lifecycle, before maintainable and clean Go was really understood, and they put a lot of what we’d now consider non-idiomatic code in there.

                  Idiomatic Go was reasonably well understood when Kubernetes started. Several members of the core Go team even offered, if I recall correctly, to mentor the initial Kubernetes contributors, review PRs, etc. The problem was that the Kubernetes folks simply weren’t interested, and on a few occasions made that known to the Go team quite aggressively. They were primarily Java programmers beforehand, and they simply wrote Java-flavored Go, and all the Go resources in the world weren’t going to change that.

                  1. 3

                    The problem with the monorepo is that it makes dependency management for them really easy, but for consumers of the library incredibly hard.

                    It’s Google. They don’t give a single shit about anyone outside of Google who wants to use their open source code. I’ve been using their C++ WebRTC library a lot, and it’s extremely clear that they built it for Google Chrome, not as a library for other people to use.

                    They don't care whether it's standards-compliant C++ or whether it breaks with compilers other than their version of Clang. I encountered a case where they deprecated a feature because it will be replaced by another feature in the future. I was literally recommended to rewrite all of their headers' includes automatically, because they're so unfriendly towards being used as a library by anyone other than Google:

                    I think webrtc is a bit unfriendly to the standard library install conventions. One option you might want to explore is to rewrite all #include “…” lines in the headers as you install them in /usr/include/webrtc/, so that all internal includes use relative names.

                    1. 1

                      This was my team’s experience trying to adopt gRPC. Huge, glaring issues that we couldn’t get support on, even when we had team members visiting Google campuses and talking about them! Needed features that we were repeatedly informed existed in the Google-internal gRPC implementation that no-one (ever) got around to putting in the open-source impl. After a lonnnng time we eventually realised Twirp would make everyone’s life easier. And it has.

                      If we couldn’t make it happen with that level of access, most people really just have to take it or leave it.

                    2. 1

                      I watched this talk last night and I found the Go reflection / Java stuff fascinating! I had no idea.

                      (I’ve never used Kubernetes, but for many years I used Google’s internal cluster system Borg, which is written in C++, and which Kubernetes is loosely based on. The C++ doesn’t have any of this kind of reflection (other than protobufs). I also tried to write a Borg-like cluster manager starting in 2010-2011, concurrent with Kubernetes!)

                      While I was watching it, I couldn’t shake the feeling that this is an advertisement for dynamic languages, or Greenspun’s tenth rule:

                      Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

                      It sounds like they are doing metaprogramming on objects so that they can be serialized to YAML (to store in etcd), and versioned. So they invented a little language within Go.

                      Now I get that Kubernetes is hundreds of thousands of lines of code, and a static type system is hugely beneficial there (even if it would be 1/5th the line count in a dynamic language).

                      But now I wonder what would happen if you went the opposite way and used a dynamic language with some optional type checking. There have been a bunch of developments in that area since Kubernetes was released in 2014.

                      Anyway, these are just idle thoughts. I'm sure the real details are a lot more complicated. But it had not occurred to me that Go was a pretty bad fit for Kubernetes. To be fair, I don't think C++ or Python are great choices either, having had some experience with cluster managers in both.

                      My guess is that the killer feature of Go is concurrency, and that basically trumps all the disadvantages. As far as I understand, Java does have better concurrency than both C++ and Python, so it indeed might have been a good choice.