There is no ideal CI/CD system. Systems evolve over time, and a solution that serves an enterprise of X * 1000 engineers would not work for a small company of X * 10 engineers.
When folks think about an “ideal” solution, it’s often simply because they hate changes that bring forth migration pain. But businesses do change, and solutions and engineering should adapt to business needs.
A good example would be: is Kubernetes an ideal runtime management system? With all the knobs and customization in the world, even the best experts would tell you “it depends”.
In a similar vein, I don’t think “serverless” is a good goal here. There are a lot of benefits when it comes to local disk caching and in-memory caching, so throwing “serverless” into the mix simply misses a big optimization opportunity for CI/CD systems.
Instead, a better goal to aim for would be improved caching. How do you cache large container images of X * 10 GB for CI purposes? How do you cache data sets or large machine learning models to incrementally retrain/rebuild/deploy them? How do you achieve ephemeral compute (serverless) without sacrificing the performance gains of the local cache?
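As a rough illustration of what improved caching could look like for a single expensive step, here is a minimal sketch (the cache path, helper names, and commands are hypothetical) that keys a step’s output on a hash of its inputs, so an ephemeral worker only redoes the work when the inputs actually changed:

```python
#!/usr/bin/env python3
"""Sketch: content-addressed cache for a build/retrain step, keyed on a hash of its inputs."""
import hashlib
import shutil
import subprocess
from pathlib import Path

CACHE_DIR = Path("/var/cache/ci")  # hypothetical local (or network-mounted) cache location

def inputs_digest(paths: list[Path]) -> str:
    """Hash the contents of all input files to form the cache key."""
    h = hashlib.sha256()
    for path in sorted(paths):
        h.update(path.read_bytes())
    return h.hexdigest()

def build_with_cache(inputs: list[Path], output: Path, build_cmd: list[str]) -> None:
    cached = CACHE_DIR / inputs_digest(inputs) / output.name
    if cached.exists():
        shutil.copy(cached, output)        # cache hit: skip the expensive step entirely
        return
    subprocess.run(build_cmd, check=True)  # cache miss: run the real build/retrain
    cached.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(output, cached)            # populate the cache for the next run
```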
With that said, I think the point about DAGs is quite similar to https://gregoryszorc.com/blog/2021/04/07/modern-ci-is-too-complex-and-misdirected/, which should be a good read under the same theme.
Uhu. In particular, if we scale down, the ideal amount of CI for me is “run a single program; if it returns 0, CI passes, otherwise it fails”. That program can be a build system with all kinds of caching, or just a stupid script that sequentially does everything it needs to do.
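In that spirit, the whole “CI system” can be a sketch like the following ci.py, where the steps are hypothetical placeholders for whatever your project actually needs:

```python
#!/usr/bin/env python3
"""Minimal sequential CI: run each step in order, fail on the first non-zero exit code."""
import subprocess
import sys

# Hypothetical steps -- substitute whatever your project needs.
STEPS = [
    ["cargo", "fmt", "--check"],
    ["cargo", "build", "--release"],
    ["cargo", "test"],
]

def main() -> int:
    for step in STEPS:
        print("==>", " ".join(step), flush=True)
        code = subprocess.run(step).returncode
        if code != 0:
            return code  # first failure fails the whole run
    return 0  # 0 means CI is green

if __name__ == "__main__":
    sys.exit(main())
```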
Where this model falls down is when I need more than one computer, e.g. if I want to run CI on Mac, Windows, and Linux. Here, I am forced to go beyond a single CI script and make use of YAML or what not to configure the menu of machines and schedule tasks onto them.
I’d much rather stay within a single program, and just spawn threads on remote machines à la http://catern.com/caternetes.html.
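A minimal sketch of that idea, assuming the single ci.py above, passwordless ssh, and an already checked-out repo on each (hypothetical) host:

```python
#!/usr/bin/env python3
"""Sketch: stay in one program and fan the same CI script out to remote machines."""
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical hosts, one per platform we care about.
HOSTS = ["mac-runner", "windows-runner", "linux-runner"]

def run_ci(host: str) -> int:
    # One thread per host; each thread just runs the same ci.py remotely.
    return subprocess.run(["ssh", host, "python3", "ci.py"]).returncode

def main() -> int:
    with ThreadPoolExecutor(max_workers=len(HOSTS)) as pool:
        codes = list(pool.map(run_ci, HOSTS))
    # CI is green only if every host returned 0.
    return 0 if all(code == 0 for code in codes) else 1

if __name__ == "__main__":
    sys.exit(main())
```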
Have you heard of our lord and savior… Bazel?
No, but seriously, I am inspired by Bazel’s Remote Build Execution (RBE) model, which is pretty much what you just described here. In fact, I think the author of the post you linked work(ed) at Two Sigma, which is quite famous for its early Bazel adoption.
The Remote Build Execution API has since been adopted by a lot of other build tools: Goma (Chromium’s build tool), Pants, Please, Recc. Even Meta’s latest Buck2 uses it to achieve its goal of being a “first-class remote execution” build tool.
There seems to be a perceived distinction between CI/CD systems and build systems, while in my mind they are the same thing, the difference being that the former typically runs on remote, distributed infrastructure and the latter locally. Things are converging these days, and to me the ideal is that there is no CI/CD system, just a build system that works both locally and remotely. At $WORK we use Nix, make, and containers to achieve that, and it’s pretty convenient to test locally first and then push to CircleCI to have the same things run remotely.
This is how I see it too, and I built https://garnix.io/ with that perspective. CI should be exactly the same as your local build. That way, CI is just a bigger machine you can use to get results faster. (That said, even nix in practice doesn’t get there - I think most people do not develop locally only with “nix build”, but instead use npm, cargo, cabal, etc. directly. Until there are decent module-level incremental builds in nix, this will probably remain true.)