I subscribe to the belief that every project should have a Makefile with tasks:
deps installs all dependencies as automated as they can be (it’s OK if it outputs some manual steps, but a rerun should detect those things having been done and not output the manual nudges if they’re not needed)
check runs all linting and static analysis without modifying anything
test runs all unit tests minimally and any integration tests that are low impact
build produces some artifact
all does all of the above
clean deletes artifacts and, if possible, restores the codebase to as close to its original state as possible
help outputs a list of tasks
For my Scala and Rust projects, this yields a ~14 line Makefile that just executes sbt or cargo, respectively. For my Python and Ruby projects, there’s a lot more to it. Any sufficiently advanced system of build scripts or documentation eventually just reimplements make.
All of this in pursuit of the idea that someone should be able to clone and verify a build representing a clean starting point for development and troubleshooting with three commands: git clone whatever && cd whatever && make all.
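For a cargo-based project, such a thin wrapper Makefile might look roughly like this (the target names come from the list above; the particular cargo subcommands are one reasonable choice, not the only one, and recipe lines must be indented with tabs):

.PHONY: deps check test build all clean help
deps:   ## download dependencies
	cargo fetch
check:  ## lint and static analysis, no modifications
	cargo fmt -- --check
	cargo clippy -- -D warnings
test:   ## run the unit tests
	cargo test
build:  ## produce an artifact
	cargo build --release
all: deps check test build
clean:  ## delete artifacts
	cargo clean
help:   ## list tasks
	@grep -E '^[a-z]+:.*##' $(MAKEFILE_LIST)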
deps installs all dependencies as automated as they can be
…
all does all of the above
This is… weird. If I run git clone whatever && cd whatever && make all, and it suddenly starts running apt commands with root, I’d be seriously pissed.
"Install dependencies" probably refers to installing the dependencies in user space, for example "pip install --user". In any case, nothing should be able to run root commands via sudo without your prior consent. If this is possible, e.g., because you "misconfigured" sudo, then you have other problems. ;)
I always assumed that I was alone in doing so, and now I am happy to discover that other people also find merit in a (small) Makefile with some standard targets like 'all', 'build', 'test', and 'check', regardless of whether the actual build system is something completely different (like, for example, Gradle, sbt, or Mill).
Isn't it kind of redundant for Scala?
deps: sbt update
check: sbt scalafix --check
test: sbt test
build: sbt publish
all: sbt ';test;publish'
clean: sbt clean
So the main value here is consistency with your other projects?
You've nearly fully reproduced it! The missing piece is the help task. The main value is a consistent onboarding across all projects without someone needing to know the specifics of a particular ecosystem and without having to read a long README (if it even exists).
At my company, we have more than 5,000 people writing code. Many are working in Java or Kotlin and JavaScript while another set is working in Go. Folks in my area, data science, are working mostly in Python but a few teams are slinging Scala. Without the convenience of a Makefile on-ramp, I can get an env up and running for our Java, Kotlin, and Scala projects without thinking because of the time I've spent in those ecosystems (Gradle and SBT, really). I touch JavaScript and Go so rarely that I have to look up how to do the things I want to do because there's a false assumption of familiarity in those codebases. I've spent a lot of time improving my team's Python devex, but I have a long way to go before I can play in some other teams' sandboxes. It's easier now that I'm more familiar with Python tooling, but I'd happily take a make check test over the ecosystem-specific equivalent so that I can spend less effort remembering tooling and more on delivering value: the guardrails, lints, tests, and everything else the maintainers care about run without me having to think about them (except, perhaps, when they fail, hopefully with actionable errors!).
BSD make is great for small projects which don't have a lot of files and do not have any compile-time options. For larger projects in which you want to enable/disable options at compilation time, you might have to use a more complete build system.
Here’s the problem: Every large project was once a small project. The FreeBSD build system, which is built on top of bmake, is an absolute nightmare to use. It is slow, impossible to modify, and when it breaks it’s completely incomprehensible trying to find out why.
For small projects, a CMake build system is typically 4-5 lines of CMake, so bmake isn’t really a win here, but CMake can grow a lot bigger before it becomes an unmaintainable mess and it’s improving all of the time. Oh, and it can also generate the compile_commands.json that your LSP implementation (clangd or whatever) uses to do syntax highlighting. I have never managed to make this work with bmake (@MaskRay published a script to do it but it never worked for me).
The problem is that cmake is actually literal hell to use. I would much rather use even the shittiest makefile than cmake.
Some of the “modern” cmake stuff is slightly less horrible. Maybe if the cmake community had moved on to using targets, things would’ve been a little better. But most of the time, you’re still stuck with ${FOO_INCLUDE_DIRS} and ${FOO_LIBRARIES}. And the absolutely terrible syntax and stringly typed nature won’t ever change.
Give me literally any build system – including an ad-hoc shell script – over cmake.
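For anyone who hasn't seen the distinction being drawn here, the old variable style versus the target style look roughly like this (Foo is a placeholder package name, not a real one):

# Old style: stringly-typed variables that you have to wire up by hand
find_package(Foo REQUIRED)
include_directories(${FOO_INCLUDE_DIRS})
target_link_libraries(app ${FOO_LIBRARIES})

# "Modern" style: an imported target carries its own usage requirements
find_package(Foo CONFIG REQUIRED)
target_link_libraries(app PRIVATE Foo::Foo)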
Agreed. Personally, I also detest meson/ninja in the same way. The only things that I can tolerate writing AND using are BSD makefiles, POSIX makefiles, and plan9's mkfiles.
You are going to have a very fun time dealing with portability. Shared libraries, anyone?
Not really a problem, pkg-config tells your makefile what cflags and ldflags/ldlibs to add.
Using it is less the problem - creating shared libraries is much harder. Every linker is weird and special, even with ccld. As someone dealing with AIX in a dayjob…
The problem is that cmake is actually literal hell to use. I would much rather use even the shittiest makefile than cmake.
Yes. The last time I seriously used cmake for cross compiles (trying to build third-party non-android code to integrate into an Android app) I ended up knee deep in strace to figure out which of the hundreds of thousands of lines of cmake scripts were being included from the system cmake directory, and then using gdb on a debug build of cmake to try to figure out where it was constructing the incorrect strings, because I had given up on actually being able to understand the cmake scripts themselves, and why they were double concatenating the path prefix.
Using make for the cross compile was merely quite unpleasant.
Can we improve on make? Absolutely. But cmake is not that improvement.
What were you trying to build? I have cross-compiled hundreds of CMake things and I don’t think I’ve ever needed to do anything other than give it a cross-compile toolchain file on the command line. Oh, and that was cross-compiling for an experimental CPU, so no off-the-shelf support from anything, yet CMake required me to write a 10-line text file and pass it on the command line.
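For reference, a cross-compile toolchain file of the sort being described is on this order (the paths and target here are made up):

# aarch64-toolchain.cmake, passed as -DCMAKE_TOOLCHAIN_FILE=aarch64-toolchain.cmake
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)
set(CMAKE_C_COMPILER /opt/cross/bin/aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER /opt/cross/bin/aarch64-linux-gnu-g++)
set(CMAKE_FIND_ROOT_PATH /opt/cross/sysroot)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)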
This was in 2019-ish, so I don't remember which of the ported packages it was. It may have been some differential equation packages, opencv, or some other packages. There was some odd interaction between their cmake files and the android toolchain's cmake helpers that led to duplicated build directory prefixes, which were nearly impossible to debug. The fix was easy once I found the mis-expanded variable, but tracking it down was insanely painful.
The happy path with cmake isn’t great but the sad path is bad enough that I’m not touching it in any new software I write.
The happy path with cmake isn’t great but the sad path is bad enough that I’m not touching it in any new software I write.
The sad path with bmake is far sadder. I spent half a day trying to convince a bmake-based build system to compile the output from yacc as C++ instead of C before giving up. There was some magic somewhere but I have no idea where and a non-trivial bmake build system spans dozens of include files with syntax that looks like line noise. I’ll take add_target_option over ${M:asdfasdfgkjnerihna} any day.
You're describing the happy path. Cmake ships with just over 112,000 lines of modules, and it seems any non-trivial project gets between hundreds and thousands of lines of additional cmake customizations and copy-pasted modules on top of that. And if anything goes wrong in there, you need to get in and debug that code. In my experience, it often does.
With make, it's usually easier to debug because there just isn't as much crap pulled in. And even when there is, I can hack around it with a specific, ad-hoc target. With cmake, if something goes wrong deep inside it, I expect to spend a week getting it to work. And because I only touch cmake if I have to, I usually don't have the choice of giving up – I just have to deal with it.
I’m very happy that these last couple years, I spend much of my paid time writing Go, and not dealing with other people’s broken build systems.
Cmake ships with just over 112,000 lines of modules, and it seems any non trivial project gets between hundreds and thousands of lines of additional cmake customizations and copy-pasted modules on top of that.
The core bmake files are over 10KLoC, which doesn't include the built-in rules, and do far less than the CMake standard library (which includes cross compilation, finding dependencies using various tools, and so on). They are not namespaced, because bmake does not have any notion of scopes for variables, and so any one of them may define some variable that another consumes.
With make, it's usually easier to debug because there just isn't as much crap pulled in.
That is not my experience with any large project that I’ve worked on with a bmake or GNU make build system. They build some half-arsed analogue of a load of the CMake modules and, because there’s no notion of variable scope in these systems, everything depends on some variable that is set somewhere in a file that’s included at three levels of indirection by the thing that includes the Makefile for the component that you’re currently looking at. Everything is spooky action at a distance. You can’t find the thing that’s setting the variable, because it’s constructing the variable name by applying some complex pattern to the string. When I do find it, instead of functions with human-readable names, I discover that it’s a line like _LDADD_FROM_DPADD= ${DPADD:R:T:C;^lib(.*)$;-l\1;g} (actual line from a bmake project, far from the worst I’ve seen, just the first one that jumped out opening a random .mk file), which is far less readable than anything I’ve ever read in any non-Perl language.
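For readers who don't speak bmake, a rough decoding of that particular expression, assuming the standard modifier semantics:

# ${DPADD:R:T:C;^lib(.*)$;-l\1;g} applied to e.g. /usr/lib/libfoo.a:
#   :R                       strip the suffix          -> /usr/lib/libfoo
#   :T                       keep the last component   -> libfoo
#   :C;^lib(.*)$;-l\1;g      rewrite libNAME to -lNAME -> -lfoo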
In contrast, modern CMake has properties on targets and the core modules work with this kind of abstraction. There are a few places where some global variables still apply, but these are easy to find with grep. Everything else is scoped. If a target is doing something wrong, then I need to look at how that target is constructed. It may be as a result of some included modules, but finding the relevant part is usually easy.
The largest project that I’ve worked on with a CMake build system is LLVM, which has about 7KLoC of custom CMake modules. It’s not wonderful, but it’s far easier to modify the build system than I’ve found for make-based projects a tenth the size. The total time that I’ve wasted on CMake hacking for it over the last 15 years is less than a day. The time I’ve wasted failing to get Make-based (GNU Make or bmake) projects to do what I want is weeks over the same period.
Modern CMake is a lot better and it's being aggressively pushed because things like vcpkg require modern CMake, or require you to wrap your crufty CMake in something with proper exported targets for importing external dependencies.
I’ve worked on projects with large CMake infrastructure, large GNU make infrastructure, and large bmake infrastructure. I have endured vastly less suffering as a result of the CMake infrastructure than the other two. I have spent entire days trying to change things in make-based build systems and given up, whereas CMake I’ve just complained about how ugly the macro language is.
Would you be interested to try build2? I am willing to do some hand-holding (e.g., answer "How do I ..?" questions, etc.) if that helps. To give a few points of comparison based on topics brought up in other comments:
The simple executable buildfile would be a one-liner like this:
exe{my-prog}: c{src1} cxx{src2}
With the libzstd dependency:
import libs = libzstd%lib{zstd}
Here is a buildfile from a library (Linux Kconfig configuration system) that uses lex/yacc: https://github.com/build2-packaging/kconfig/blob/master/liblkc/liblkc/buildfile
We have a separate section in the manual on the available build debugging mechanisms: https://build2.org/build2/doc/build2-build-system-manual.xhtml#intro-diag-debug
We have a collection of HOWTOs that may be of interest: https://github.com/build2/HOWTO/#readme
I like the idea of build2. I was hoping for a long time that Jon Anderson would finish Fabrique, which had some very nice properties (merging of objects for inheriting flags, a file type in the language that was distinct from a string and could be mapped to a path or a file descriptor on invocation).
exe{my-prog}: c{src1} cxx{src2}
Perhaps it’s just me, but I really don’t find that to be great syntax. Software in general (totally plausible rule of thumb that I was told and believe) is read around 10 times more than it is written. For build systems, that’s probably closer to 100, so terse syntax scares me.
The problem I have now is ecosystem lock-in. 90% of the things that I want to depend on provide a CMake exported project. I can use vcpkg to grab thousands of libraries to statically link against and everything just works. From this example:
With the libzstd dependency:
import libs = libzstd%lib{zstd}
How does it find zstd? Does it rely on an export target that zstd exposed, a built-in package, or some other mechanism?
CMake isn’t what I want, but I can see a fairly clear path to evolving it to be what I want. I don’t see that path for replacing it with something new and for the new thing to be worth replacing CMake it would need to be an order of magnitude better for my projects and able to consume CMake exported targets from other projects (not pkg-config, which can’t even provide flags for compiler invocations for Objective-C, let alone handle any of the difficult configuration cases). If it can consume CMake exported targets, then my incentive for libraries is to use CMake because then I can export a target that both it and CMake can consume.
Perhaps it’s just me, but I really don’t find that to be great syntax. Software in general (totally plausible rule of thumb that I was told and believe) is read around 10 times more than it is written. For build systems, that’s probably closer to 100, so terse syntax scares me.
No, it’s not just you, this is a fairly common complaint from people who first see it but interestingly not from people who used build2 for some time (we ran a survey). I believe the terse syntax is beneficial for common constructs (and what I’ve shown is definitely one of the most common) because it doesn’t get in the way when trying to understand more complex buildfiles. At least this has been my experience.
How does it find zstd? Does it rely on an export target that zstd exposed, a built-in package, or some other mechanism?
That depends on whether you are using just the build system or the build system and the package manager stack. If just the build system, then you can either specify the development build to import explicitly (e.g., config.import.libzstd=/tmp/libzstd), bundle it with your project (in which case it gets found automatically) or, failing all of the above, build2 will try to find the installed version (and extract additional options/libraries from pkg-config files, if any).
If you are using the package manager, then by default it will download and build libzstd from the package (but you can also instruct the package manager to use the system-installed version if you prefer). We happen to have the libzstd package sitting in the submission queue: https://queue.cppget.org/libzstd
But that's a pretty vanilla case that most tools can handle these days. The more interesting one is lex/yacc from the buildfile I linked. It uses the same import mechanism to find the tools:
And we have them packaged: https://cppget.org/reflex and https://cppget.org/byacc. And the package manager will download and build them for you. And it's smart enough to know to do it in a separate host configuration so that they can still be executed during the build even if you are cross-compiling. This works auto-magically, even on Windows. (Another handy tool that can be used like that is xxd: https://cppget.org/xxd).
CMake isn’t what I want, but I can see a fairly clear path to evolving it to be what I want. I don’t see that path for replacing it with something new and for the new thing to be worth replacing CMake it would need to be an order of magnitude better for my projects.
I am clearly biased but I think it’s actually not that difficult to be an order of magnitude better than CMake, it’s just really difficult to see if all you’ve experienced is CMake (and maybe some make-based projects).
Firstly, CMake is a meta build system which closes the door on quite a few things (for an example, check how CMake plans to support C++20 modules; in short it's a "let's pre-scan the world" approach). Then, on one side of this meta build system sandwich you have a really primitive build model with the famous CMake macro language. On the other you have the lowest common denominator problem of the underlying build systems. Even arguably the best of them (ninja) is quite a basic tool. The result is that every new functionality, say support for a new source code generator, has to be implemented in this dreaded macro language with an eye on the underlying build tools. In build2, in contrast, you can implement your own build system module in C++ and the toolchain will fetch, build, and load it for you automatically (pretty much the same as the lex/yacc tools above). Here is a demo I've made of a fairly elaborate source code generator setup for a user (reportedly it took a lot of hacking around to support in CMake and was the motivation for them to switch to build2): https://github.com/build2/build2-dynamic-target-group-demo/
No, it’s not just you, this is a fairly common complaint from people who first see it but interestingly not from people who used build2 for some time (we ran a survey)
That’s a great distinction to make. Terse syntax is fine for operations that I will read every time I look in the file, but it’s awful for things that I’ll see once every few months. I don’t know enough about build2 to comment on where it falls on this spectrum.
For me, the litmus test of a build system is one that is very hard to apply to new ones: If I want to modify a build system for a large project that has aggregated for 10-20 years, how easy is it for me to understand their custom parts? CMake is not wonderful here, but generally the functions and macros are easy to find and to read once I've found them. bmake is awful because its line-noise syntax is impossible to search for (how do you find what the M modifier in an expression does in the documentation? "M" as a search string gives a lot of false positives!).
That depends on whether you are using just the build system or the build system and the package manager stack. If just the build system, then you can either specify the development build to import explicitly (e.g., config.import.libzstd=/tmp/libzstd), bundle it with your project (in which case it gets found automatically) or, failing all of the above, build2 will try to find the installed version (and extract additional options/libraries from pkg-config files, if any).
My experience with pkg-config is not very positive. It just about works for trivial options but is not sufficiently expressive for even simple things like different flags for debug and release builds, let alone anything with custom configuration options.
If you are using the package manager, then by default it will download and build libzstd from the package (but you can also instruct the package manager to use the system-installed version if you prefer). We happen to have the libzstd package sitting in the submission queue: https://queue.cppget.org/libzstd
That looks a lot more promising, especially being able to use the system-installed version. Do you provide some ontology that allows systems to map build2 package names to installed packages, so that someone packaging a project that I build with build2 doesn't have to do this translation for everything that they package?
And we have them packaged: https://cppget.org/reflex and https://cppget.org/byacc. And the package manager will download and build them for you. And it's smart enough to know to do it in a separate host configuration so that they can still be executed during the build even if you are cross-compiling. This works auto-magically, even on Windows. (Another handy tool that can be used like that is xxd: https://cppget.org/xxd).
This is a very nice property, though one that I already get from vcpkg + CMake.
Firstly, CMake is a meta build system which closes the door on quite a few things (for an example, check how CMake plans to support C++20 modules; in short it’s a “let’s pre-scan the world” approach). Then, on one side of this meta build system sandwich you have a really primitive build model with the famous CMake macro language.
The language is pretty awful, but the underlying object model doesn’t seem so bad and is probably something that could be exposed to another language with some refactoring (that’s probably the first thing that I’d want to do if I seriously spent time trying to improve CMake).
In build2, in contrast, you can implement your own build system module in C++ and the toolchain will fetch, build, and load it for you automatically (pretty much the same as the lex/yacc tools above). Here is a demo I've made of a fairly elaborate source code generator setup for a user (reportedly it took a lot of hacking around to support in CMake and was the motivation for them to switch to build2):
That’s very interesting and might be a good reason to switch for a project that I’m currently working on.
I have struggled in the past with generated header files with CMake, because the tools can build the dependency edges during the build, but I need a coarse-grained rule for the initial build that says ‘do the step that generates these headers before trying to build this target’ and there isn’t a great way of expressing that this is a fudge and so I can break that arc for incremental builds. Does build2 have a nice model for this kind of thing?
If I want to modify a build system for a large project that has aggregated for 10-20 years, how easy is it for me to understand their custom parts?
In build2, there are two ways to do custom things: you can write ad hoc pattern rules in a shell-like language (similar to make pattern rules, but portable and higher-level) and everything else (more elaborate rules, functions, configuration, etc) is written in C++(14). Granted C++ can be made an inscrutable mess, but at least it’s a known quantity and we try hard to keep things sane (you can get a taste of what that looks like from the build2-dynamic-target-group-demo/libbuild2-compiler module I linked to earlier).
My experience with pkg-config is not very positive. It just about works for trivial options but is not sufficiently expressive for even simple things like different flags for debug and release builds, let alone anything with custom configuration options.
pkg-config has its issues, I agree, plus most build systems don’t (or can’t) use it correctly. For example, you wouldn’t try to cram both debug and release builds into a single library binary (e.g., .a or .so; well, unless you are Apple, perhaps) so why try to cram both debug and release (or static/shared for that matter) options into the same .pc file?
Plus, besides the built-in values (Cflags, etc), pkg-config allows for free-form variables. So you can extend the format how you see fit. For example, in build2 we use the bin.whole variable to signal that the library should be linked in the “whole archive” mode (which we then translate into the appropriate linker options). Similarly, we’ve used pkg-config variable to convey C++20 modules information and it also panned out quite well. And we now convey custom C/C++ library metadata this way.
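As a sketch of the idea (not build2's actual output, and the variable name here is made up), a .pc file can carry such free-form variables alongside the standard fields, and a consumer can read them back with pkg-config --variable:

# libfoo.pc (illustrative)
prefix=/usr/local
whole_archive=true

Name: libfoo
Version: 1.2.3
Description: Example library
Cflags: -I${prefix}/include
Libs: -L${prefix}/lib -lfoo

# consumers can query it with: pkg-config --variable=whole_archive libfoo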
So the question is do we subsume all the existing/simple cases and continue with pkg-config by extending its format for more advanced cases or do we invent a completely new format (which is what WG21’s SG15 is currently trying to do)?
Do you provide some ontology that allows systems to map build2 package names to installed packages, so that someone packaging a project that I build with build2 doesn't have to do this translation for everything that they package?
Not yet, but we had ideas along these lines though in a different direction: we were thinking of each build2 package also providing a mapping to the system package names for the commonly used distributions (e.g., libzstd-dev for Debian/Ubuntu, libzstd-devel for Fedora/etc) so that the build2 package manager can query the installed package's version (e.g., to make sure the version constraints are satisfied) or invoke the system package manager to install the system package. If we had such a mapping, it would also allow us to achieve what you are describing.
This is a very nice property, though one that I already get from vcpkg + CMake.
Interesting. So you could ask vcpkg to build you a library without even knowing it has build-time dependencies on some tools, and vcpkg will automatically create a suitable host configuration, build those tools there, and pass them to the library's build so that it can execute them during its build?
If so, that's quite impressive. For us, the "create a suitable host configuration" part turned into a particularly deep rabbit hole. What is "suitable"? In our case we've decided to use the same compiler/options as what was used to build build2. But what if the PATH environment variable has changed and now clang++ resolves to something else? So we had to invent a notion of hermetic build configurations where we save all the environment variables that affect every tool involved in the build (like CPATH and friends). One nice offshoot of this work is that now in non-hermetic build configurations (which are the default), we detect changes to the environment variables besides everything else (sources, options, compiler versions, etc).
I have struggled in the past with generated header files with CMake, because the tools can build the dependency edges during the build, but I need a coarse-grained rule for the initial build that says ‘do the step that generates these headers before trying to build this target’ and there isn’t a great way of expressing that this is a fudge and so I can break that arc for incremental builds. Does build2 have a nice model for this kind of thing?
Yes, in build2 you normally don’t need any fudging, the C/C++ compile rules are prepared to deal with generated headers (via -MG or similar). There are use-cases where it’s impossible to handle the generated headers fully dynamically (for example, because the compiler may pick up a wrong/outdated header from another search path) but this is also taken care of. See this article for the gory details: https://github.com/build2/HOWTO/blob/master/entries/handle-auto-generated-headers.md
That’s very interesting and might be a good reason to switch for a project that I’m currently working on.
As I mentioned earlier, I would be happy to do some hand-holding if you want to give it a try. Also, build2 is not exactly simple and has a very different mental model compared to CMake. In particular, CMake is a “mono-repo first” build system while build2 is decidedly “multi-repo first”. As a result, some things that are often taken as gospel by CMake users (like the output being a subdirectory of the source directory) is blasphemy in build2. So there might be some culture shock.
BTW, in your earlier post you’ve mentioned Fabrique by Jon Anderson but I can’t seem to find any traces of it. Do you have any links?
Granted C++ can be made an inscrutable mess, but at least it’s a known quantity and we try hard to keep things sane (you can get a taste of what that looks like from the build2-dynamic-target-group-demo/libbuild2-compiler module I linked to earlier).
This makes me a bit nervous because it seems very easy for non-portable things to creep in with this. To give a concrete example, if my build environment is a cloud service then I may not have a local filesystem and anything using the standard library for file I/O will be annoying to port. Similarly, if I want to use something like Capsicum to sandbox my build then I need to ensure that descriptors for files read by these modules are provided externally.
It looks as if the abstractions there are fairly clean, but I wonder if there’s any way of linting this. It would be quite nice if this could use WASI as the host interface (even if compiling to native code) so that you had something that at least can be made to run anywhere.
pkg-config has its issues, I agree,
My bias against pkg-config originates from trying to use it with Objective-C. I gave up trying to add --objc-flags and --objcxx-flags options because the structure of the code made this kind of extension too hard. Objective-C is built with the same compiler as C/C++ and takes mostly the same options, yet it wasn't possible to support. This made me very nervous about the system's ability to adapt to any changes in requirements from C/C++, with no chance of providing information for any other language. This was about 15 years ago, so it may have improved since then.
Not yet, but we had ideas along these lines though in a different direction: we were thinking of each build2 package also providing a mapping to the system package names for the commonly used distributions
That feels back to front because you’re traversing the graph in the opposite direction to the edge that must exist. Someone packaging libFoo for their distribution must know where libFoo comes from and so is in a position to maintain this mapping (we could fairly trivially automate it from the FreeBSD ports system for any package that we build from a cppget source, for example). In contrast, the author of a package doesn’t always know where things come from here. I’ve looked on repology at some of my code and discovered that I haven’t even heard of a load of the distributions that package it, so expecting me to maintain a list of those (and keep it up to date with version information) sounds incredibly hard and likely to lead to a two-tier system (implicit in your use of the phrase ‘commonly used distributions’) where building on Ubuntu and Fedora is easy, building on less-popular targets is harder.
Interesting. So you could ask vcpkg to build you a library without even knowing it has build-time dependencies on some tools and vcpkg will automatically create a suitable host configuration, build those tools there, and pass them to the library’s so that it can execute them during its build?
Yes, but there's a catch: vcpkg runs its builds as part of the configure stage, not as part of the build stage. This means that running cmake may take several minutes, while the subsequent ninja run completes in a second or two. If you modify vcpkg.json then this will force CMake to re-run and that will cause the packages to re-build. vcpkg packages have a notion of host tools, which are built with the triplet for your host configuration and are then exposed for the rest of the build. There are some known issues with it, so they might be starting down the same rabbit hole that you ended up with.
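In manifest mode that looks something like this (the package names are just examples):

{
  "name": "my-app",
  "version": "0.1.0",
  "dependencies": [
    "zstd",
    { "name": "protobuf", "host": true }
  ]
}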
Yes, in build2 you normally don’t need any fudging, the C/C++ compile rules are prepared to deal with generated headers (via -MG or similar).
It's the updating that I'm particularly interested in. Imagine that I have a make-headers build step that has sub-targets that generate foo.h and bar.h and then a step for compiling prog.cc, which includes foo.h. On the first (non-incremental) build, I want the compile step that consumes prog.cc to depend on make-headers (big hammer, so that I don't have to track which generated headers my prog.cc depends on). But after that I want the compiler to update the rule for prog.cc so that it depends only on foo.h. I've managed to produce some hacks that do this in CMake but they're ugly and fragile. I'd love to have some explicit support for over-approximate dependencies that will be fixed during the first build. bmake's meta mode does this by using a kernel module to watch the files that the compiler process reads and dynamically updating the build rules to depend on those. This has some nice side effects, such as causing a complete rebuild if you upgrade your compiler or a shared library that the compiler depends on.
Negative dependencies are a separate (and more painful) problem.
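The "big hammer" hack described above looks something like this in CMake (a sketch, not a recommendation; gen-headers and schema.txt are made-up names):

# Generate the headers with some tool.
add_custom_command(
  OUTPUT foo.h bar.h
  COMMAND gen-headers ${CMAKE_CURRENT_SOURCE_DIR}/schema.txt
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/schema.txt)
add_custom_target(make-headers DEPENDS foo.h bar.h)

add_executable(prog prog.cc)
target_include_directories(prog PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
# Coarse-grained: prog waits for every generated header, even ones it never
# includes, and this edge never gets refined for incremental builds.
add_dependencies(prog make-headers)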
As I mentioned earlier, I would be happy to do some hand-holding if you want to give it a try. Also, build2 is not exactly simple and has a very different mental model compared to CMake. In particular, CMake is a “mono-repo first” build system while build2 is decidedly “multi-repo first”. As a result, some things that are often taken as gospel by CMake users (like the output being a subdirectory of the source directory) is blasphemy in build2. So there might be some culture shock.
All of my builds are done from a separate ZFS dataset that has sync turned off, so out-of-tree builds are normal for me, but I’ve not had any problems with that in CMake. One of the projects that I’m currently working on looks quite a lot like a cross-compile SDK and so build2 might be a good fit (we provide some build tools and components and want consumers to pick up our build system components). I’ll do some reading and see how hard it would be to port it over to build2. It’s currently only about a hundred lines of CMake, so not so big that a complete rewrite would be painful.
This makes me a bit nervous because it seems very easy for non-portable things to creep in with this.
These are interesting points that admittedly we haven't thought much about, yet. But there are plans to support distributed compilation and caching which, I am sure, will force us to think this through.
One thing that I have been thinking about lately is how much logic should we allow one to put in a rule (since, being written in C++, there is not much that cannot be done). In other words, should rules be purely glue between the build system and the tools that do the actual work (e.g., generate some source code) or should we allow the rules to do the work themselves without any tools? To give a concrete example, it would be trivial in build2 to implement a rule that provides the xxd functionality without any external tools.
Either way I think the bulk of the rules will still be the glue type simply because nobody will want to re-implement protoc or moc directly in the rule. Which means the problem is actually more difficult: it’s not just the rules that you need to worry about, it’s also the tools. I don’t think you will easily convince many of them to work without a local filesystem.
That feels back to front because you’re traversing the graph in the opposite direction to the edge that must exist. Someone packaging libFoo for their distribution must know where libFoo comes from and so is in a position to maintain this mapping […]
From this point of view, yes. But consider also this scenario: whoever is packaging libFoo for, say, Debian is not using build2 (because libFoo upstream, say, still uses CMake) and so has no interest in maintaining this mapping.
Perhaps this should just be a separate registry where any party (build2 package author, distribution package author, or an unrelated third party) can contribute the mapping. This will work fairly well for archive-based package repositories where we can easily merge this information into the repository metadata. But not so well for git-based where things are decentralized.
Imagine that I have a make-headers build step that has sub-targets that generate foo.h and bar.h and then a step for compiling prog.cc, which includes foo.h. On the first (non-incremental) build, I want the compile step that consumes prog.cc to depend on make-headers (big hammer, so that I don't have to track which generated headers my prog.cc depends on). But after that I want the compiler to update the rule for prog.cc so that it depends only on foo.h.
You don't need such "big hammer" aggregate steps in build2 (unless you must, for example, because the tool can only produce all the headers at once). Here is a concrete example:
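(The example itself is not reproduced here; judging by the discussion that follows, its core is roughly the following, with foo.h and bar.h produced by some generating rule:)

# Both executables list both headers as prerequisites, even though each
# source file includes only one of them.
exe{prog1}: cxx{prog1} hxx{foo bar}
exe{prog2}: cxx{prog2} hxx{foo bar}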
Where prog1.cc looks like this (in prog2.cc substitute foo with bar):
#include "foo.h"
int main ()
{
return FOO;
}
While this might look a bit impure (why does exe{prog1} depend on bar.h even though none of its sources use it), this works as expected. In particular, given a fully up-to-date build, if you remove foo.h, only exe{prog1} will be rebuilt. The mental model here is that the headers you list as prerequisites of an executable or library are a "pool" from which its sources can "pick" what they need.
I’ll do some reading and see how hard it would be to port it over to build2. It’s currently only about a hundred lines of CMake, so not so big that a complete rewrite would be painful.
Sounds good. If this is public (or I can be granted access), I could even help.
Either way I think the bulk of the rules will still be the glue type simply because nobody will want to re-implement protoc or moc directly in the rule. Which means the problem is actually more difficult: it’s not just the rules that you need to worry about, it’s also the tools. I don’t think you will easily convince many of them to work without a local filesystem.
That’s increasingly a problem. There was a post here a few months back where someone had built clang as an AWS Lambda. I expect a lot of tools in the future will end up becoming things that can be deployed on FaaS platforms and then you really want the build system to understand how to translate between two namespaces (for example, to provide a compiler with a json dictionary of name to hash mappings for a content-addressable filesytem).
I forgot to provide you with a link to Fabrique last time. I worked a bit on the design but never had time to do much implementation and Jon got distracted by other projects. We wanted to be able to run tools in Capsicum sandboxes (WASI picked up the Capsicum model, so the same requirements would apply to a WebAssembly/WASI FaaS service): the environment is responsible for opening files and providing descriptors into the tool's world. This also has the nice property for a build system that the dependencies are, by construction, accurate: anything where you didn't pass in a file descriptor is not able to be accessed by the task (though you can pass in directory descriptors for include directories as a coarse over approximation).
From this point of view, yes. But consider also this scenario: whoever is packaging libFoo for, say, Debian is not using build2 (because libFoo upstream, say, still uses CMake) and so has no interest in maintaining this mapping.
I don’t think that person has to care, the person packaging something using libFoo needs to care and that creates an incentive for anyone packaging C/C++ libraries to keep the mapping up to date. I’d imagine that each repo would maintain this mapping. That’s really the only place where I can imagine that it can live without getting stale.
I'm more familiar with the FreeBSD packaging setup than Debian, so there may be some key differences. FreeBSD builds a new package set from the top of the package tree every few days. There's a short lag (typically 1-3 days) between pushing a version bump to a port and users seeing the package version. Some users stay on the quarterly branch, which is updated less frequently. If I create a port for libFoo v1.0, then it will appear in the latest package set in a couple of days and, if I time it right, in the quarterly one soon after. Upstream libFoo notices and updates their map to say 'FreeBSD has version 1.0 and it's called libfoo'. Now I update the port to v1.1. Instantly, the upstream mapping is wrong for anyone who is building package sets themselves. A couple of days later, it's wrong for anyone installing packages from the latest branch. A few weeks later, it's wrong for anyone on the quarterly branch. There is no point at which the libFoo repo can hold a map that is correct for everyone unless they have three entries for FreeBSD, and even then they need to actively watch the status of builders to get it right.
In contrast, if I add a BUILD2_PACKAGE_NAME= and BUILD2_VERSION= line to my port (the second of which can default to the port version, so only needs setting in a few corner cases), then it's fairly easy to add some generic infrastructure to the ports system that builds a complete map for every single packaged library when you build a package set. This will then always be 100% up to date, because anyone changing a package will implicitly update it. I presume that the Debian package builders could do something similar with something in the source package manifest.
Note that the mapping needs to contain versions as well as names because the version in the package often doesn’t directly correspond to the upstream version. This gets especially tricky when the packaged version carries patches that are not yet upstreamed.
Oh, and options get more fun here. A lot of FreeBSD ports can build different flavours depending on the options that are set when building the package set. This needs to be part of the mapping. Again, this is fairly easy to drive from the port description but an immense amount of pain for anyone to try to generate from anywhere else. My company might be building a local package set that disables (or enables) an option that is the default upstream, so when I build something that uses build2 I may need to statically link a version of some library rather than using the system one, even though the default for a normal FreeBSD user would be to just depend on the package.
While this might look a bit impure (why does exe{prog1} depend on bar.h even though none of its sources use it), this works as expected. In particular, given a fully up-to-date build, if you remove foo.h, only exe{prog1} will be rebuilt. The mental model here is that the headers you list as prerequisites of an executable or library are a "pool" from which its sources can "pick" what they need.
That is exactly what I want, nice! It feels like a basic thing for a C/C++ build system, yet it’s something I’ve not seen well supported anywhere else.
Sounds good. If this is public (or I can be granted access), I could even help.
It isn’t yet, hopefully later in the year…
Of course, the thing I’d really like to do (if I ever find myself with a few months of nothing to do) is replace the awful FreeBSD build system with something tolerable and it looks like build2 would be expressive enough for that. It has some fun things like needing to build the compiler that it then uses for later build steps, but it sounds as if build2 was designed with that kind of thing in mind.
Not all small projects will necessarily grow into a large project. The trick is recognizing when or if the project will outgrow its infrastructure. Makefiles have a much lower conceptual burden, because Makefiles very concretely describe how you want your build system to run; but they suffer when you try to add abstractions to them, to support things like different toolchains, or creating the compilation database (I assume you’ve seen bear?). If you need your build described more abstractly (like, if you need to do different things with the dependency tree than simply build), then a different build tool will work better for you. But it can be hard to understand what the build tool is actually doing, and how it decided to do it. There’s no global answer.
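(The snippet that "That's it." refers to below was evidently lost along the way; a minimal CMakeLists.txt matching the description would be roughly:)

cmake_minimum_required(VERSION 3.16)
project(my-prog C CXX)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
add_executable(my-prog src1.c src2.cc)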
That’s it. That gives you targets to make my-prog, to clean the build, and will work on Windows, *NIX, or any other system that has a vaguely GCC or MSVC-like toolchain, supports debug and release builds, and generates a compile_commands.json for my editor to consume. If I want to add a dependency, let’s say on zstd, then it becomes:
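(Likewise, the lines for the zstd dependency are missing; presumably something along these lines, though the exact exported target name varies between the upstream config and vcpkg's port:)

find_package(zstd CONFIG REQUIRED)
target_link_libraries(my-prog PRIVATE zstd::libzstd_static)  # or zstd::libzstd_shared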
This will work with system packages, or with something like vcpkg installing a local copy of a specific version for reproducible builds.
Even for a simple project, the equivalent bmake file is about as complex and won’t let you target something like AIX or Windows without a lot more work, doesn’t support cross-compilation without some extra hoop jumping, and so on.
The common Makefile for this use case will be more lines of code (I never use bsd.prog.mk, etc., unless I’m actually working on the OS), but I think the word “complex” here obscures something important: that a Makefile can be considered simpler due to a very simple execution model, or a CMakeLists.txt can be considered simpler since it describes the compilation process more abstractly, allowing it to do a lot more with less.
For an example of why I think Makefiles are conceptually simpler, it is just as easy to use a Makefile with custom build tools as it is to compile C code. It's much easier to understand:
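For instance, wiring a custom code generator into a Makefile has the same shape as compiling C (gen-table and table.def are made-up names):

# Run a custom tool exactly like a compiler: output depends on its inputs.
table.c: table.def gen-table
	./gen-table $< > $@

prog: main.o table.o
	$(CC) $(LDFLAGS) -o $@ main.o table.o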
CMake gets a lot of criticism, but I think a fair share of its problems is just that people haven't stopped to learn the tool. It's a second-class language for some people, just like CSS.
There's an association issue here too. Compiling C++ sucks. It is significantly trickier than many other languages. The dependency ecosystem is far less automated too. Many dependencies are incorporated into a conglomerate project. The build needs of those dependencies come along for the ride. The problems with all of these constituents are exposed as a symptom of the top line build utility for the parent project. If cmake had made its first inroads with another language it would likely have a more nuanced reputation. Not that it doesn't bring its own problems too, but it surely takes the blame for a lot of C++'s problems.
I subscribe to the belief that every project should have a Makefile with tasks:
deps
installs all dependencies as automated as they can be (it’s OK if it outputs some manual steps, but a rerun should detect those things having been done and not output the manual nudges if they’re not needed)check
runs all linting and static analysis without modifying anythingtest
runs all unit tests minimally and any integration tests that are low impactbuild
produces some artifactall
does all of the aboveclean
deletes artifacts and, if possible, restores the codebase to as close to its original state as possiblehelp
outputs a list of tasksFor my Scala and Rust projects, this yields a ~14 line Makefile that just executes
sbt
orcargo
, respectively. For my Python and Ruby projects, there’s a lot more to it. Any sufficiently advanced system of build scripts or documentation eventually just reimplements make.All of this in pursuit of the idea that someone should be able to clone and verify a build representing a clean starting point for development and troubleshooting with
twothree commands:git clone whatever && cd whatever && make all
.This is… weird. If I run
git clone whatever && cd whatever && make all
, and it suddenly starts running apt commands with root, I’d be seriously pissed.“Install dependencies” probably refers to installing the dependencies in user space, for example “pip install –user”. In any case, nothing should be able to run root commands via sudo without your prior consent. If this is possible, e.g., because you “misconfigured” sudo, then you have other problems. ;)
I always assumed that I was alone doing so and now I am happy to discover that other people also find merit in a (small) Makefile with some standard targets like ‘all’, ‘build’, ‘test’, and ‘check’, regardless if the actual build System is something complete different (like, for example, Gradle, sbt, or Mill).
isn’t it kind of redundant for Scala?
deps
-sbt update
check
-sbt scalafix --check
test
-sbt test
build
-sbt publish
all
-sbt ';test;publish'
clean
-sbt clean
So the main value here is consistency with your other projects?
You’ve nearly fully reproduced it! The missing
help
task:The main value is a consistent onboarding across all projects without someone needing to know the specifics of a particular ecosystem and without having to read a long README (if it even exists).
At my company, we have more than 5,000 people writing code. Many are working in Java or Kotlin and JavaScript while another set is working in Go. Folks in my area, data science, are working mostly in Python but a few teams are slinging Scala. Without the convenience of a Makefile on-ramp, I can get an env up and running for our Java, Kotlin, and Scala projects without thinking because of the time I’ve spent in those ecosystems (Gradle and SBT, really). I touch JavaScript and Go so rarely that I have to look up stuff on how to do the things I want to do because there’s a false assumption of familiarity in those codebases. I’ve spent a lot of time improving my team’s Python devex but I have a long way to go to be able to play in some other teams’ sandboxes. It’s a lot easier now that I’m more familiar with Python tooling but I’d happily take a
make check test
over the ecosystem equivalent in any ecosystem so that I as a developer can care less about remembering the tooling and focus more on delivering value by expeditiously working in that codebase with guardrails, lints, tests, and everything else the maintainers care about running without me having to care about them (perhaps except when they fail, hopefully with actionable errors!).Here’s the problem: Every large project was once a small project. The FreeBSD build system, which is built on top of bmake, is an absolute nightmare to use. It is slow, impossible to modify, and when it breaks it’s completely incomprehensible trying to find out why.
For small projects, a CMake build system is typically 4-5 lines of CMake, so bmake isn’t really a win here, but CMake can grow a lot bigger before it becomes an unmaintainable mess and it’s improving all of the time. Oh, and it can also generate the compile_commands.json that your LSP implementation (clangd or whatever) uses to do syntax highlighting. I have never managed to make this work with bmake (@MaskRay published a script to do it but it never worked for me).
The problem is that cmake is actually literal hell to use. I would much rather use even the shittiest makefile than cmake.
Some of the “modern” cmake stuff is slightly less horrible. Maybe if the cmake community had moved on to using targets, things would’ve been a little better. But most of the time, you’re still stuck with ${FOO_INCLUDE_DIRS} and ${FOO_LIBRARIES}. And the absolutely terrible syntax and stringly typed nature won’t ever change.
Give me literally any build system – including an ad-hoc shell script – over cmake.
Agreed. Personally, I also detest meson/ninja in the same way. The only thing that I can tolerate writing AND using are BSD makefiles, POSIX makefiles, and plan9’s mkfiles
You are going to have a very fun time dealing with portability. Shared libraries, anyone?
Not really a problem, pkg-config tells your makefile what cflags and ldflags/ldlibs to add.
Using it is less the problem - creating shared libraries is much harder. Every linker is weird and special, even with ccld. As someone dealing with AIX in a dayjob…
Yes. The last time I seriously used cmake for cross compiles (trying to build third-party non-android code to integrate into an Android app) I ended up knee deep in
strace
to figure out which of the hundreds of thousands of lines of cmake scripts were being included from the system cmake directory, and then using gdb on a debug build of cmake to try to figure out where it was constructing the incorrect strings, because I had given up on actually being able to understand the cmake scripts themselves, and why they were double concatenating the path prefix.Using make for the cross compile was merely quite unpleasant.
Can we improve on make? Absolutely. But cmake is not that improvement.
What were you trying to build? I have cross-compiled hundreds of CMake things and I don’t think I’ve ever needed to do anything other than give it a cross-compile toolchain file on the command line. Oh, and that was cross-compiling for an experimental CPU, so no off-the-shelf support from anything, yet CMake required me to write a 10-line text file and pass it on the command line.
This was in 2019-ish, so I don’t remember which of the ported packages it was. It may have been some differential equation packages, opencv, or some other packages. There was some odd interaction between their cmake files and the android toolchain’s cmake helpers that lead to duplicated build directory prefixes like:
which was nearly impossible to debug. The fix was easy once I found the mis-expanded variable, but tracking it down was insanely painful. The happy path with cmake isn’t great but the sad path is bad enough that I’m not touching it in any new software I write.
The sad path with bmake is far sadder. I spent half a day trying to convince a bmake-based build system to compile the output from yacc as C++ instead of C before giving up. There was some magic somewhere but I have no idea where and a non-trivial bmake build system spans dozens of include files with syntax that looks like line noise. I’ll take
add_target_option
over${M:asdfasdfgkjnerihna}
any day.You’re describing the happy path.
Cmake ships with just over 112,000 lines of modules, and it seems any non trivial project gets between hundreds and thousands of lines of additional cmake customizations and copy-pasted modules on top of that. And if anything goes wrong in there, you need to get in and debug that code. In my experience, it often does.
With make, its usually easier to debug because there just isn’t as much crap pulled in. And even when there is, I can hack around it with a specific, ad-hoc target. With cmake, if something goes wrong deep inside it, I expect to spend a week getting it to work. And because I only touch cmake if I have to, I usually don’t have the choice of giving up – I just have to deal with it.
I’m very happy that these last couple years, I spend much of my paid time writing Go, and not dealing with other people’s broken build systems.
The core bmake files are over 10KLoC, which doesn’t include the built-in rules, and do far less than the CMake standard library (which includes cross compilation, finding dependencies using various tools, and so on). They are not namespaced, because bmake does not have any notion of scopes for variables, and so any one of them may define some variable that another consumes and
That is not my experience with any large project that I’ve worked on with a bmake or GNU make build system. They build some half-arsed analogue of a load of the CMake modules and, because there’s no notion of variable scope in these systems, everything depends on some variable that is set somewhere in a file that’s included at three levels of indirection by the thing that includes the
Makefile
for the component that you’re currently looking at. Everything is spooky action at a distance. You can’t find the thing that’s setting the variable, because it’s constructing the variable name by applying some complex pattern to the string. When I do find it, instead of functions with human-readable names, I discover that it’s a line like_LDADD_FROM_DPADD= ${DPADD:R:T:C;^lib(.*)$;-l\1;g}
(actual line from a bmake project, far from the worst I’ve seen, just the first one that jumped out opening a random.mk
file), which is far less readable than anything I’ve ever read in any non-Perl language.In contrast, modern CMake has properties on targets and the core modules are work with this kind of abstraction. There are a few places where some global variables still apply, but these are easy to find with grep. Everything else is scoped. If a target is doing something wrong, then I need to look at how that target is constructed. It may be as a result of some included modules, but finding they relevant part is usually easy.
The largest project that I’ve worked on with a CMake build system is LLVM, which has about 7KLoC of custom CMake modules. It’s not wonderful, but it’s far easier to modify the build system than I’ve found for make-based projects a tenth the size. The total time that I’ve wasted on CMake hacking for it over the last 15 years is less than a day. The time I’ve wasted failing to get Make-based (GNU Make or bmake) projects to do what I want is weeks over the same period.
Modern CMake is a lot better and it’s being aggressively pushed because things like vcpkg require modern CMake, or require you to wrap your crufty CMake in something with proper exported targets. Importing external dependencies.
I’ve worked on projects with large CMake infrastructure, large GNU make infrastructure, and large bmake infrastructure. I have endured vastly less suffering as a result of the CMake infrastructure than the other two. I have spent entire days trying to change things in make-based build systems and given up, whereas CMake I’ve just complained about how ugly the macro language is.
Would you be interested to try
build2
? I am willing to do some hand-holding (e.g., answer “How do I ..?” questions, etc) if that helps.To give a few points of comparison based on topics brought up in other comments:
The simple executable
buildfile
would be a one-liner like this:With the
libzstd
dependency:Here is a
Here is a buildfile from a library (the Linux Kconfig configuration system) that uses lex/yacc: https://github.com/build2-packaging/kconfig/blob/master/liblkc/liblkc/buildfile
We have a separate section in the manual on the available build debugging mechanisms: https://build2.org/build2/doc/build2-build-system-manual.xhtml#intro-diag-debug
We have a collection of HOWTOs that may be of interest: https://github.com/build2/HOWTO/#readme
I like the idea of build2. I was hoping for a long time that Jon Anderson would finish Fabrique, which had some very nice properties (merging of objects for inheriting flags, a file type in the language that was distinct from a string and could be mapped to a path or a file descriptor on invocation).
Perhaps it’s just me, but I really don’t find that to be great syntax. Software in general (totally plausible rule of thumb that I was told and believe) is read around 10 times more than it is written. For build systems, that’s probably closer to 100, so terse syntax scares me.
The problem I have now is ecosystem lock-in. 90% of the things that I want to depend on provide a CMake exported project. I can use vcpkg to grab thousands of libraries to statically link against and everything just works.
From this example: how does it find zstd? Does it rely on an export target that zstd exposes, a built-in package, or some other mechanism?
CMake isn’t what I want, but I can see a fairly clear path to evolving it to be what I want. I don’t see that path for replacing it with something new and for the new thing to be worth replacing CMake it would need to be an order of magnitude better for my projects and able to consume CMake exported targets from other projects (not pkg-config, which can’t even provide flags for compiler invocations for Objective-C, let alone handle any of the difficult configuration cases). If it can consume CMake exported targets, then my incentive for libraries is to use CMake because then I can export a target that both it and CMake can consume.
No, it’s not just you; this is a fairly common complaint from people who first see it but, interestingly, not from people who have used build2 for some time (we ran a survey). I believe the terse syntax is beneficial for common constructs (and what I’ve shown is definitely one of the most common) because it doesn’t get in the way when trying to understand more complex buildfiles. At least this has been my experience.
That depends on whether you are using just the build system or the build system and the package manager stack. If just the build system, then you can either specify the development build to import explicitly (e.g., config.import.libzstd=/tmp/libzstd), bundle it with your project (in which case it gets found automatically) or, failing all of the above, build2 will try to find the installed version (and extract additional options/libraries from pkg-config files, if any).
If you are using the package manager, then by default it will download and build libzstd from the package (but you can also instruct the package manager to use the system-installed version if you prefer). We happen to have the libzstd package sitting in the submission queue: https://queue.cppget.org/libzstd
But that’s a pretty vanilla case that most tools can handle these days. The more interesting one is lex/yacc from the buildfile I linked. It uses the same import mechanism to find the tools:
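Roughly along these lines; the exact lines are in the buildfile linked above, so treat this as a sketch:
    import! [metadata] lex  = reflex%exe{reflex}
    import! [metadata] yacc = byacc%exe{byacc}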
And we have them packaged: https://cppget.org/reflex and https://cppget.org/byacc. And the package manager will download and build them for you. And it’s smart enough to know to do it in a separate host configuration so that they can still be executed during the build even if you are cross-compiling. This works auto-magically, even on Windows. (Another handy tool that can be used like that is xxd: https://cppget.org/xxd).
I am clearly biased, but I think it’s actually not that difficult to be an order of magnitude better than CMake; it’s just really difficult to see that if all you’ve experienced is CMake (and maybe some make-based projects).
Firstly, CMake is a meta build system, which closes the door on quite a few things (for an example, check how CMake plans to support C++20 modules; in short, it’s a “let’s pre-scan the world” approach). Then, on one side of this meta-build-system sandwich you have a really primitive build model with the famous CMake macro language. On the other you have the lowest-common-denominator problem of the underlying build systems. Even arguably the best of them (ninja) is quite a basic tool. The result is that every new piece of functionality, say support for a new source code generator, has to be implemented in this dreaded macro language with an eye on the underlying build tools. In build2, in contrast, you can implement your own build system module in C++ and the toolchain will fetch, build, and load it for you automatically (pretty much the same as the lex/yacc tools above). Here is a demo I’ve made of a fairly elaborate source code generator setup for a user (reportedly it took a lot of hacking around to support in CMake and was the motivation for them to switch to build2): https://github.com/build2/build2-dynamic-target-group-demo/
That’s a great distinction to make. Terse syntax is fine for operations that I will read every time I look in the file, but it’s awful for things that I’ll see once every few months. I don’t know enough about build2 to comment on where it falls on this spectrum.
For me, the litmus test of a build system is one that is very hard to apply to new ones: if I want to modify the build system of a large project that has aggregated for 10-20 years, how easy is it for me to understand their custom parts? CMake is not wonderful here, but generally the functions and macros are easy to find and to read once I’ve found them. bmake is awful because its line-noise syntax is impossible to search for (how do you find what the M modifier in an expression does in the documentation? “M” as a search string gives a lot of false positives!).
My experience with pkg-config is not very positive. It just about works for trivial options but is not sufficiently expressive for even simple things like different flags for debug and release builds, let alone anything with custom configuration options.
That looks a lot more promising, especially being able to use the system-installed version. Do you provide some ontology that allows systems to map build2 package names to installed packages, so that someone packaging a project that I build with build2 can do so without having to do this translation for everything that they package?
This is a very nice property, though one that I already get from vcpkg + CMake.
The language is pretty awful, but the underlying object model doesn’t seem so bad and is probably something that could be exposed to another language with some refactoring (that’s probably the first thing that I’d want to do if I seriously spent time trying to improve CMake).
That’s very interesting and might be a good reason to switch for a project that I’m currently working on.
I have struggled in the past with generated header files with CMake, because the tools can build the dependency edges during the build, but I need a coarse-grained rule for the initial build that says ‘do the step that generates these headers before trying to build this target’ and there isn’t a great way of expressing that this is a fudge and so I can break that arc for incremental builds. Does build2 have a nice model for this kind of thing?
In build2, there are two ways to do custom things: you can write ad hoc pattern rules in a shell-like language (similar to make pattern rules, but portable and higher-level), and everything else (more elaborate rules, functions, configuration, etc.) is written in C++(14). Granted, C++ can be made an inscrutable mess, but at least it’s a known quantity and we try hard to keep things sane (you can get a taste of what that looks like from the build2-dynamic-target-group-demo/libbuild2-compiler module I linked to earlier).
pkg-config has its issues, I agree, plus most build systems don’t (or can’t) use it correctly. For example, you wouldn’t try to cram both debug and release builds into a single library binary (e.g., .a or .so; well, unless you are Apple, perhaps), so why try to cram both debug and release (or static/shared, for that matter) options into the same .pc file?
Plus, besides the built-in values (Cflags, etc.), pkg-config allows for free-form variables, so you can extend the format how you see fit. For example, in build2 we use the bin.whole variable to signal that the library should be linked in the “whole archive” mode (which we then translate into the appropriate linker options). Similarly, we’ve used a pkg-config variable to convey C++20 modules information and it also panned out quite well. And we now convey custom C/C++ library metadata this way.
So the question is: do we subsume all the existing/simple cases and continue with pkg-config by extending its format for more advanced cases, or do we invent a completely new format (which is what WG21’s SG15 is currently trying to do)?
Not yet, but we had ideas along these lines, though in a different direction: we were thinking of each build2 package also providing a mapping to the system package names for the commonly used distributions (e.g., libzstd-dev for Debian/Ubuntu, libzstd-devel for Fedora/etc.) so that the build2 package manager can query the installed package’s version (e.g., to make sure the version constraints are satisfied) or invoke the system package manager to install the system package. If we had such a mapping, it would also allow us to achieve what you are describing.
Interesting. So you could ask vcpkg to build you a library without even knowing it has build-time dependencies on some tools, and vcpkg will automatically create a suitable host configuration, build those tools there, and pass them to the library’s build so that it can execute them during its build?
If so, that’s quite impressive. For us, the “create a suitable host configuration” part turned into a particularly deep rabbit hole. What is “suitable”? In our case we’ve decided to use the same compiler/options as what was used to build build2. But what if the PATH environment variable has changed and now clang++ resolves to something else? So we had to invent a notion of hermetic build configurations where we save all the environment variables that affect every tool involved in the build (like CPATH and friends). One nice offshoot of this work is that now, in non-hermetic build configurations (which are the default), we detect changes to the environment variables besides everything else (sources, options, compiler versions, etc.).
Yes, in build2 you normally don’t need any fudging: the C/C++ compile rules are prepared to deal with generated headers (via -MG or similar). There are use cases where it’s impossible to handle the generated headers fully dynamically (for example, because the compiler may pick up a wrong/outdated header from another search path) but this is also taken care of. See this article for the gory details: https://github.com/build2/HOWTO/blob/master/entries/handle-auto-generated-headers.md
As I mentioned earlier, I would be happy to do some hand-holding if you want to give it a try. Also, build2 is not exactly simple and has a very different mental model compared to CMake. In particular, CMake is a “mono-repo first” build system while build2 is decidedly “multi-repo first”. As a result, some things that are often taken as gospel by CMake users (like the output being a subdirectory of the source directory) are blasphemy in build2. So there might be some culture shock.
BTW, in your earlier post you mentioned Fabrique by Jon Anderson but I can’t seem to find any traces of it. Do you have any links?
This makes me a bit nervous because it seems very easy for non-portable things to creep in with this. To give a concrete example, if my build environment is a cloud service then I may not have a local filesystem and anything using the standard library for file I/O will be annoying to port. Similarly, if I want to use something like Capsicum to sandbox my build then I need to ensure that descriptors for files read by these modules are provided externally.
It looks as if the abstractions there are fairly clean, but I wonder if there’s any way of linting this. It would be quite nice if this could use WASI as the host interface (even if compiling to native code) so that you had something that at least can be made to run anywhere.
My bias against pkg-config originates from trying to use it with Objective-C. I gave up trying to add --objc-flags and --objcxx-flags options because the structure of the code made this kind of extension too hard. Objective-C is built with the same compiler as C/C++ and takes mostly the same options, yet it wasn’t possible to support. This made me very nervous about whether the system could adapt to any changes in requirements from C/C++, with no chance of providing information for any other language. This was about 15 years ago, so it may have improved since then.
That feels back to front, because you’re traversing the graph in the opposite direction to the edge that must exist. Someone packaging libFoo for their distribution must know where libFoo comes from and so is in a position to maintain this mapping (we could fairly trivially automate it from the FreeBSD ports system for any package that we build from a cppget source, for example). In contrast, the author of a package doesn’t always know where things come from. I’ve looked on repology at some of my code and discovered that I haven’t even heard of a load of the distributions that package it, so expecting me to maintain a list of those (and keep it up to date with version information) sounds incredibly hard and likely to lead to a two-tier system (implicit in your use of the phrase ‘commonly used distributions’) where building on Ubuntu and Fedora is easy, and building on less-popular targets is harder.
Yes, but there’s a catch: vcpkg runs its builds as part of the configure stage, not as part of the build stage. This means that running cmake may take several minutes, while then running ninja completes in a second or two. If you modify vcpkg.json then this will force CMake to re-run and that will cause the packages to re-build. vcpkg packages have a notion of host tools, which are built with the triplet for your host configuration and are then exposed for the rest of the build. There are some known issues with it, so they might be starting down the same rabbit hole that you ended up with.
It’s the updating that I’m particularly interested in. Imagine that I have a make-headers build step that has sub-targets that generate foo.h and bar.h, and then a step for compiling prog.cc, which includes foo.h. On the first (non-incremental) build, I want the compile step that consumes prog.cc to depend on make-headers (a big hammer, so that I don’t have to track which generated headers my prog.cc depends on; a rough sketch of that shape is below). But after that I want the compiler to update the rule for prog.cc so that it depends only on foo.h. I’ve managed to produce some hacks that do this in CMake but they’re ugly and fragile. I’d love to have some explicit support for over-approximate dependencies that will be fixed during the first build. bmake’s meta mode does this by using a kernel module to watch the files that the compiler process reads and dynamically updating the build rules to depend on those. This has some nice side effects, such as causing a complete rebuild if you upgrade your compiler or a shared library that the compiler depends on. Negative dependencies are a separate (and more painful) problem.
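A minimal sketch of that big-hammer arrangement in CMake, assuming a hypothetical gen-headers tool and headers.def input (both stand-ins):
    # Generate both headers with one command and hang an aggregate target off them.
    add_custom_command(
      OUTPUT foo.h bar.h
      COMMAND gen-headers ${CMAKE_CURRENT_SOURCE_DIR}/headers.def
      DEPENDS headers.def
      COMMENT "Generating foo.h and bar.h")
    add_custom_target(make-headers DEPENDS foo.h bar.h)
    # prog only really needs foo.h, but the coarse dependency forces the
    # generation step to run before the first compile of prog.cc.
    add_executable(prog prog.cc)
    add_dependencies(prog make-headers)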
All of my builds are done from a separate ZFS dataset that has sync turned off, so out-of-tree builds are normal for me, but I’ve not had any problems with that in CMake. One of the projects that I’m currently working on looks quite a lot like a cross-compile SDK and so build2 might be a good fit (we provide some build tools and components and want consumers to pick up our build system components). I’ll do some reading and see how hard it would be to port it over to build2. It’s currently only about a hundred lines of CMake, so not so big that a complete rewrite would be painful.
These are interesting points that admittedly we haven’t thought much about yet. But there are plans to support distributed compilation and caching which, I am sure, will force us to think this through.
One thing that I have been thinking about lately is how much logic we should allow one to put in a rule (since, being written in C++, there is not much that cannot be done). In other words, should rules be purely glue between the build system and the tools that do the actual work (e.g., generate some source code), or should we allow the rules to do the work themselves without any tools? To give a concrete example, it would be trivial in build2 to implement a rule that provides the xxd functionality without any external tools.
Either way, I think the bulk of the rules will still be the glue type, simply because nobody will want to re-implement protoc or moc directly in the rule. Which means the problem is actually more difficult: it’s not just the rules that you need to worry about, it’s also the tools. I don’t think you will easily convince many of them to work without a local filesystem.
From this point of view, yes. But consider also this scenario: whoever is packaging libFoo for, say, Debian is not using build2 (because libFoo upstream, say, still uses CMake) and so has no interest in maintaining this mapping.
Perhaps this should just be a separate registry where any party (build2 package author, distribution package author, or an unrelated third party) can contribute the mapping. This will work fairly well for archive-based package repositories, where we can easily merge this information into the repository metadata, but not so well for git-based ones, where things are decentralized.
You don’t need such “big hammer” aggregate steps in build2 (unless you must, for example, because the tool can only produce all the headers at once). Here is a concrete example:
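A sketch of what such a buildfile could look like (file names follow the example; the rule that actually generates the headers is elided, since it depends on the tool in question):
    # Both generated headers are listed as prerequisites of both executables;
    # each translation unit only picks up the one it actually includes.
    exe{prog1}: cxx{prog1} hxx{foo bar}
    exe{prog2}: cxx{prog2} hxx{foo bar}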
Where prog1.cc looks like this (in prog2.cc, substitute foo with bar):
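A guess at the shape of prog1.cc (the exact contents were not preserved here; anything that includes foo.h works, and the foo() call assumes foo.h declares it):
    #include "foo.h"

    int main ()
    {
      return foo ();
    }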
While this might look a bit impure (why does exe{prog1} depend on bar.h even though none of its sources use it?), this works as expected. In particular, given a fully up-to-date build, if you remove foo.h, only exe{prog1} will be rebuilt. The mental model here is that the headers you list as prerequisites of an executable or library are a “pool” from which its sources can “pick” what they need.
Sounds good. If this is public (or I can be granted access), I could even help.
That’s increasingly a problem. There was a post here a few months back where someone had built clang as an AWS Lambda. I expect a lot of tools in the future will end up becoming things that can be deployed on FaaS platforms, and then you really want the build system to understand how to translate between two namespaces (for example, to provide a compiler with a JSON dictionary of name-to-hash mappings for a content-addressable filesystem).
I forgot to provide you with a link to Fabrique last time. I worked a bit on the design but never had time to do much implementation, and Jon got distracted by other projects. We wanted to be able to run tools in Capsicum sandboxes (WASI picked up the Capsicum model, so the same requirements would apply to a WebAssembly/WASI FaaS service): the environment is responsible for opening files and providing descriptors into the tool’s world. This also has the nice property for a build system that the dependencies are, by construction, accurate: anything for which you didn’t pass in a file descriptor cannot be accessed by the task (though you can pass in directory descriptors for include directories as a coarse over-approximation).
I don’t think that person has to care; the person packaging something using libFoo needs to care, and that creates an incentive for anyone packaging C/C++ libraries to keep the mapping up to date. I’d imagine that each repo would maintain this mapping. That’s really the only place where I can imagine it living without getting stale.
I’m more familiar with the FreeBSD packaging setup than Debian’s, so there may be some key differences. FreeBSD builds a new package set from the top of the package tree every few days. There’s a short lag (typically 1-3 days) between pushing a version bump to a port and users seeing the package version. Some users stay on the quarterly branch, which is updated less frequently. If I create a port for libFoo v1.0, then it will appear in the latest package set in a couple of days and, if I time it right, in the quarterly one soon after. Upstream libFoo notices and updates their map to say ‘FreeBSD has version 1.0 and it’s called libfoo’. Now I update the port to v1.1. Instantly, the upstream mapping is wrong for anyone who is building package sets themselves. A couple of days later, it’s wrong for anyone installing packages from the latest branch. A few weeks later, it’s wrong for anyone on the quarterly branch. There is no point at which the libFoo repo can hold a map that is correct for everyone unless they have three entries for FreeBSD, and even then they need to actively watch the status of builders to get it right.
In contrast, if I add a BUILD2_PACKAGE_NAME= and BUILD2_VERSION= line to my port (the second of which can default to the port version, so it only needs setting in a few corner cases), then it’s fairly easy to add some generic infrastructure to the ports system that builds a complete map for every single packaged library when you build a package set. This will then always be 100% up to date, because anyone changing a package will implicitly update it. I presume that the Debian package builders could do something similar with something in the source package manifest.
Note that the mapping needs to contain versions as well as names, because the version in the package often doesn’t directly correspond to the upstream version. This gets especially tricky when the packaged version carries patches that are not yet upstreamed.
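A sketch of what those two lines might look like in a port Makefile (the BUILD2_* variables are the hypothetical ones proposed above, not existing ports infrastructure):
    PORTNAME=		libfoo
    DISTVERSION=		1.1
    BUILD2_PACKAGE_NAME=	libfoo
    # BUILD2_VERSION defaults to the port version; only set it when they differ.
    #BUILD2_VERSION=	1.1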
Oh, and options get more fun here. A lot of FreeBSD ports can build different flavours depending on the options that are set when building the package set. This needs to be part of the mapping. Again, this is fairly easy to drive from the port description but an immense amount of pain for anyone to try to generate from anywhere else. My company might be building a local package set that disables (or enables) an option that is the default upstream, so when I build something that uses build2 I may need to statically link a version of some library rather than using the system one, even though the default for a normal FreeBSD user would be to just depend on the package.
That is exactly what I want, nice! It feels like a basic thing for a C/C++ build system, yet it’s something I’ve not seen well supported anywhere else.
It isn’t yet, hopefully later in the year…
Of course, the thing I’d really like to do (if I ever find myself with a few months of nothing to do) is replace the awful FreeBSD build system with something tolerable and it looks like build2 would be expressive enough for that. It has some fun things like needing to build the compiler that it then uses for later build steps, but it sounds as if build2 was designed with that kind of thing in mind.
Not all small projects will necessarily grow into a large project. The trick is recognizing when or if the project will outgrow its infrastructure. Makefiles have a much lower conceptual burden, because Makefiles very concretely describe how you want your build system to run; but they suffer when you try to add abstractions to them, to support things like different toolchains, or creating the compilation database (I assume you’ve seen bear?). If you need your build described more abstractly (like, if you need to do different things with the dependency tree than simply build), then a different build tool will work better for you. But it can be hard to understand what the build tool is actually doing, and how it decided to do it. There’s no global answer.
This is the CMake file that you need for a trivial C/C++ project:
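A minimal sketch of what such a CMakeLists.txt might contain (my-prog and main.cc are placeholder names):
    cmake_minimum_required(VERSION 3.16)
    project(my-prog CXX)

    # Emit compile_commands.json for editors/clangd to consume.
    set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

    add_executable(my-prog main.cc)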
That’s it. That gives you targets to make my-prog, to clean the build, and will work on Windows, *NIX, or any other system that has a vaguely GCC- or MSVC-like toolchain, supports debug and release builds, and generates a compile_commands.json for my editor to consume. If I want to add a dependency, let’s say on zstd, then it becomes:
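Again a sketch; the find_package and target names are my assumption for how zstd’s exported CMake config is commonly consumed (e.g., via vcpkg):
    cmake_minimum_required(VERSION 3.16)
    project(my-prog CXX)
    set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

    # Pull in zstd's exported CMake package.
    find_package(zstd CONFIG REQUIRED)

    add_executable(my-prog main.cc)
    target_link_libraries(my-prog PRIVATE zstd::libzstd_shared)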
This will work with system packages, or with something like vcpkg installing a local copy of a specific version for reproducible builds.
Even for a simple project, the equivalent bmake file is about as complex and won’t let you target something like AIX or Windows without a lot more work, doesn’t support cross-compilation without some extra hoop jumping, and so on.
The common Makefile for this use case will be more lines of code (I never use bsd.prog.mk, etc., unless I’m actually working on the OS), but I think the word “complex” here obscures something important: that a Makefile can be considered simpler due to a very simple execution model, or a CMakeLists.txt can be considered simpler since it describes the compilation process more abstractly, allowing it to do a lot more with less.
For an example of why I think Makefiles are conceptually simpler: it is just as easy to use a Makefile with custom build tools as it is to compile C code. It’s much easier to understand:
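A sketch of the kind of rule I mean, in GNU make syntax, with a hypothetical precursor-gen tool that turns .precursor files into C sources:
    # Pattern rule: any needed .c file can be produced from its .precursor file.
    # (The recipe line must start with a tab.)
    %.c: %.precursor
    	./tools/precursor-gen $< > $@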
than it is to figure out how to use https://cmake.org/cmake/help/latest/command/add_custom_command.html to similar effect, or to try to act like a first-class citizen and make add_executable work with .precursor files.
CMake gets a lot of criticism, but I think a fair share of its problems is just that people haven’t stopped to learn about the tool. It’s a second-class language for some people, just like CSS.
There’s an association issue here too. Compiling C++ sucks. It is significantly trickier than in many other languages, and the dependency ecosystem is far less automated too. Many dependencies are incorporated into a conglomerate project, and the build needs of those dependencies come along for the ride. The problems with all of these constituents are exposed as a symptom of the top-line build utility for the parent project. If CMake had made its first inroads with another language, it would likely have a more nuanced reputation. Not that it doesn’t bring its own problems too, but it surely takes the blame for a lot of C++’s problems.
You can also try xmake. It is fast and lightweight and includes a package manager.