Wrong title. This post is fairly interesting and well written, but it doesn’t really explain why we need build systems. Instead, it tells us what build systems do. And while I do see the author trying to push us towards widely used build systems such as CMake, he offers little justification. He mentions that most developers seem to think CMake makes them suffer, but then utterly fails to address the problem. Are we supposed to just deal with it?
For a simple build system like GNU Make, the developer must specify and maintain these dependencies manually.
Not quite true: there are tricks that allow GNU Make to keep track of dependencies automatically, thanks to the -M option from GCC and Clang. Kind of a pain in the butt, but it can be done.
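For the record, a minimal sketch of that trick (the file names are placeholders; it assumes GCC or Clang and GNU Make’s built-in %.o: %.c rule):

    SRCS := main.c util.c
    OBJS := $(SRCS:.c=.o)
    DEPS := $(OBJS:.o=.d)

    # -MMD writes a .d makefile fragment next to each object as a side
    # effect of compilation; -MP adds phony targets for headers so that
    # deleting a header doesn't break the next build.
    CFLAGS += -MMD -MP

    app: $(OBJS)
    	$(CC) $(CFLAGS) $(OBJS) -o $@

    # Pull in the generated fragments; silently skipped on a clean build.
    -include $(DEPS)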
A wildcard approach to filenames (e.g. src/*.cpp) superficially seems more straightforward, as it doesn’t require the developer to list each file, allowing new files to be easily added. The downside is that the build system does not have a definitive list of the source code files for a given artefact, making it harder to track dependencies and understand precisely what components are required. Wildcards also allow spurious files to be included in the build – maybe an older module that has been superseded but not removed from the source folder.
First, tracking dependencies should be the build system’s job. It can and has been done. Second, if you have spurious files in your source tree, you should remove them. Third, if you forget to remove an obsolete module, I bet my hat you also forgot to remove it from the list of source files.
Best practice says to list all source modules individually despite the, hopefully minor, extra workload involved when first configuring the project or adding additional modules as the project evolves.
In my opinion, best practice is wrong. I’ll accept that current tools are limited, but we shouldn’t have to redundantly type out dependencies that are right there in the source tree.
That’s it for the hate. Let’s talk solutions. I personally recommend taking a look at Shake, as well as the paper that explains the theory behind it (and other build systems as well). I’ve read the paper, and it has given me faith in the possibility of better, simpler build systems.
We need to distinguish between build execution (ninja) and build configuration (autotools). The paper is about the execution. Most of the complexity is in the configuration. (The paper is great though 👍)
I have looked at Shake and its paper before, but I am curious: what would you like to see in a build system?

I ask because I am building one.
I’m a peculiar user. What I want (and build) is simple, opinionated software. This is the Way.
I don’t need, nor want, my build system to cater to God knows how many environments, like CMake does. I don’t care that my dependencies are using CMake or the autotools. I don’t seek compatibility with those monstrosities. If it means I have to rewrite some big build script from scratch, so be it. Though in all honesty, I’m okay with just calling the original build script and using the artefacts directly.
I don’t need, nor want, my build system to treat stuff like unit testing and continuous integration specially. I want it to be flexible enough that I can generate a text file with the test results, or install & launch the application on the production server.
I want my build system to be equally useful for C, C++, Haskell, Rust, LaTeX, and pretty much anything. Just a thing that uses commands to generate missing dependencies. And even then most commands can be as simple as calling some program. They don’t have to support Bash syntax or whatever. I want multiple targets and dynamic dependencies. And most of all, I want a strong mathematical foundation behind the build system. I don’t want to have to rebuild the world “just in case”.
Or, I want a magical build system where I just tell it where the entry point of my program is, and it fetches and builds the transitive closure of the dependencies. That seems possible in closed ecosystems like Rust or Go. And I want that build system to give me an easy way to run unit tests as part of the build, as well as to install my program, or at least to give me installation scripts. (This is somewhat contrary to the generic build system above.)
That said, if the generic build system can stay simple and is easy enough to use, I probably won’t need the “walled garden” version.
Goodness; you know exactly what you want.
Your comment revealed some blind spots in my current design. I am going to have to go back to the drawing board and try again.
I think a big challenge would be to generate missing dependencies for C and C++, since files can be laid out haphazardly with no rhyme or reason. However, for most other languages, which have true module systems, it may be more feasible.
Thank you.

The real reason why globbing source files is unsound, at least in the context of CMake:
Note: We do not recommend using GLOB to collect a list of source files from your source tree: If no CMakeLists.txt file changes when a source is added or removed, then the generated build system cannot know when to ask CMake to regenerate.
I heard that’s also the reason Meson doesn’t support it.
Oh, so it’s a limitation of the tool, not something we actually desire… Here’s what I think: such glob patterns would typically be useful at link time, where you want the executable (or library) to aggregate all object files. Now the list of object files depends on the list of source files, which itself depends on the result of the glob pattern.
So to generate the program, the system would fetch the list of object files. That list depends on the list of source files, and should be regenerated whenever the list of source files changes. As for the list of source files, well, it changes whenever we actually add or remove a source file. As for how we should detect that, well… this would mean generating the list anew every time and seeing if it changed.
Okay, so there is one fundamental limitation here: if we have many, many files in the project, using glob patterns can make the build system slower. It might be a good idea in this case to fix the list of source files. Now, I still want a script that lists all available source files, so I don’t have to add each new file manually. But I understand the rationale better now.
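For what it’s worth, newer CMake (3.12+) can re-run the glob at build time if you ask it to; a sketch, assuming a flat src/ layout:

    cmake_minimum_required(VERSION 3.12)
    project(app CXX)

    # CONFIGURE_DEPENDS makes the generated build system re-check the
    # glob on every build and re-run CMake when the result changes.
    # That is exactly the "generate the list anew every time" idea,
    # with the slowdown on big trees that it implies.
    file(GLOB APP_SOURCES CONFIGURE_DEPENDS "src/*.cpp")

    add_executable(app ${APP_SOURCES})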
see: tup
Second, if you have spurious files in your source tree, you should remove them.
Conditionally compiling code on the file level is one of the best ways to do it, especially if you have some kind of plugin system (or class system). It’s cleaner than ifdefing out big chunks of code IMO.
Traditionally, the reason has been that if you want make to rebuild your code correctly when you remove a file, you have to do something like
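(a sketch of one common variant, assuming a wildcard source list; the stamp-file name is my own invention:)

    SRCS := $(wildcard src/*.c)
    OBJS := $(SRCS:src/%.c=obj/%.o)

    # Rewrite the stamp only when the glob result changes, so the link
    # step re-runs even though every surviving object is up to date.
    $(shell echo '$(SRCS)' > .srcs.tmp; \
            cmp -s .srcs.tmp .srcs || mv .srcs.tmp .srcs; \
            rm -f .srcs.tmp)

    obj/%.o: src/%.c
    	@mkdir -p obj
    	$(CC) $(CFLAGS) -c $< -o $@

    app: $(OBJS) .srcs
    	$(CC) $(OBJS) -o $@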
which is a bit annoying, and definitely error-prone.
Third, if you forget to remove an obsolete module, I bet my hat you also forgot to remove it from the list of source files.
One additional reason is that it can be nice when working on something which hasn’t been checked in yet. Imagine that you are working on adding the new Foo feature, which lives in foo.c. If you then need to switch branches, git stash and git checkout will leave foo.c lying around. By specifying the sources you want explicitly, you don’t have to worry about accidentally including it.
Conditionally compiling code on the file level is one of the best ways to do it, especially if you have some kind of plugin system (or class system). It’s cleaner than ifdefing out big chunks of code IMO.
Okay, that’s a bloody good argument. Add to that the performance implication of listing every source file every time you build, and you have a fairly solid reason to maintain a static list of source files.
Damn… I guess I stand corrected.
A fantastic and very comprehensive guide to the steps involved in building C/C++ code.
However, I very vehemently disagree with the usage of CMake. Based on the depth shown here, though, I’m very interested in the next few guides, to see the author’s viewpoints and practices regarding it.
The main selling point for CMake is that it’s cross-platform and supported by multiple editors. CLion and Visual Studio support CMake, and the big thing you get is the various generators for Visual Studio solutions and Xcode projects, or compilation databases for tools. But overall I’ve always found it a behemoth effort to set up and maintain appropriately.
There are also several really good alternatives not mentioned: Xmake, FASTBuild, Bazel, and Buck. Overall, the build landscape of C++ is an endless time sink, which alone was a reason for me to invest time in alternative low-level languages for hobby work. Rust uses Rust to write custom builds in Cargo, and Ada uses an Ada-like syntax for its gprbuild tool.
Rust’s Cargo is sort of a build system, but being based on convention over configuration is such bliss. For the majority of projects it completely disappears.
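To illustrate (a hypothetical minimal manifest; the names are made up), a whole project’s build configuration can be just:

    # Cargo.toml; everything else (src/main.rs, tests/, the target/
    # directory layout) is inferred by convention.
    [package]
    name = "hello"
    version = "0.1.0"
    edition = "2021"

    [dependencies]
    serde = "1"   # one example dependency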
This is also the perk of building the ecosystem with the language. In all my Rust work over the years, I’ve only had to fiddle with Cargo a few times, for very specific reasons. It’s a definite major selling point for Rust.
On the Ada side, their existing gprbuild conventions seem like they’re being boiled into their new Alire tool (it’s pretty much “cargo for Ada”). Rust pretty much showed everyone “this is the way”.