1. 31
  1.  

  2. 16

    This is a good description of an aspect of compiling C code. It’s useful to note though that in C++, Rust, or any other language which includes monomorphized generics, this model doesn’t quite work as simply. Monomorphization (the generation of concretely-typed copies of generic code which enable static dispatch) is a strategy which often helps runtime performance of the generated code (relative to performing dynamic dispatch using runtime type information), but it means that the code required to be included in an object file is dependent on the callers of the code in the object file. If those callers introduce a new combination of concrete types for any generics, you need to make the appropriate copies of the generic code and update the object file.

    In C++, this means that templated stuff goes into your header files (because C++ otherwise keeps C’s model where the file is the unit of compilation). In Rust, the compilation model instead has the crate as the fundamental compilation unit. Rust does perform incremental compilation which tries to detect what specifically has changed and what recompilation that requires, but this different boundary means trying to improve compile times may need a different approach. There are contexts where splitting things out into separate crates will help.

    1. 1

      TIL this. Thank you.

      1. 1

        Rust does perform incremental compilation which tries to detect what specifically has changed and what recompilation that requires

        “Tries” here implies that the process is imperfect. Why is it more difficult than just building a dependency graph that is aware of “templated things”, and recompiling according to that graph when things change?

        1. 1

          The compilation model of Rust is more like if you wrote everything in header files, and the compiler generated a single translation unit that included all of those in a single file – the crate is the translation unit.

          So the difference is that in C and C++, the programmer has already done some work of splitting it up in units that can be compiled independently, whereas doing anything incremental or parallel in the Rust model requires the compiler to see the finer grained chunks of code.

          I wonder if C++20 modules are anything like this.

          1. 1

            My question stands, however - why is it so difficult to build a dependency graph between functions in a crate/translation unit? Or, if the problem is that Rust has difficulty caching results of a single function, as opposed to a whole file or translation unit - why is that difficult? Other languages (.NET) have been doing function-level object code caching for years, if not decades.

            1. 1

              I have no experience writing compilers, but I imagine the situation must get more complicated due to optimisations that take global information into account. Imagine, for example, the decision to inline a given function call depends on the program-wide total number of call sites of that same function.

              1. 1

                You can represent that inlining as another edge in the graph, with the “must_be_updated = true” property that “templated thing instantiations” have - it would be exactly the same mechanism - and could store meta-information about the fact that the function was inlined (and where) with the function itself.

                The more general point is that I was hoping for an answer to my question that either answered specifically why Rust’s compiler does have trouble, or else a really good theoretical explanation that can’t be countered as simply as I just did.

      2. 12

        Make is definitely my favourite of the usual unix tools, and one I’d recommend all beginners learn about. I’m betting most people think it’s only a build system for C, and even when it’s used, sometimes it’s only as a task runner. It works for any command that takes files as input and produces a file as output, and you get incremental and (if you define the tasks right) parallelized builds for free! And if you don’t like make itself, there’s a ton of language-specific clones like Rake or Jake that support most of its features. I just wish it could deal better with tasks that produce multiple files, not even the clones usually support that.
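
        For instance, here’s a minimal non-C sketch (pandoc is just a stand-in for any file-in, file-out command):

            # Any rule that maps an input file to an output file gets incremental
            # and parallel builds for free: only stale pages are regenerated.
            PAGES := $(patsubst %.md,%.html,$(wildcard *.md))

            all: $(PAGES)

            %.html: %.md
            	pandoc -o $@ $<

        Run it with make -j4 and only the pages whose sources changed are rebuilt, in parallel.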

        I’m using Jake on a project where I have to parse HTML and JS files and do some compilin’ and interpretin’. It’s important that I have very clear data provenance from the source files, because it’s copyrighted material and I want to be able to distribute the project without including any of the incriminating data, so having a tool like make to keep track of all the inputs and outputs is very useful. JS also has some great libraries for parsing both HTML and JS, and it would be really awkward to wrap the functions that do these transformations as command-line programs; with Jake I can just call the functions in the tasks.

        1. 2

          As a fellow lover of Make, I’ve been getting some good use out of remake lately and thought you might like it if you hadn’t seen it.

          1. 1

            I agree make is great! I use it to run my backup. (There are different portions of my hard disk that need to be backed up in a certain order.)

            1. 1

              I just wish it could deal better with tasks that produce multiple files, not even the clones usually support that.

              In make 4.3:

              • New feature: Grouped explicit targets

              Pattern rules have always had the ability to generate multiple targets with a single invocation of the recipe. It’s now possible to declare that an explicit rule generates multiple targets with a single invocation. To use this, replace the “:” token with “&:” in the rule. To detect this feature search for ‘grouped-target’ in the .FEATURES special variable. Implementation contributed by Kaz Kylheku kaz@kylheku.com
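
              For example, something like this (bison is just a convenient example of a tool that emits two files in one run):

                  # With "&:" (GNU Make 4.3+) both files are declared as products of a
                  # single recipe invocation, so a parallel build runs bison only once.
                  parser.c parser.h &: parser.y
                  	bison -d -o parser.c parser.y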

            2. 6

              Relevant xkcd: https://xkcd.com/1053/ .

              1. 4

                Build caching and optimization is a wonderful thing!

                Haskell is known for its long compile times, so most of the companies I know split their products into multiple packages with the shallowest dependency tree possible, and use nix for package level caching. While that tends to require at least one dedicated employee for build purposes, I know of at least one company where devs could push changes into production fifteen minutes after receiving a laptop that was fresh from the hands of IT. Another company with a large Haskell codebase reduced their developer builds to thirty seconds with nix.

                1. 4

                  So is the post advocating that you use Make instead of a shell script?

                  • For small projects, I would recommend the opposite. Writing a shell script is fast enough; C compilers are fast. You can compile 10K lines of C or C++ in seconds, or 100K+ lines if you structure your code carefully. (It’s also not a bad idea to understand how the C compiler’s command line works, which is big and full of quirks. Make tends to hide that in CFLAGS and other weirdness in its built-in rules database.)
                  • For bigger projects, I’d recommend using something that generates Ninja files. Ninja files look extremely similar to Makefiles; I think of it like a simple / fast / correct Makefile.

                  In my case, I used Python, but people also use CMake, and Google uses GN and Blueprint for Chrome and Android. I think Meson also generates Ninja.

                  I would skip Make as it’s an impoverished language which can’t handle many situations well. It supports parallel builds, but doesn’t really give you any help with them, which is what you REALLY need to make a build fast.

                  The most common usage of Make is generated anyway, e.g. via autotools, so IMO you might as well skip that and generate code for a better runtime, i.e. Ninja. If you don’t need the configure support, then you can use a shell script.

                  There are a number of classic posts with all the deficiencies of classic make and GNU make; I could dig them up if anyone is interested. They might be in some of these blog posts: http://www.oilshell.org/blog/tags.html?tag=make#make

                  Using a real language + Ninja basically solves all of them as far as I can tell (except if you want to use something besides timestamps for invalidation, but GNU make doesn’t do any better.) To be fair, CMake is probably the most popular choice, but it probably doesn’t qualify as a “real language”, though I guess it has grown enough features to do the things everyone wants.

                  I wrote 3 makefiles for 3 different apps from scratch (Oil itself, the whole oilshell.org website, and log file analysis) and used and maintained them for YEARS before coming to this conclusion [1]. Incrementality and parallelism are great; the Make language is just too impoverished to fully take advantage of them. There’s a reason they embedded Guile Scheme in GNU make; yet few people use it because it isn’t turned on by default in many distros.

                  I switched a big portion of Oil’s build system to Ninja and am going to switch everything to it. This only affects developers and not end users, who still get a shell script. Some detail here about it: https://old.reddit.com/r/oilshell/comments/m8za8b/release_of_oil_088/grmbdxn/

                  BTW Android used to use a very carefully written and huge GNU Makefile with lots of metaprogramming, but has now switched to a Ninja generator.

                  (On re-reading I guess the post is more about incrementality and avoiding building from scratch. But most of the comments are about Make so I think this comment will still stand. Also I think incrementality is overrated for small projects, which she is talking about. Actually I think it would be fun to do a benchmark of parallel build from scratch vs. incremental build on a single core, which she is doing – I think parallel wins in most realistic situations)

                  [1] related comment I found: https://news.ycombinator.com/item?id=19062460

                  1. 1

                    You can compile 10K lines of C or C++ in seconds, or 100K+ lines if you structure your code carefully.

                    Two observations here:

                    First, seconds are a long time, and it’s possible to do much better.

                    Second, the way to structure code to minimize build times without reusing objects is to keep code in as few source files as possible, but this does not promote modular code.

                    If anything, I’d argue that the main advantage of a build system that can reuse objects is that it allows and encourages much smaller objects.

                    1. 1

                      Sure, Ninja lets you do all of that! If your project will get the benefit, then it’s worth it to figure out all the dependencies and write the build description. Otherwise a shell script is a perfectly good way to do things, for surprisingly large projects.

                      The Unix style is to loosely coordinate processes with shallow dependency trees. (as opposed to say npm which encourages deep transitive dependencies)

                      Shallow dependency trees <=> fast to build even if it’s not incremental

                      1. 1

                        This seems to be trying to argue both sides: Ninja is a good build system, and “surprisingly large projects” don’t need it.

                        Otherwise a shell script is a perfectly good way to do things, for surprisingly large projects.

                        At the risk of repeating myself, it is a perfectly good way to encourage bad design. Ninja may provide a perfectly good solution, but shell scripts don’t, unless you want the shell script to evaluate which targets to build based on which changes have occurred. That can be written in a shell script, but it begs the question, “why?” Failing to evaluate which targets to build means that the fastest route for a developer to add code is into an existing module, which encourages very large modules.

                        1. 1

                          I’m arguing specifically against GNU make, if it wasn’t clear from the original post. Either a shell script or Ninja is better, depending on the situation.

                          As a specific example, I compiled the core of CPython using a shell script and it takes around 30 seconds on a single core, for a debug build. This slice of CPython is 250K+ lines of code.

                          So I’m saying you can easily build projects that are 10K+ lines with a shell script and it saves you writing dependencies.

                          Dependencies are often wrong, including in CPython. Show me a big Makefile (or even a big Ninja file) and I claim it will be pretty easy to find dependencies that are wrong. In other words, make clean is an anti-pattern and still status quo.

                          Also, neither C nor C++ is really a modular language, in the sense of encouraging small modules. Good C/C++ programmers often write monoliths – look at Fabrice Bellard’s code like QEMU or QuickJS, John Carmack’s style (he has argued against small functions), sqlite, Lua, the Zig compiler, etc.

                          Making a point of modularity in C and C++ is often counterproductive. Though I like modularity which is why I minimize writing C/C++ from scratch.

                          Also, when you have overly small modules, the time to check whether they’re out of date often exceeds the time to just build them from scratch. Build metadata can get very large, and build systems often have non-optimal algorithms (see GNU make’s built-in rules database).

                  2. 3

                    Perhaps this is a decent argument for using something like scons for building a project if you have a team of programmers and a CI build system: if you switch your rebuild requirement from timestamps to a content hash (even one not cryptographically secure, since you’re presumably not defending against team-mates finding content collisions in source files to mess with you … usually), then you can have your project state file live in a cache which can be remounted into build containers often, so that the mainline trunk of development stays as the baseline and branch builds just compile whatever’s different from current mainline. This also adds an incentive to rebase fairly often, to keep compile times low.

                    1. 3

                      if you switch your rebuild requirement from timestamps to a content hash (even one not cryptographically secure, since you’re presumably not defending against team-mates finding content collisions in source files to mess with you … usually)

                      Note that Blake2b is faster than MD5, and Blake3 is potentially even faster. In practice, there is no meaningful speed difference between a cryptographically secure hash and a mere CRC: the bottleneck is going to be reading from disk anyway (well, except maybe on an M.2 drive).

                      1. 2

                        Right. I don’t think I made any claim about speed of hashes, only speed of builds and how to identify artifacts. Any mainstream hash performance is going to be negligible here.

                        My point was that the fact that scons uses MD5 is not a blocker; the hash algorithm still works for build artifact caching … in the direction of “main trunk” -> “dev branch”, at least.

                        Looks like scons now supports switching the hash algorithm, with code merged in 2020. After the next round of LTS OS releases are out, you might be able to start relying upon that. :D

                    2. 3

                      I think the greatest value of Make is its paradigm: How many other programming languages can you call implicitly parallel and declaratively dependency driven? I sure would like to do regular program logic like that!

                      While Make serves well to define what a build system is and does, I wouldn’t call it one: it’s a programming language. More specialized tools like CMake and Meson have surpassed it in its main niche (as a C and C++ build system). But when you need a custom rule, Make is easier to get right, and it reads like an open book afterwards, compared to CMake, where you have to carefully read the documentation for add_custom_command and add_custom_target to even start to think about whether it is correct.

                      At the same time, Make is a terribly dirty, stringly typed, two-stage-expanded, macro-ish language that can’t even handle spaces and colons in filenames. Every recipe line is also implicitly a shell script. It sorely lacks a proper list datatype, since (unless you want to write everything twice) lists are how you express depending on all your sources (or object files) so that they all get built.
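
                      A minimal sketch of what I mean (file names made up): the only “list” you get is a whitespace-separated string, so a filename containing a space simply can’t be expressed.

                          # A "list" is just a string split on whitespace; each word below is
                          # one prerequisite, and the built-in %.o: %.c rule compiles them.
                          SRCS = main.c util.c parser.c
                          OBJS = $(SRCS:.c=.o)

                          app: $(OBJS)
                          	$(CC) -o $@ $(OBJS)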

                      1. 3

                        This is a good description of the bare basics of a build system. Where things get messy, though — even in simple projects — is when the source files have dependencies on each other, which are described within those files. In C terms, when a .c or .h file #includes .h files. Then changing a .h file requires recompiling all the .c files transitively dependent upon it.

                        No problem, make can do that! Except (unless make has changed a lot since I last used it) those dependencies have to be described explicitly in the makefile. Now you’ve got a very nasty case of repeating yourself: it’s so easy and common to add an #include to a source file during development. But if you ever forget to add the equivalent dependency to the makefile, you’ve broken your build. And it can break in really nefarious ways that only manifest as runtime errors or crashes that are extremely hard to debug. This in turn leads to voodoo behaviors like “I dunno why it crashed, let’s delete all the .o files and build from scratch and hope it goes away.”
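
                        Concretely, the hand-maintained version looks something like this (header names invented):

                            # Every #include in main.c has to be mirrored here by hand; forget one,
                            # and main.o silently goes stale when that header changes.
                            main.o: main.c widget.h gadget.h
                            	$(CC) $(CFLAGS) -c main.c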

                        So now you need a tool that scans your source files and discovers dependencies and updates your makefile. This is why CMake exists, basically. But it adds more complexity. This is a big part of why C/C++ are such a mess.

                        (Or you could just use an IDE, of course. Frankly the only reason I have to deal with crap like makefiles is because not everyone who uses my code has Xcode…)

                        1. 6

                          None of this is necessary. It’s perfectly normal in make-based C/C++ projects to have a build rule which uses the compiler to generate the dependencies during the first build & then include those build dependencies into the Makefile for subsequent incremental builds.

                          There’s no need to keep track of the dependencies for C/C++ files by hand.

                          (For reasons which are not entirely clear to me, Google’s Bazel does not appear to do this. Meson does though, if you want a nice modern build tool.)

                          1. 2

                            I imagine the reason is that Bazel requires a static dependency graph, including for all autogenerated intermediate files. I’m not sure why the graph is encoded directly in files instead of maintained in a parallel index though.

                            There’s internal tooling at Google to automatically update dependencies in BUILD files from source files, but it’s apparently not open sourced.

                            1. 2

                              Maybe recursive make is where it breaks down. I have fond memories of hardcoding dependencies between libraries in the top-level makefile – an activity reserved for special occasions when someone had tracked down an obscure stale rebuild issue.

                              (I think recursive make, at least done the obvious top-down way, is flawed.)

                              1. 1

                                Yeah, you never want to be calling make from within make.

                            2. 4

                              You can’t add dependencies on the fly in Make, unfortunately. You can get a list of dependencies of a file in Makefile format with gcc using -MD and -MF, but that complicates things a lot. Ninja on the other hand has native support for these rules, but from what I’ve heard Ninja is mostly made to be used by higher-level build tools rather than directly. (I mean you can manually write your ninja file and use ninja just like that, but it’s not as pleasant to write and read as Makefiles.)

                              1. 5

                                from what I’ve heard Ninja is mostly made to be used by higher-level build tools rather than directly. (I mean you can manually write your ninja file and use ninja just like that, but it’s not as pleasant to write and read as Makefiles.)

                                That’s an explicit design goal of Ninja. Make is not a good language to write by hand, but it’s just good enough that people do it. Ninja follows the UNIX philosophy. It does one thing: it checks dependencies and runs commands very, very quickly. It is intended to be the target for higher-level languages and by removing the requirement from the high-level languages that they have to be able to run the build quickly, you can more easily optimise them for usability.

                                Unfortunately, the best tool for generating Ninja files is CMake, whose main selling point is that it’s not as bad as autoconf. It’s still a string-based macro processor pretending to be a programming language though. I keep wishing someone would pick up Jon Anderson’s Fabriquer (a strongly typed language where actions, files and lists are first-class types, with a module system for composition, intended for generating Ninja files) and finish it.

                                1. 1

                                  CMake, whose main selling point is that it’s not as bad as autoconf. It’s still a string-based macro processor pretending to be a programming language though.

                                  It’s kind of amazing how wretched a programming language someone can create, when they don’t realize ahead of time that they’re creating a programming language. “It’s just a {configuration file / build system / Personal Home Page templater}” … and then a few revisions later it’s metastasized into a Turing-complete Frankenstein. Sigh. CMake would be so much better if it were, say, a Python package instead of a language.

                                    I recall Ierusalimschy saying that Lua was created in part to counter this, with a syntax simple enough to use for a static config file, but able to contain logic using a syntax that was well thought-out in advance.

                                  1. 1

                                    The best tool for generating Ninja files is Meson :P

                                     Admittedly it’s not the most flexible one: if you have very fancy auto-generators and other very unusual parts of the build you might struggle to integrate them, but for any typical unixy project Meson is an absolute no-brainer. It’s the new de-facto standard among all the unix desktop infrastructure, at least.

                                    1. 1

                                      I’ve not used Meson, but it appears to have a dependency on Python, which is a deal breaker for me in a build system.

                                  2. 5

                                    You can’t add dependencies on the fly in Make, unfortunately.

                                     The usual way to handle this is to write the Makefile to -include $DEPFILES or something similar, and generate all of the dependency make fragments (stored in DEPFILES, of course) with the -MMD/-MF flags on the initial compile.

                                    1. 2

                                      You can definitely do this, here’s an excerpt from one of my makefiles:

                                      build/%.o: src/%.c | $(BUILDDIR)
                                      	$(CC) -o "$@" -c "$<" $(CFLAGS) -MMD -MP -MF $(@:%.o=%.d)
                                      
                                      -include $(OBJFILES:%.o=%.d)
                                      

                                       Not the most optimal solution, but it definitely works! You just need to ensure you output to the right file; I wouldn’t call it particularly complicated, it’s a two-line change.

                                      1. 2

                                        You didn’t mention the reason this truly works, which is that if there is a rule for a file the Makefile includes, Make is clever enough to check the dependencies for that rule and rebuild the file as needed before including it! That means your dynamically generated dependencies are always up to date – you don’t have a two-step process of running Make to update the generated dependencies and then re-running it to build the project, you can just run it to build the project and Make will perform both steps if both are needed.
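
                                        A minimal sketch of that mechanism, using an explicit rule for the included fragment (gen-deps.sh is a made-up stand-in for whatever generates it):

                                            # deps.mk is both included and a target: if it's stale, make rebuilds
                                            # it first, restarts itself, and only then reads the fresh dependencies.
                                            deps.mk: $(wildcard src/*.c)
                                            	./gen-deps.sh src/*.c > deps.mk

                                            -include deps.mk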

                                  3. 1

                                    Why ‘the obvious’ might not be so

                                    It might seem amazing to us here, but maybe a lot of devs don’t actually know about make and dependency rules for builds?