1. 18

  2. 11

    Modern production compilers are smart enough to recognize classic include guards the first time they read a header and then skip reprocessing that header (not even bothering to open it) when it is included again:

    % cat > test.hpp <<EOF
    #ifndef TEST_HPP
    #define TEST_HPP
    class foo {};
    #endif
    EOF
    % cat > test.cpp <<EOF
    #include "test.hpp"
    #include "test.hpp"
    #include "test.hpp"
    #include "test.hpp"
    #include "test.hpp"
    int main() {}
    EOF
    % strace -f g++ test.cpp |& grep test.hpp
    [pid 16400] read(3, "#include \"test.hpp\"\n#include \"te"..., 114) = 114
    [pid 16400] stat("test.hpp.gch", 0x7fff00a4beb0) = -1 ENOENT (No such file or directory)
    [pid 16400] openat(AT_FDCWD, "test.hpp", O_RDONLY|O_NOCTTY) = 4
    % strace -f clang++ test.cpp |& grep test.hpp
    [pid 16410] pread64(3, "#include \"test.hpp\"\n#include \"te"..., 114, 0) = 114
    [pid 16410] openat(AT_FDCWD, "./test.hpp", O_RDONLY|O_CLOEXEC) = 3
    [pid 16410] readlink("/proc/self/fd/3", "/home/aek/temp/test.hpp", 4096) = 23
    1. 8

      Is #pragma once not enough? There are no numbers to show how much faster it is, to decide if it’s worth the price.

      1. 4

        I think the justification is thoroughly out of date:


        There’s a little dance involving #ifdef’s that can prevent a file being read twice, but it’s usually done wrong in practice - the #ifdef’s are in the file itself, not the file that includes it. The result is often thousands of needless lines of code passing through the lexical analyzer, which is (in good compilers) the most expensive phase.

        Modern preprocessors are extremely fast and optimized (e.g. with respect to string allocation).

        And even if they weren’t, neither preprocessing nor lexing would dominate even a debug build (i.e. where the code gen phase is as fast as possible).

        Again I agree with another commenter that says this post lacks numbers.

        I just did a little test of #include <vector> duplicated in multiple headers. Then I use cc -E .. | wc -l to count lines.

        It doesn’t get duplicated, almost certainly because #pragma once works.

        And also I did a test of the traditional include guards in the header, and it works fine. The lines do NOT get duplicated and do NOT get passed to the lexer. The file gets opened and preprocessed, so you can technically save that. But again I’d like to see numbers.
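
        A minimal sketch of why the guarded body only reaches the compiler once. It assumes a hypothetical header demo.hpp whose contents are pasted twice into one file to simulate two #include lines; DEMO_HPP and Foo are made-up names:

        ```cpp
        // First "inclusion" of the hypothetical demo.hpp.
        #ifndef DEMO_HPP
        #define DEMO_HPP
        struct Foo { int x; };
        #endif

        // Second "inclusion": DEMO_HPP is already defined, so the preprocessor
        // discards everything up to #endif and the compiler proper never sees it.
        #ifndef DEMO_HPP
        #define DEMO_HPP
        struct Foo { int x; };  // would be a redefinition error without the guard
        #endif
        ```

        The preprocessor still has to scan the second copy for the matching #endif, which is the part a compiler can skip if it remembers the guard.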

        It’s weird to call it “a little dance involving #ifdefs” when I haven’t seen a codebase in 20 years that doesn’t do that, or something more modern. A lot has changed since this article was written.

        1. 3

          This sounds like a thing which might be more convenient with some tooling support. Like you have a (partial) ordering over all .h files, the IDE knows about it, and if you type var foo = std::make_unique<Bar>() then the IDE automatically inserts #imports for <bar.h> and also all the headers that <bar.h> depends on, in the right order so that everything works out.

          …at which point you’ve invented like half of a proper import system, but oh well.

          1. 4

            …at which point you’ve invented like half of a proper import system, but oh well.

            Maybe? A proper import system is, I think, an unsolved problem for languages with C++/Rust-style monomorphisation. Semantics-wise, Rust’s crate/module system is great (that’s my favorite feature apart from unsafe). But in terms of physical architecture (what the article talks about) it’s not so great.

            • There’s nothing analogous to pimpl/forward declaration, which significantly hamstrings separate compilation. C++ is better here.
            • Although parsing and typechecking of templates happens once, monomorphisation is repeated for every compilation unit, which bloats compile time and binary size in a big way.
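
            For comparison, on the C++ side the closest tool I know for the second point is an explicit instantiation declaration (extern template). A rough sketch, with a made-up function template square and everything squeezed into one file that would normally be split across a header and a single .cpp:

            ```cpp
            // --- square.h (hypothetical) ---
            template <typename T>
            T square(T x) { return x * x; }

            // Explicit instantiation declaration: tells every including translation
            // unit that square<int> is instantiated elsewhere, so they need not
            // instantiate it themselves.
            extern template int square<int>(int);

            // --- square.cpp (hypothetical): the one place that instantiates it ---
            // Explicit instantiation definition.
            template int square<int>(int);
            ```

            This only helps for the types you list by hand, so it is a workaround and not a general fix.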
            1. 1

              analogous to pimpl/forward declaration

              Box<>’d opaque types? I’ve seen multiple blog posts mentioning using this for mitigating dependency chains.

              Although parsing and…

              I miss the SPECIALIZE pragma from GHC Haskell. Your generic functions get a slow, fully polymorphic version generated (with an implicitly passed-around dictionary object holding typeclass method pointers), and then you can easily write out a list of SPECIALIZE pragmas to generate monomorphic copies for the specific types whose performance you really care about.

              This feels like it ought to be possible in principle to deduplicate monomorphisations happening in different compilation units with a mutex and a big hash table.

              1. 1

                Box<>’d opaque types? I’ve seen multiple blog posts mentioning using this for mitigating dependency chains.

                I don’t believe there’s a functional analogue to pimpl in Rust, but I need to see a specific example to argue why it isn’t.

                What you could do in Rust is introduce dynamic dispatch, but it has significantly different semantics, is rather heavyweight syntactically (it requires introducing single-implementation interfaces and a separate crate), and only marginally improves compilation time (the CU which “ties the knot” would still need to be recompiled, and you generally want to tie the knot for tests).

            2. 2

              Tooling increasingly supports modules, which require you to do the opposite thing: have a single header for each library, parse it once, serialise the AST, and lazily load the small subset that you need. This composes with additional tooling such as Sony’s ‘compilation database’ work that caches template instantiations and even IR for individual snippets.

              The approach advocated in this article imposes a much larger burden on the programmer and makes it very hard for tooling to improve the situation.

              1. 2

                This reminds me a lot of Robert Dewar’s paper on the GNAT compilation model, https://dl.acm.org/doi/abs/10.1145/197694.197708

                He ditched the traditional Ada library database, and instead implemented Ada’s with dependency clauses in a similar manner to C #include, which made the compiler both simpler and faster.

                1. 1

                  Interesting, thanks. I am vastly out of touch with what’s happened in C++ since 1998.

                  1. 1

                    In 2004 the approach advocated in the article paid off. And the larger burden was not quite enough of an ongoing thing to really hurt.

                    Modules would be much nicer if the ecosystem support is there. (I’m kind of thankful not to need to know whether it is… I spend a lot less time with my C++ tooling in 2022 than I did in 2004.)

                    And this:

                    additional tooling such as Sony’s ‘compilation database’ work that caches template instantiations

                    sounds like the stuff dreams are made of.

                2. 3

                  You then #include "foo2.h" in foo.c and bang! You just included and parsed bar.h twice.

                  This is what #pragma once is supposed to prevent.

                  Old-style header guards were a problem since the compiler still needed to read the entire file to look for the trailing #endif. It was enough of a problem that Lakos, in Large Scale C++ Software Design, recommended redundant header guards and showed how they improved compilation speed by not opening the file at all:

                  // foo.h
                  #ifndef FOO_INCLUDED
                  #define FOO_INCLUDED
                  //.... contents ...
                  #endif // FOO_INCLUDED
                  // usage of foo.h
                  #ifndef FOO_INCLUDED
                  #include "foo.h"
                  #endif

                  The more modern solutions, roughly in order, seem to be:

                  1. Use #pragma once
                  2. Use forward declarations
                  3. Use precompiled headers
                  4. Use Include What You Use (IWYU) to track and minimize includes
                  5. PIMPL pattern
                  6. Modules (listed last only because they require C++20)
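
                  For items 2 and 5, a minimal PIMPL sketch. Widget, Impl and the widget.h/widget.cpp split are made-up names for illustration, and both "files" are shown in one block:

                  ```cpp
                  #include <memory>

                  // --- widget.h (hypothetical): all that clients need to see ---
                  class Widget {
                  public:
                      explicit Widget(int value);
                      ~Widget();            // defined below, where Impl is a complete type
                      int value() const;
                  private:
                      struct Impl;          // forward declaration only
                      std::unique_ptr<Impl> impl_;
                  };

                  // --- widget.cpp (hypothetical): private details live here ---
                  struct Widget::Impl {
                      int value;
                  };

                  Widget::Widget(int value) : impl_(std::make_unique<Impl>(Impl{value})) {}
                  Widget::~Widget() = default;
                  int Widget::value() const { return impl_->value; }
                  ```

                  Changing Impl then only forces widget.cpp to recompile, at the cost of a heap allocation and an extra indirection per object.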
                  1. 2

                    Yeah, I would also say the statement is slightly inaccurate. You included it twice, which means you could have preprocessed it twice, but the compiler didn’t PARSE it twice, as long as you have the traditional include guards.

                    I agree with another comment in that I’d like to see some data.

                    I’m working on optimizing the build of oil-native now, moving to Ninja … so if anyone has tips let me know.

                    I think I may count the lines of every pre-processor-expanded translation unit, which is something obvious I’ve never done …

                  2. 2

                    C++20 introduces modules, which I think are intended to make all this busywork unnecessary. I don’t know for sure, because I won’t try C++20 until it becomes the default language in clang and gcc. But I think I’d rather convert a C++ project to modules, than convert a C++ project to “extreme #include discipline”.

                    1. 2

                      I presided over an effort to do this for a large C++ codebase once, in the mid ’00s. We PIMPL’d everything too.

                      Our one exception to the rule was a header that was nothing but a set of macros which were used to forward declare smart pointers and linked lists.

                      It cost us a full time intern for 3 months plus about 1/4 of a senior developer over that timeframe to keep it on the rails. It was really worth it. Once we were done, we were able to add Linux, OS X, FreeBSD and Solaris support to a stack that had previously been Windows-only. It was relatively easy to maintain the discipline once we got everything converted.

                      The other thing I really wanted but never got for that codebase was pervasive use of precompiled headers. We only ever managed that on Windows. That would have been a tremendous reduction in compile time. I had to maintain gcc 2.95 support for way too long.

                      1. 1

                        I’m about to suggest a technique for a language I’ve never used in anger, so the idea has a good likelihood of being rubbish, but…

                        My first thought here goes along the lines of using guard macros with an inverted purpose: so foo.h can check whether the bar.h guard macro is defined and throw a meaningful error otherwise (à la “foo.h depends on bar.h, please include it”), in place of foo.h just including bar.h itself. (And bar.h itself should throw an error if it finds its guard macro is defined, in place of skipping itself.)

                        This way, instead of reverse-engineering the dependencies from the compiler’s complaints, you’d be able to just read off the errors. There would still be some tedium but it would be limited to having to bounce on the compile key and adding another include a couple of times before the code actually compiles.
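
                        Roughly what I have in mind, as an untested-in-anger sketch. BAR_H, Bar and Foo are made-up names, and the two headers are shown back to back in one file, in the order a client would have to include them:

                        ```cpp
                        // --- bar.h (hypothetical) ---
                        #ifdef BAR_H
                        #error "bar.h was included twice"
                        #endif
                        #define BAR_H
                        struct Bar { int v; };

                        // --- foo.h (hypothetical) ---
                        // Instead of including bar.h itself, foo.h only checks for its guard.
                        #ifndef BAR_H
                        #error "foo.h depends on bar.h, please include it first"
                        #endif
                        struct Foo { Bar b; };
                        ```

                        Swapping the two sections, or dropping the bar.h one, should stop the build at the #error message naming the missing header.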

                        I wonder if that would work that well in practice. Or if I’m proposing an ol’ chestnut.

                        1. 1

                          As a rule of thumb I #include only a bare minimum in headers; if class/struct declaration depends on some other structures as plain data I obviously have to add more headers there, but in most cases those structures are used just as pointers/references so forward declaration is the way to go.

                          That also helps to untangle cyclic dependencies.
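
                          A small sketch of what that looks like, with made-up names (Engine, Car) and the header and source parts shown in one block:

                          ```cpp
                          // --- car.h (hypothetical): no #include "engine.h" needed ---
                          struct Engine;            // forward declaration

                          struct Car {
                              Engine* engine;       // a pointer to an incomplete type is fine
                              int horsepower() const;
                          };

                          // --- car.cpp (hypothetical): the full definition is only needed here ---
                          struct Engine { int hp; };

                          int Car::horsepower() const {
                              return engine ? engine->hp : 0;   // 0 when no engine is attached
                          }
                          ```

                          If Engine also needed a Car*, engine.h could forward declare Car the same way, and neither header would have to include the other.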