1. 93
    1. 26

      I’m glad someone is taking a look at linkers, since compilers get most of the attention. Linking is a huge bottleneck, rarely parallelized, and massively complicated.

      I sometimes wonder is linking itself is even something worthwhile to have anymore, and that systems could switch to something like either merging compiling and linking into a single step, or having directly runnable objects that can be trivially fixed up.

      1. 37

        LLD is getting a lot of attention. It replaced the multi-pass algorithm that traditional UNIX linkers use with a single-pass one that is O(n) in terms of the number of symbols, rather than O(n*m) in terms of the number of symbols and number of object files. In exchange for this, it uses more memory, but the memory for the sum of all of the symbol tables in all object files in a large project is still pretty small compared to a modern machine’s RAM. It’s also aggressively parallelised. As the linked repo says, LLD takes 12 seconds to link Chromium, gold (which was a rewrite of BFD ld focused on speed) takes 50.

        It seems that the main speed win here is actually an accounting trick: it starts the linking process before compilation starts and so only counts the time between the last compile step finishing and the linker finishing. Because most of what the linker is doing does not depend on every object file, you can get most of the way through the link before you need all of the object files. This probably doesn’t get as much speedup with things like --gc-sections, which require having a complete symbol table so that you can do reachability analysis before you can do the final layout, but even then you can start copying things that are definitely reachable early, you just might end up with something in the final .o that requires you to pull in a load of things from other objects.

        Note that the author of this repo is one of the lead developers on LLD. Rui is awesome, after a 5 minute conversation with him I completely reevaluated how I thought about linkers. I hope that this is an experiment that he’s doing to prototype things that he’ll eventually add to LLD.

        As you your other suggestions:

        merging compiling and linking into a single step

        That’s what languages that do whole-program analysis (Go, Rust) do, and is also almost what LTO does. There’s a trade-off here in how much you can parallelise. If you completely combine final code-generation and linking then the compiler needs to know the locations of everything else in the object file. That’s not feasible because of circular dependencies and so you end up needing something like relocations and a final fixup step, which is doing part of what a linker does.

        The Go (Plan9) toolchain actually does this slightly differently to other things. It combines the assembler and linker so the compiler emits pseudo instructions for indirect references and the assembler / linker expands them into the shortest instruction sequence that can express the displacement that’s actually needed. This is a big win on a lot of architectures: if you can materialise a 16-bit constant in one instruction and a 32-bit one in two, your linker / assembler can do some constraint solving, try to place functions that call each other close together, and insert the single instruction in the common case and the two-instruction sequence for the places where it’s needed. Without this merging, you end up doing one of two things:

        • Use the short form and require the linker to insert thunks: jump with a short displacement and if the target is too far away, insert a trampoline somewhere close that has a three-instruction sequence, burning some branch predictor state and adding overhead when you need to do this.
        • Always emit the long sequence and often have a no-op instruction that writes 0 into a register that’s already 0.

        C++ places a lot more demands on the linker. The modern C++ compilation model[1] generates every used template instantiation in every compilation unit that uses it, puts them in COMDAT sections, and relies on the linker throwing them away. This means that the generated instantiations are available for analysis in every module, but it also means that the compiler does a lot of redundant work. Modules and ThinLTO are intended to address this: most template instantiations will have a single home but can be pulled into other compilation units for inlining if it would help.

        having directly runnable objects that can be trivially fixed up.

        We call this ‘dynamic linking’. It is slower, because there’s still a linker, it just happens every time you run the program. Back when I used BFD ld, I always build LLVM as shared libraries for my iterative compile cycle, because a debug build of clang took 5 minutes to link. I’d always do a statically linked build before I ran the test suite though, because the linking time (even with BFD ld) was less than the difference in time running the test suite (which invokes clang, opt, llc, and so on hundreds of times).

        There’s always a tradeoff here. Code that is faster to link is slower to run. You can make code fast to link by adding a small indirection layer, putting everything in a single big lookup table, not doing any inline fixups and making any access to a global symbol go via that table. You can make it fast to run by aggressively generating the shortest instruction sequence that materialises the exact address. Dynamic linking generally doesn’t want to modify executable code for two reasons: it’s bad for security to have W&X code and it prevents sharing[2]. This means that you end up with indirection layers. Look up ‘copy relocations’ some time for the extreme case of weirdness ere.

        [1] The original 4Front C++ compiler parsed linker error messages for missing symbols and generated the template instantiations once, on demand, in an iterative cycle.

        [2] 32-bit Windows DLLs actually did something differently here: DLLs were statically relocated and expected to run at a fixed address. There was a fun constraint solver that tried to find a place in the address space for all DLLs that works with all installed EXEs. If it failed, the DLL would be statically relocated on load and would not be shared.

        1. 6

          but the memory for the sum of all of the symbol tables in all object files in a large project is still pretty small compared to a modern machine’s RAM.

          … no? I frequently experience OOMs when linking big C++ projects. With medium-scale programs, linking has to run with just one process at a time to not get OOM killed (with my 16GB of RAM). With truly huge projects, such as Chromium (especially with debug symbols), I have to add immense amounts of SWAP space and just let the linker process thrash for a long time.

          In my experience, linker memory usage is the main limiting factor with big C++ projects, and it’s one of the main reasons I’m moving to at least 32GB of RAM the next time I upgrade anything.

          1. 1

            What linker are you using? I haven’t had that since the bad old days of BFD LD. A debug build of LLVM generates about 40GiB of object files, but symbol tables are a tiny fraction of that. The vast majority of it is… object code. That doesn’t have to be resident for lld to work. Last time I checked, lld would mmap all of the object files, but that doesn’t consume any RAM unless you have spare RAM: if RAM is constrained then the OS can just evict the object files from memory.

      2. 8

        having directly runnable objects that can be trivially fixed up

        How trivial can it get?

        The fundamental problem that linkers solve is that if you do incremental compilation, disparate modules need to be able to call into each other. ‘Linking’ just means resolving symbols into addresses; you need to do that anyway at some point. So I don’t disagree that it can get faster, but the essential complexity is still there.

        The zig compiler has been making some cool strides there, incidentally. It sidesteps the issue entirely by doing binary patching. Common lisp does something similar, with its ability to redefine symbols in a running image.

      3. 5

        …that systems could switch to something like either merging compiling and linking into a single step…

        That would likely require all the source and objects to be known at compile time, which probably means getting rid of dynamic libraries. A lot of the complications of linking comes from dynamic libs.

      4. 4

        The abstraction layer is already broken by LTO. Now the linker is an all-in-one compiler, too.

        1. 14

          In GCC, LTO is a “sort of” compiler. What happens is that GCC calls collect2 (its linker wrapper) that then calls ld with the plugin options to have it run lto-wrapper. lto-wrapper has all the objects taking part in the link. It then calls the compiler driver with special options to make it do whole program analysis. This phase then partitions the objects and runs GCC again on all those partitions, possibly using Make. Eventually all the objects that get created from this go back to the linker, which then links them.

          It’s the Unix philosophy at work. And it’s murder to debug.

          1. 9

            The unix philosophy is about composable, reusable tools. Without reuse in different contexts, it’s just an inconvenient split of functionality.

            How would you compose these tools differently to accomplish a different goal with them?

        2. 5

          This isn’t targeted at release builds, though, it’s targeted at development builds.

          I don’t think anyone is using lto for development builds, even if they use regular optimizations.

    2. 15

      I remember when gold was a HUGE improvement in link time over classic gnu binutils ld. Because I’m old.

    3. 10

      Linkers are long overdue for an upgrade. Not just speed leaves a lot to be desired, but the whole compiler-linker interaction is messy. Library include paths are disconnected from source includes and can get out of sync. Flags specified in a wrong order cause weird failures. The linker generally has no idea what the compiler was trying to do.

      When I have an error in the source code, compiler will bend backwards to help and suggest a solution and point it with squiggly underlines. Meanwhile linker will just print “missing _Zzxc329z9xc in asdf.o, bye”.

      1. 9

        It’s perhaps underappreciated how much the compiler driver (not the compiler proper) knows about calling the linker and how much it does for you. If you run gcc -dumpspecs you’ll likely see the *link_command directive at the end – and it’s probably an impenetrable mess. The compiler puts all the startup, system libraries, and finalization objects in the correct order (notwithstanding user libs) and also deals with the rather obscure options that are required depending on how you compiled the code.

        If you had to link things yourself, you’d quickly find it’s very difficult. It requires a fair bit of knowledge of the C library you’re using and the system itself. Compiler drivers handling this for you is very helpful, with the caveat that there ends up being a lot of abstraction making it difficult to decipher what is going on (cough collect2 cough).

        1. 7

          Yea, that part is annoying. When writing either makefiles or build systems, it would’ve been nice if you had one kind of rule to compile C files, one kind of rule to compile C++ files, one kind of rule to compile fortran files, and one kind of rule to link object files into an executable. But due to the immense mess of linker options, you can’t just do that. You want to link with the C compiler if your object files are only from C files, link with the C++ compiler if your object files are from C++ files or a mix of C and C++, link with the fortran compiler if your object files are from fortran, and I don’t even know how you would do it if your project has C, C++ and fortran files.

          It’s not nice.

      2. 8

        Library include paths are disconnected from source includes and can get out of sync

        MSVC and clang have pragmas that you can put in your header to control what is linked.

        Meanwhile linker will just print “missing _Zzxc329z9xc in asdf.o, bye”.

        Every linker I’ve used in the last decade will print an error with a demangled C++ name and tell me which function referenced the symbol that was not found:

        $ cat link.cc
        int x(int);
        int main()
        $ clang++ link.cc
        /usr/bin/ld: /tmp/link-ab4025.o: in function `main':
        link.cc:(.text+0xa): undefined reference to `x(int)'
        clang: error: linker command failed with exit code 1 (use -v to see invocation)
    4. 10

      Oooh this is the guy who made chibicc, I’ve been sporadically reading some of the code to learn a bit more about parsing/compiling, what a productive fella.

      1. 3

        I’m loving looking through their c compilers.