1. 19

  2. 9

    The most immediate takeaway for me: Rust compiles around 350 lines a second, Go around 44,000.

    (The only compilers faster in TFA are FreePascal, which is about as fast as Go, and TCC, which is crazy fast, something like 3x faster than Go’s compiler.)

    Now, code is run many orders of magnitude more times than it is compiled, so compilation time isn’t as important as runtime performance, but it does give an idea as to how pleasant developing code in a given language will be if you’re like me and have a very tight edit-compile-test cycle.
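
    (Back-of-the-envelope, for a hypothetical 100,000-line codebase: at 350 lines/s a clean build takes roughly five minutes; at 44,000 lines/s, a bit over two seconds.)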

    1. 5

      Now, code is run many orders of magnitude more times than it is compiled, so compilation time isn’t as important as runtime performance, but it does give an idea as to how pleasant developing code in a given language will be if you’re like me and have a very tight edit-compile-test cycle.

      If you think in terms of wall clock time, slow compilation is a more serious problem. When developing, instant response gives an entirely different workflow compared to even just a few seconds’ wait. If it’s dozens of seconds, you seriously risk losing attention and focus every time you rebuild.

      1. 4

        It’s worth noting that FPC’s optimizer is quite good, whereas tcc does only the bare minimum of optimizations, which makes the fact that FPC is “only” one third the speed of tcc really impressive (IMVHO).[1] That also makes Go’s speed really interesting to me: I don’t know how you’d meaningfully compare the optimization skill of a compiler for a language that only has manual memory management (or, I guess, reference counting, if we want to count TComponent descendants) v. one with a GC, but it’s interesting nonetheless.

        [1]: To be clear, this isn’t a dig at tcc; its entire point is to compile as fast as possible, to the point that you can use C as a scripting language, and it thus makes a deliberate decision not to have complex optimizations.

        1. 4

          Given that “more important” means that there’s an “exchange rate” between the two (because obviously no one will use a language with 10% faster runtime but 100x slower development time), it’s fun to speculate about what exchange rates different people have. For instance, as a Common Lisp programmer, I’m taking a 2x performance hit in exchange for a 10x-100x faster edit-compile-test cycle over Rust - that’s my “exchange rate” (or, my exchange rate is at least that much).

          Meanwhile, Rust programmers’ exchange rates are probably much higher - they’d want something like 2x:10000x in order to switch.

          …of course, the above only applies given “all else considered equal” - which it never is. Programming language design is hard.

          1. 3

            rustc is also multithreaded. Not sure about go but C/C++ compilers aren’t, so the gap is even bigger.

            so compilation time isn’t as important as runtime performance

            Tbh I don’t think this is true, for two reasons:

            1. compilers can’t optimise code nearly as well as a person can, and when optimising by hand it’s nice to be able to run a lot of experiments
            2. most code is not performance sensitive to the point where it could be 10000x slower than optimal and nobody would notice. Or they will notice and it doesn’t matter because the product is good anyway. Or they will notice and it doesn’t matter because your company has 9+ digits of investor cash. Etc.
            1. 5

              rustc is also multithreaded. Not sure about go but C/C++ compilers aren’t, so the gap is even bigger.

              That’s not super true, I believe. rustc’s front-end is not parallel. There were “parallel compiler” efforts a couple of years ago, but they have stalled. What is parallel is the LLVM-side code generation — rustc can split the LLVM IR for a crate into several chunks (codegen units) and let LLVM compile them in parallel. At a higher level, C++ builds tend to exhibit better parallelism than Rust builds: because of header files, C++ compilation is embarrassingly parallel, while Rust compilation is shaped as a DAG (although the recent pipelined-compilation work helped shorten the critical path significantly). This particular benchmark sets -j 1.

              1. 2

                Yeah, but compiling C/C++ is totally parallelizable per source file. Any nontrivial C/C++ build tool runs N parallel “cc” processes, where N is more or less the number of CPU cores. In practice, Xcode or CMake manage to peg my CPU at 100% right up until link time, which is unfortunately single-threaded.

                1. 1

                  I THINK I told rustc to only use a single thread. The commands are there for someone to double-check.

                2. 1

                  In their defence, Rust is driven by correctness without a garbage collector, and Golang was built from the ground up with compilation speed as a primary driver, because Google has fucktons of code they build every few hours.

                3. 4

                  I must say, I find it incredibly cute that gcc and clang are both fastest when built by themselves.

                  1. 4

                    It’s not an accident. Clang developers care about how fast clang runs. This feedback cycle is why clang used to be a lot faster than GCC, both for compiling C++ code and in terms of the performance of the output. Most GCC developers (especially most GCC optimisation developers) weren’t affected by the performance of GCC on C++; all LLVM developers were. The advantage of writing your compiler in your language is that you set up incentives to make that language fast. The downside is that you can over-fit your optimisations for things that a compiler benefits from but that no other kind of program does.

                  2. 4

                    Note: the native ocaml compiler is ocamlopt, not ocamlc. ocamlopt should be more straightforward to benchmark and is the compiler that most people use in practice.

                    1. 4

                      So, I’m going to look at the compilation speed of different languages by benchmarking how fast various compilers compile themselves.

                      That makes no sense. Compilers differ vastly in complexity, so they aren’t comparable codebases. Code optimization throws a wrench in by slowing the compiler’s build time but speeding up its runtime. And since almost no one builds compilers from source, I’d imagine not a lot of effort goes into improving their clean-build times (e.g. removing unnecessary includes in a C-based compiler).

                      I don’t think you can call rustc self-compiling yet anyway; isn’t it still based on LLVM?

                      1. 3

                        Being based on LLVM doesn’t mean it is not self-hosted.

                      2. 4

                        I bet there are a ton of implementations of “ray tracing in one weekend” in Rust and C++ on GitHub. This seems like an interesting corpus for figuring out compilation speed.

                        1. 3

                          I wonder how much #include affects C compile times…

                          1. 2

                            C, not too bad… with C++ it can be painful, especially with #include <regex> and others.

                            1. 5

                              It used to be a lot with C, which is why older codebases such as FreeBSD try very hard to minimise the number of inclusions and why there isn’t a #include <cstd.h> or #include <posix.h> that just includes everything. There are several overheads:

                              • The cost of finding the file in the filesystem and reading it. With a modern disk and a buffer cache, this is in the noise. On old (’80s - ’90s) machines, though, this involved 20-30ms of latency for each file, which with large numbers of includes quickly added up to seconds per compilation unit and minutes across the build.
                              • The cost of tokenising the stream and the memory overheads for the AST. This was surprisingly slow with GCC; clang was many times faster at it. It mattered for things like ccache and distcc, which preprocess locally and then either compare against the previous version or ship the result across the network for codegen. On OS X, last time I looked, #include <Cocoa/Cocoa.h> run through the preprocessor expanded to about 8 MiB of text. Symbol tables for something like this can get pretty big and may result in more cache misses in symbol resolution for your real file.

                              In most cases, this is now such a small part of the compile step (especially in an optimised build) that it’s pretty negligible. On Windows, MSVC defaults to providing a precompiled header for the big Windows include file so that it generates the huge symbol table once and you just search that. This is much faster than having every compile job parse a subset of the files.

                              With C++, there’s the additional cost that you have to generate the template instantiations that are used, and with both C and C++ you have to generate IR for every inline (C++) or static inline (C) function in the header. Template instantiation typically dominates C++ builds and is particularly annoying because you commonly get a few template instantiations generated for almost every compilation unit (which adds overhead in the front end), surviving through optimisation (which adds overhead in the middle), and then discarded in the linker (which adds overhead at the back). Some of the work related to C++ modules is trying to eliminate this by having a single canonical location for each template instantiation that can be made available for inlining but where there’s only ever one non-inline definition generated.
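
                              (A minimal sketch of the pre-modules workaround, using explicit instantiation plus extern template; the class and file names are hypothetical, and the two files are shown in one listing:)

                              ```cpp
                              // table.h -- a hypothetical template-heavy header.
                              #include <vector>

                              template <typename T>
                              class Table {
                              public:
                                  void insert(const T& value) { data_.push_back(value); }
                              private:
                                  std::vector<T> data_;
                              };

                              // Promise every includer that Table<int> is instantiated elsewhere,
                              // so each TU skips generating it (and the linker skips discarding it).
                              extern template class Table<int>;

                              // table.cpp -- the single canonical instantiation.
                              // (the real file would start with #include "table.h")
                              template class Table<int>;
                              ```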

                              1. 2

                                Precompiled headers used to be a huge win in C++ builds. Now they don’t seem to make any difference at all. I’ve tried turning them back on in Xcode in my current medium-sized project and there was no noticeable speed up. Maybe that compiler flag doesn’t do anything anymore.

                                The recent trend towards header-only C++ libraries has been terrible for build times, too.

                                1. 2

                                  Interestingly, OCaml has had the equivalent of precompiled headers (.cmi or ‘compiled interface’ files) from day one, and they hugely boost separate compilation and build speed. It really works well if done right.

                          2. 3

                            Chez Scheme would probably win among the optimizing compilers in that list. It takes less than 5 minutes to build itself 2-3 times with optimization enabled.

                            Last time I looked, Go barely did any optimizations at all! As for Pascal, I recall reading that the language was designed to be compilable in a single, in-order pass through the source!

                            1. 4

                              Niklaus Wirth, notably, would not accept additions to the compiler that didn’t actually speed up the compilation of the compiler itself.

                              1. 1

                                The same story holds, I believe, for the Chez Scheme compiler. At least so I’ve heard.

                              2. 3

                                Go now has an SSA intermediate form and does quite a lot more optimisation.

                              3. 2

                                I was hoping to see compilation time for Scala. Oh well ¯\_(ツ)_/¯

                                1. 1

                                  What I don’t get is why this needs to be tested on projects which do similar things, like … compilers. Compile speed isn’t a function of what type of project is being compiled. Just pick any reasonable projects in the languages being tested and check their speed (loc/s or whatever).

                                  1. 4

                                    Compile speed isn’t a function of what type of project is being compiled

                                    It absolutely is! You can design your application such that it has fairly few interdependencies, and can effectively be compiled in parallel without much duplicate processing. Or you can design your application such that most lines of code go through semantic analysis multiple times for every translation unit, and a change in a single file means everything has to be recompiled.

                                    (That being said, I don’t think the problem domain has a huge impact on this; I expect you can easily make both compilers that are compiled quickly and compilers that are compiled slowly.)

                                    1. 2

                                      This is especially important when testing C++ compile times. C++ can literally have quadratic compile times, if you’re not careful and you end up with a project where most code is in headers and most headers include most other headers. Or it can be extremely fast to compile if you’re careful and use various idioms (such as PIMPL and forward declarations, and avoid template metaprogramming) to keep headers small and isolated.
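
                                      (A minimal sketch of the PIMPL idiom mentioned above, with hypothetical names: the header exposes only a forward-declared Impl, so the implementation’s headers stay out of every client’s translation unit and implementation changes don’t recompile clients:)

                                      ```cpp
                                      // widget.h -- clients see only a forward declaration and a pointer.
                                      #include <memory>

                                      class Widget {
                                      public:
                                          Widget();
                                          ~Widget();   // must be defined where Impl is a complete type
                                          void draw();
                                      private:
                                          struct Impl;                  // forward declaration only
                                          std::unique_ptr<Impl> impl_;
                                      };

                                      // widget.cpp -- heavy includes are confined to this one file.
                                      // (the real file would start with #include "widget.h")
                                      #include <iostream>  // stand-in for an expensive dependency

                                      struct Widget::Impl {
                                          void draw() { std::cout << "drawing\n"; }
                                      };

                                      Widget::Widget() : impl_(std::make_unique<Impl>()) {}
                                      Widget::~Widget() = default;
                                      void Widget::draw() { impl_->draw(); }
                                      ```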

                                      1. 1

                                        C++ can literally have quadratic compile times

                                        Quadratic in terms of what?

                                        C++ has three Turing-complete compile-time languages (C preprocessor[1], templates, constexpr), though practically compilers guarantee termination by placing hard limits on recursion depth in each of these. With constexpr, writing a compile-time ray-tracer isn’t even a challenge anymore, so you can express pretty much any complexity class of algorithm relative to the size of the compiler input. I’ve written a single (fairly short) C++ compilation unit that took over a minute to compile on a fast machine, though I wouldn’t recommend doing so.

                                        [1] A preprocessor input can set macros and then recursively include itself and set another macro depending on the values of the previous ones.
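
                                        (A toy illustration of that footnote, with a hypothetical file name: each self-include re-evaluates the conditionals against the macros set by the previous pass, and it terminates because the final branch stops re-including:)

                                        ```cpp
                                        /* stages.h -- a preprocessor-only "program": the file includes
                                           itself, advancing a macro one step per pass. */
                                        #if !defined(STAGE)
                                        #  define STAGE 1
                                        #  include "stages.h"
                                        #elif STAGE == 1
                                        #  undef STAGE
                                        #  define STAGE 2
                                        #  include "stages.h"
                                        #elif STAGE == 2
                                        #  undef STAGE
                                        #  define STAGE 3
                                        /* after preprocessing, STAGE is 3 -- no further self-include */
                                        #endif
                                        ```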

                                        1. 3

                                          I mean quadratic in terms of the size of the project.

                                          Consider a normal program (so not a compile-time ray tracer, but, say, a web browser) with the unfortunate properties that A) most code is in headers (which modern C++ kind of encourages), B) most header files have associated source files, and C) your structure is a bit sloppy so most header files end up including most other header files. Your compile times become basically proportional to (number of source files) * (number of header files), since every source file includes basically all headers. Since your source and header files generally come in pairs, that becomes (number of source files) * (number of source files). And you’re probably going to be introducing new source files as the project grows, with an average ranging from a few hundred to a few thousand lines per file.

                                          You have a project whose compile times grow quadratically in terms of the number of lines of code.

                                          Obviously most programs aren’t going to be that bad, but you have to be extremely careful to keep your compile times close to linear in terms of project size.
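
                                          To put hypothetical numbers on it: with 500 source/header pairs where every source file ends up pulling in all 500 headers, a full build parses on the order of 500 × 500 = 250,000 header bodies; double the project to 1,000 pairs and that becomes 1,000,000 - four times the work for twice the code.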

                                          1. 1

                                            You’re correct from a pure complexity-theory perspective, but the constants make a huge difference. For C++, the cost of #include-ing a file whose declarations you never refer to is pretty negligible, especially for template functions, which aren’t even fully type-checked unless they’re instantiated. Adding one extra header that every file includes but doesn’t use is not going to have a big effect on your build times. If you’re using precompiled headers, it will be approximately zero (though incremental builds will still suffer from recompiling things that haven’t actually changed). If, however, you add a template with a data-dependent value, or a complex constexpr function, that is used across multiple compilation units, then you’ll see a noticeable slowdown.
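
                                            (A minimal sketch of the kind of header that does hurt, with hypothetical names: a recursive template whose full set of instantiations must be rebuilt by the front end in every translation unit that uses it:)

                                            ```cpp
                                            // fib.h -- every TU that uses Fib<90>::value forces the compiler
                                            // to instantiate ~90 class templates, which the linker later
                                            // deduplicates and discards.
                                            template <int N>
                                            struct Fib {
                                                static constexpr long long value =
                                                    Fib<N - 1>::value + Fib<N - 2>::value;
                                            };
                                            template <> struct Fib<1> { static constexpr long long value = 1; };
                                            template <> struct Fib<0> { static constexpr long long value = 0; };

                                            // If both a.cpp and b.cpp contain a line like this, each pays the
                                            // full instantiation cost in its own front-end pass:
                                            static_assert(Fib<90>::value > 0, "instantiates Fib<0>..Fib<90>");
                                            ```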

                                            1. 1

                                              The issue is C++ exhibits physical design in addition to logical design.

                                              Adding one extra header that every file includes but doesn’t use is not going to have a big effect on your build times

                                              I’ve definitely seen order-of-magnitude build speed increases from eliminating ball-of-mud headers, especially those in the STL. It’s baffling to me why so many projects “don’t have time” to set up IWYU.

                                              If, however, you add a template with a data-dependent value

                                              Most non-trivial code I’ve seen depends on templates, either for containers or for hiding raw pointers. The current state of things would also be helped if people would write code without enable_if all over the place and would crack open a copy of “Large Scale C++ Software Design” by Lakos.

                                      2. 1

                                        But we’re not talking about incremental compilation here, we’re talking about a clean build.

                                        And if we want to iron out the confounding factors, it seems to me a good way to do that would be to take a representative sample of real-world(-ish) projects from each language and observe their CI build times. Many projects have publicly available CI build logs so this should actually be pretty easy to get.

                                      3. 1

                                        Yes it is, I checked.

                                        1. 1

                                          I may be missing something, but that link seems to be an analysis of time spent in Rust compilation phases, not compile speed (loc/s) analysis across different projects.

                                          1. 1

                                              There are three different projects that were profiled, and each spent very different amounts of time in the different compilation phases.

                                            1. 1

                                              But there was no analysis of compilation speed per loc. So we can’t really claim that different projects have different compile speeds per language.