I never understood the people who build a reduced set of back ends for clang. The extra back ends are such a tiny portion of the total code that it makes almost no difference to compile times or to final binary size and it dramatically reduces the utility of the final binary.
The last time I measured the build time difference between a build with all backends and a build with only the X86 backend, the difference was rather significant.
All backends:
time ninja
[4182/4182] Generating ../../bin/llvm-readelf
real 10m12.710s
user 451m12.444s
sys 12m12.634s
X86 backend only:
time ninja
[3196/3196] Generating ../../bin/llvm-readelf
real 7m55.531s
user 344m56.462s
sys 8m53.970s
This was about one year ago, on Linux. More details here.
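For reference, the set of backends is controlled by the LLVM_TARGETS_TO_BUILD CMake variable; a configuration along these lines should reproduce the comparison (the generator and build-type flags here are a typical guess, not necessarily the exact setup I used):
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=all ../llvm
# versus the X86-only build:
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86 ../llvm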
That’s not quite a fair comparison though, because now you can’t cross-compile. If you need a cross-compiler for any architecture then you’re looking at another 8 minutes to build a version of LLVM that can target it. Maybe I’m an outlier, but once I got used to a compiler that could target any architecture, I have found it incredibly limiting to go back to one that can’t. In particular, if I’m writing something performance critical, I often want to look at the generated assembly on 3-4 architectures to make sure that I’m not baking in any ISA-specific or ABI-specific assumptions.
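With all the backends present, that is just a matter of switching triples on a single clang (the source file and the particular triples here are only illustrative):
clang --target=x86_64-linux-gnu -O2 -S hot_loop.c -o -
clang --target=aarch64-linux-gnu -O2 -S hot_loop.c -o -
clang --target=riscv64-linux-gnu -O2 -S hot_loop.c -o -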
The last line of your ninja output also reminds me that it’s not just clang: all of the other utilities use the back ends. Your X86-only build, for example, generates an objdump that can disassemble only x86 binaries. If you want to look at an Arm binary, you need to install a separate version of objdump. Again, that’s something I find incredibly annoying - I can’t just run objdump, I need to remember to invoke the specific one for the target that I’m disassembling.
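By contrast, an llvm-objdump from an all-backends build picks the disassembler from the binary itself, so the same invocation works for any architecture (file names made up for illustration):
llvm-objdump -d ./server.x86_64
llvm-objdump -d ./server.aarch64
With per-target GNU binutils you would instead have to remember to reach for aarch64-linux-gnu-objdump and friends.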
Did you mean that it doesn’t make a difference to the compile time of llvm/clang, or when using clang (with all backends vs. a single backend) to compile other software?
It doesn’t make a significant difference to the compile time of llvm/clang, relative to the value of the features that you get in exchange. You can get a 100% reduction in the compile time of LLVM by not building any of it and then the built clang is useful for 0% of use cases. You get a 20% (actually surprised by that, it was much less last time I tried) reduction in LLVM’s build time by removing support for all cross-compile scenarios and any support for the other tools to interact with binaries for any other architectures (so no objdump, no readelf, and so on, for any binary that isn’t native to the host architecture). I consider that to be significantly more than a 20% reduction in features. Maybe I’m an outlier here.
Some platforms don’t have powerful build machines, so what’s a small difference in compile time on a CPU-rich amd64 box may be a big difference on macppc or octeon, for example. How much utility is there in having an x86-64 backend available on an octeon machine?
Do you ever want to inspect an x86 binary on the octeon? If so, having an objdump that works there is useful. That’s a use case I hit pretty often.
If the machine is very slow, then I’d generally cross build from another machine. That means:
- I benefit from the faster machine having a cross compiler pre-installed.
- I get a much bigger speedup for the LLVM build, while still compiling all back ends for the target, by not compiling lld or clang, since I’m not going to use them on the slow system (a rough sketch of that configuration follows below).
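A minimal sketch of the shape of that configuration, leaving out the cross-toolchain flags themselves (which depend on the target): with a plain standalone llvm checkout, clang and lld are simply not enabled, and naming only the utilities you want as ninja targets skips everything else:
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=all ../llvm
ninja llvm-objdump llvm-readelf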