Nice article. It highlights the key point that I made in the MPhil compiler course I used to teach: JITing is easy, knowing when and what to JIT is hard.
A few nits:
Compiling out the machine code for all these cases is not very productive for a variety of reasons, which is what we’d have to do if we wanted Julia to be a compiled language
Not entirely true. In the absence of dynamic linking and function-pointer type casts, you can do reachability analysis and compile only the cases that are reached. Pony does this, for example.
If I “JIT compiled” C similarly to how Julia does it (statically compile each function as it’s called), it would be impossible to make it faster than compiled-C as the compile-time is non-negative and the generated machine code is essentially the same.
This is not really true. For example, a lot of dynamic behaviour in C comes from the dynamic linker. You can't inline in AoT-compiled C across a shared-library boundary, but you could with JIT'd C.

C programs often have a few functions that are always called with a handful of argument values. Optimising for the ones you see can also improve things (though you could do the same with profile-guided optimisation if you tracked values; most things just track branches).

Additionally, if you're JITing, you can target exactly the hardware that you're running on: do you have SSE4 as your baseline or AVX512? What is the relative cost of different vector operations? One of the presentations on the LLVM-based fourth-tier JIT for JavaScriptCore showed that a couple of benchmarks were faster when compiled from C to JavaScript and then JIT'd with JSC's FTL than they were when compiled with clang, for precisely this reason (apples to apples comparison: both approaches used clang as the parser and LLVM as the back end).
If PyPy decides it needs to compile many things all at once after JIT-compiling some functions, then you might have a slow-down in the middle. It also makes benchmark results more ambiguous, as you have to check whether the JITted languages were given time to warm up, but you'd also want to know if it took an unseemly amount of time to warm up.
Laurie Tratt has a nice paper about this. It's actually worse than the author describes: JITs don't always (or even often) reach a steady state after warmup.
Wow, thanks for all the information (I'm the author); the first part is actually new to me! I've actually been bitten by VMs giving me a squiggly line in warmup at peak performance, and it was awful c: (I linked the Laurie Tratt article you're referencing in the post.)
One of the presentations on the LLVM-based fourth-tier JIT for JavaScriptCore showed that a couple of benchmarks were faster when compiled from C to JavaScript and then JIT’d with JSC’s FTL than they were when compiled with clang, for precisely this reason (apples to apples comparison: both approaches used clang as the parser and LLVM as the back end).
That sounds super interesting. Do you happen to have a link to the benchmark results?
There is a second article in the series which is even better. Thanks!
Yeah, I liked the second one even better: https://carolchen.me/blog/jits-impls/
d8 can emit the graph IR to JSON files with the --trace-turbo flag, which can then be visualized with https://v8.github.io/tools/head/turbolizer/