It amazes me that V8 has no fewer than four JIT compilers in one runtime, each designed with a different tradeoff between startup time, code size, and code speed. That level of engineering is just crazy; it shows how much work has gone into making JS faster. Maglev looks like a middle-of-the-road compiler, balancing startup time against execution speed.
I’m amazed it’s taken so long to get to that point. JavaScriptCore, the WebKit JavaScript runtime, has had four execution engines for a decade, and I think their design is a lot cleaner than V8’s because they all feed into each other. I’d love to see a performance comparison of each of the tiers in JSC against V8.
JSC starts in an interpreter, but the interpreter is written in a portable macro assembler. This has two benefits. The first is that it’s really small: it fits in the instruction cache on most devices (including phones), so the interpreter is very fast. It also means they have complete control over the stack layout, which makes moving between tiers easy.
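To make the shape of that concrete, here’s a toy switch-based interpreter in C++. It’s only an illustration, not JSC’s design: the real interpreter (the LLInt) is generated from offlineasm, JSC’s portable macro assembler, which is how the handlers stay tiny and the stack layout stays under their control.

    #include <cstdint>

    // Toy stack-machine bytecode. Each handler is a few instructions, so
    // the whole dispatch loop stays resident in the i-cache, and the stack
    // layout is fixed, so other tiers can share the same frame shape.
    enum Op : uint8_t { PUSH_CONST, ADD, RETURN };

    int64_t run(const uint8_t* pc, int64_t* sp) {
        for (;;) {
            switch (*pc++) {
            case PUSH_CONST:
                *sp++ = *pc++;        // operand byte becomes a stack value
                break;
            case ADD: {
                int64_t rhs = *--sp;
                sp[-1] += rhs;
                break;
            }
            case RETURN:
                return *--sp;
            }
        }
    }

Calling run(program, stackBuffer) with a small scratch buffer executes it; the fixed frame shape is the part that matters for cheap tier transitions.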
Next, there is the baseline JIT. This uses exactly the same implementations of the bytecode operations as the interpreter, but it pastes them together into larger blocks, avoiding the overhead of decoding and dispatching bytecodes. The baseline JIT uses exactly the same on-stack representation as the interpreter, so it’s easy to JIT a hot loop and move execution to it without costly transitions between JIT’d and interpreted code.
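The “pasting” looks roughly like this. A real baseline JIT copies the machine code of each handler back to back into an executable buffer; this sketch uses C++ closures as a stand-in for emitted code, but the effect it illustrates is the same: the bytecode is decoded once, when compiling, instead of on every execution.

    #include <cstdint>
    #include <functional>
    #include <vector>

    enum Op : uint8_t { PUSH_CONST, ADD, RETURN };  // as in the sketch above

    using Step = std::function<void(int64_t*&)>;

    // Translate bytecode into a flat sequence of operations. Operands are
    // baked in at compile time, and no decode/dispatch is left on the
    // execution path.
    std::vector<Step> compile(const std::vector<uint8_t>& bc) {
        std::vector<Step> code;
        for (size_t i = 0; i < bc.size();) {
            switch (bc[i]) {
            case PUSH_CONST: {
                int64_t k = bc[i + 1];
                code.push_back([k](int64_t*& sp) { *sp++ = k; });
                i += 2;
                break;
            }
            case ADD:
                code.push_back([](int64_t*& sp) {
                    int64_t rhs = *--sp;
                    sp[-1] += rhs;
                });
                i += 1;
                break;
            default:  // RETURN ends the block
                return code;
            }
        }
        return code;
    }

Running it is just a loop calling each step in order. Because the operations have the same semantics and the same stack shape as the interpreter’s handlers, execution can hop between the two tiers mid-function.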
The first two tiers record a lot of profiling information (hit counts for functions and loop entries, type information for expressions). This is fed into the first optimising JIT, which uses a continuation-passing-style IR to do dataflow optimisations. That works very nicely with the type info from the previous tiers and lets them trace across function boundaries and optimise the hot paths, with side exits back to the second-tier JIT or the first-tier interpreter if a new branch is taken.
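For a feel of what a side exit is, here’s a minimal sketch with made-up names: the lower tiers observed that x was always an int32, so the optimised code guards that assumption and falls back to the generic path when it fails. Real engines guard on hidden classes and much more, and the exit reconstructs a lower-tier frame rather than calling a function, but the shape is this.

    #include <cstdint>
    #include <variant>

    using Value = std::variant<int32_t, double>;

    // Fully generic path used by the lower tiers (and as the exit target).
    Value generic_add_one(Value x) {
        if (auto* i = std::get_if<int32_t>(&x)) return *i + 1;
        return std::get<double>(x) + 1.0;
    }

    // Optimised path: profiling said x was always int32.
    Value optimised_add_one(Value x) {
        if (auto* i = std::get_if<int32_t>(&x))  // type guard from the profile
            return *i + 1;                       // fast path, no type checks past here
        return generic_add_one(x);               // side exit to the generic tier
    }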
For things that have been run a lot in the third-tier JIT, they then have a fourth tier (when this was Fourth Tier LLVM it had a much better initialism), which uses an SSA back end to do some micro-optimisations and better register allocation. This is where things like hot game loops end up.
The big difference between JSC and V8 is their attitude towards bytecode. V8 avoids having any bytecode, which means each transition between tiers requires a re-parse of some of the source code. The advantage is that they can evolve their tiers independently and don’t have the memory overhead of storing bytecode. In contrast, JSC has a bytecode form that’s shared between all tiers, so it doesn’t re-parse when moving between tiers. The downside is that any change to the bytecode requires modifying all of the tiers.
The article seems to contradict that?
Sorry, I oversimplified. Each V8 JIT has bytecode, but (unless something has changed recently) it’s not common across tiers; it’s an implementation detail of each. In contrast, JSC has a single pass that converts source code to bytecode, and that bytecode is shared by all execution engines.
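Roughly this arrangement, with hypothetical names rather than JSC’s actual types:

    #include <cstdint>
    #include <string>
    #include <vector>

    // Source is lowered to bytecode exactly once, and every execution tier
    // hangs its compiled artifact off the same CodeBlock, so tiering up
    // never re-parses the source.
    struct CodeBlock {
        std::vector<uint8_t> bytecode;       // shared by all tiers
        std::vector<uint8_t> baselineCode;   // filled in lazily by tier 2
        std::vector<uint8_t> optimisedCode;  // filled in by tiers 3 and 4
    };

    CodeBlock parseOnce(const std::string& source) {
        CodeBlock cb;
        cb.bytecode.assign(source.begin(), source.end());  // placeholder lowering
        return cb;
    }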
I’m still amazed we keep putting more and more lipstick on that JS code on the client side rather than allowing properly optimised release-time compilation, which only needs to happen once. Wasm is kind of that thing, but it would be amazing if it were usable enough for a typical release pipeline: compile your JS by default if you have any. PNaCl went in that direction too.
It feels like three out of four amazing JIT compilers that we shouldn’t even need.
It’s a pity they don’t isolate Maglev and TurboFan on their performance graphs. Maglev takes 10x less time to compile, but how does the code it produces compare with TurboFan’s?
Or MagLev, the Ruby & Smalltalk compiler.