We found that a register architecture requires an average of 47% fewer executed VM instructions, and that the resulting register code is 25% larger than the corresponding stack code. The increased cost of fetching more VM code due to larger code size involves only 1.07 extra real machine loads per VM instruction eliminated. On a Pentium 4 machine, the register machine required 32.3% less time to execute standard benchmarks if dispatch is performed using a C switch statement. Even if more efficient threaded dispatch is available (which requires labels as first-class values), the reduction in running time is still around 26.5% for the register architecture.
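The tradeoff the abstract describes can be seen in miniature with two toy interpreters (these are illustrative sketches, not the paper’s VMs): the register version dispatches fewer instructions for the same expression, but each instruction encodes operand indices, so the code itself is bulkier.

```python
# Toy comparison: x = (a + b) * (a - b) as stack code vs. register code.
# The register VM dispatches fewer instructions; each one is larger.

def run_stack(code, consts):
    """Minimal stack VM: instructions are (op,) or (op, arg)."""
    stack, dispatched = [], 0
    for ins in code:
        dispatched += 1
        op = ins[0]
        if op == "push":
            stack.append(consts[ins[1]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "sub":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1], dispatched

def run_reg(code, regs):
    """Minimal register VM: each instruction names its operands."""
    dispatched = 0
    for op, dst, a, b in code:
        dispatched += 1
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "sub":
            regs[dst] = regs[a] - regs[b]
        elif op == "mul":
            regs[dst] = regs[a] * regs[b]
    return regs[dst], dispatched

# (a + b) * (a - b) with a=7, b=3
stack_code = [("push", 0), ("push", 1), ("add",),
              ("push", 0), ("push", 1), ("sub",), ("mul",)]
reg_code = [("add", 2, 0, 1), ("sub", 3, 0, 1), ("mul", 4, 2, 3)]

print(run_stack(stack_code, [7, 3]))       # (40, 7): 7 dispatches
print(run_reg(reg_code, [7, 3, 0, 0, 0]))  # (40, 3): 3 dispatches
```

Seven dispatches versus three for the same result, which is the flavor of the 47% instruction-count reduction the paper measured.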
Every paper I’ve seen basically agrees that register machines are faster and stack machines are simpler (both to implement and to compile to). Having built both, I hate building register machines because I’m lazy and hate doing register allocation in the compiler. It’s so much easier to generate stack instructions instead. You pay for it in performance.
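The “so much easier” part is concrete: for expressions, a post-order walk of the tree is the entire stack code generator, with no allocation step at all. A minimal sketch (names are illustrative):

```python
# Why stack code is easy to emit: a post-order tree walk is the whole
# "code generator" -- operands land on the stack exactly where each
# operator expects them, so no register allocation is needed.

def compile_stack(node, out):
    """node is either a number (leaf) or (op, left, right)."""
    if isinstance(node, (int, float)):
        out.append(("push", node))
    else:
        op, left, right = node
        compile_stack(left, out)   # left operand onto the stack
        compile_stack(right, out)  # then the right operand
        out.append((op,))          # op consumes both, pushes result

code = []
compile_stack(("mul", ("add", 7, 3), ("sub", 7, 3)), code)
print(code)
# [('push', 7), ('push', 3), ('add',), ('push', 7), ('push', 3),
#  ('sub',), ('mul',)]
```

A register backend doing the same job has to decide where every intermediate result lives, which is exactly the allocation work being complained about.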
The thing I’ve most often seen suggested is to allow direct references into the stack, popping multiple items at once on return, etc.; local variables can then be referenced as offsets into the stack. I’ve implemented this for a basic Lisp VM; I’m not sure how difficult it becomes for more imperative languages, where you aren’t compiling local variables into function calls to get easy lexical scope.
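One way to read that suggestion: keep a frame pointer and address locals as fixed offsets from it, so loading a local is an indexed read into the operand stack, and returning drops the whole frame in one step. A minimal sketch with invented opcodes:

```python
# Locals as stack offsets: "load n" / "store n" index from a frame
# pointer, and "ret" pops result + locals + leftovers in one go.

def call(vm_stack, code, args, n_locals):
    fp = len(vm_stack)                  # frame pointer: base of frame
    vm_stack.extend(args)
    vm_stack.extend([0] * (n_locals - len(args)))
    for ins in code:
        op = ins[0]
        if op == "load":                # push local at offset ins[1]
            vm_stack.append(vm_stack[fp + ins[1]])
        elif op == "store":             # pop into local at offset ins[1]
            vm_stack[fp + ins[1]] = vm_stack.pop()
        elif op == "add":
            b, a = vm_stack.pop(), vm_stack.pop()
            vm_stack.append(a + b)
        elif op == "ret":               # pop result, drop whole frame
            result = vm_stack.pop()
            del vm_stack[fp:]
            return result

# local 1 = local 0 + local 0; return local 1
code = [("load", 0), ("load", 0), ("add",), ("store", 1),
        ("load", 1), ("ret",)]
print(call([], code, [21], 2))  # 42, and the frame is fully popped
```

The `del vm_stack[fp:]` is the “popping multiple items at once on return” part: no per-local pops, just one truncation back to the frame pointer.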
When generating stack instructions, are the operations usually rearranged so that stack traffic is minimized? The paper here only considers translating stack-based code into an optimized register-based form. It doesn’t seem to say whether the stack-based code was itself optimized for the stack machine.
Another study on the topic: https://arxiv.org/abs/1611.00467.
Surely there is a way to view register allocation as stack allocation, since they can be mapped to the same things.
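One concrete (if degenerate) version of this mapping: treat each virtual register as a named frame slot and hand out offsets on first use. A trivial sketch, with hypothetical names; reusing slots based on liveness is where real allocation work would begin:

```python
# "Register allocation as stack allocation": map each virtual register
# name to a frame offset, assigned at first use. No reuse, no spilling
# decisions -- every register simply *is* a stack slot.

def allocate_slots(instructions):
    """instructions are (op, dst, src_a, src_b) with named registers."""
    slots = {}
    for _, dst, a, b in instructions:
        for reg in (dst, a, b):
            if reg not in slots:
                slots[reg] = len(slots)   # next free stack slot
    return slots

code = [("add", "t0", "a", "b"), ("mul", "t1", "t0", "a")]
print(allocate_slots(code))
# {'t0': 0, 'a': 1, 'b': 2, 't1': 3}
```

Under this view a register machine whose registers live in the frame and a stack machine with direct frame-offset access are addressing the same storage, just with different instruction encodings.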
I spent many years working on JSC, and we very quickly came to the conclusion that a register machine was much faster.
We solved the allocation complexity by simply treating it as an infinite register machine.
Overall this helped in subsequent optimising JIT work as we didn’t need to do what is essentially a conversion of stack slots to registers again.
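The infinite-register approach described above can be sketched as a bytecode emitter that simply mints a fresh virtual register for every value, deferring any mapping to real registers or stack slots to a later (JIT) pass. This is an illustrative toy, not JSC’s actual IR:

```python
# "Infinite register machine": the compiler never reuses a register,
# so there is no allocator at bytecode-emission time. A later pass
# (e.g. a JIT) maps the unbounded virtual registers to real ones.

class Emitter:
    def __init__(self):
        self.next_reg = 0
        self.code = []

    def fresh(self):
        r = self.next_reg        # monotonically increasing, never reused
        self.next_reg += 1
        return r

    def emit(self, op, *srcs):
        dst = self.fresh()       # every result gets its own register
        self.code.append((op, dst, *srcs))
        return dst

e = Emitter()
a = e.emit("const", 7)
b = e.emit("const", 3)
s = e.emit("add", a, b)
d = e.emit("sub", a, b)
e.emit("mul", s, d)
print(e.code)
# [('const', 0, 7), ('const', 1, 3), ('add', 2, 0, 1),
#  ('sub', 3, 0, 1), ('mul', 4, 2, 3)]
```

Because each value has exactly one defining register, later optimization passes get something close to SSA for free, which matches the point about not having to convert stack slots back into registers.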
There is also a longer journal version from 2008: https://dl.acm.org/doi/10.1145/1328195.1328197 (freely available)