It took a while to find any code in there! I love how simple this technique makes writing jump-threaded interpreters. It’s a shame that you need the NEXT macro everywhere. I wonder if you could remove this (given that you’re using C++) by defining a templated dispatcher that does an always-inline call to the real thing and then tail-calls the next opcode.
That’s a good idea. I’ve just added a Word constructor that takes an Op as an argument, and the next step is to make the Op function into a lambda parameter inside the ctor call. I’ll have to see if calling the lambda and then returning NEXT will properly get inlined and tail-call-optimized, but it seems like it should.
If the lambda is used in precisely one place, it should be inlined, but you need to make sure that the outer function is specialised for each value. The easiest way of doing this is to make it a template function that takes the opcode as the template argument, so that you get a distinct function for each opcode. Make sure you don’t use std::function, because the type erasure makes inlining harder for the compiler.
Wasm3 doesn’t use the native stack; instead it keeps its own call stack. This is probably more optimal, but I haven’t done it yet simply because it’s more work.
I’m curious as to whether this would actually be more efficient - some x86 CPUs use a stack engine that provides hardware acceleration of the stack. Wikipedia gives an example of the stack pointer being split (https://en.wikipedia.org/wiki/Stack_register#Stack_engine), and I’ve heard via word of mouth that there are CPUs that actually mirror the top few stack items from memory into registers in the CPU.
I’d like to use this in a project, but I need wide platform support: Windows/macOS/Linux + x86-64/arm64/WASM. So my question is: on what platforms does this work?
To answer my own question, this won’t work in WASM, and can’t work until the WASM “tail call” feature is available. Meanwhile, in order to ensure support on all platforms, I would need to write my interpreter so that it “falls back” to working code if tail-call optimization is not available.
One of the cool features of this technique is that it lets you write an interpreter for a functional language that has proper tail-call semantics, and tail calls will work (without blowing up the C stack). But, if you need to also support platforms (like WASM) where the tail-call optimization is not available, then you need to write fallback code that handles tail calls in some other way, taking care not to blow up the stack.
This is literally just a hack I wrote in one day, and have run on one platform (Intel macOS), so I don’t have direct answers for you.
But I can say that it’s closely based on Wasm3’s interpreter core, the “Massey Meta Machine” as described on that Interpreter.md page I linked to in the source comments, and Wasm3 runs on all the platforms you listed. So AFAIK Tails should work too. You definitely want to read that page!
You’ll need a suitable cross-platform build script, which you can probably adapt from Wasm3’s CMake script — that’s where I got that list of compiler flags from. Tails worked without those flags; the code just wasn’t as optimal, since the Op functions were creating and tearing down useless stack frames. Likewise, the new “musttail” attribute isn’t necessary; it just prevents eventual stack overflow in non-optimized builds.
Thanks snej! Have you discussed or shown this to Kragen and Volodymyr?
https://github.com/kragen
https://github.com/vshymanskyy
It would be awesome to have a presentation on this. This is super cool.
Nope, I don’t know (of) either of them. Do they work on Wasm3, or Forth interpreters?
Kragen wrote https://github.com/kragen/stoneknifeforth; there have been good discussions about it and others here.
And Volodymyr is a co-author of Wasm3.
Cool; thanks for sharing a small hackable example of this technique.