I love reading about people's experiences with things like this, so thank you very much!
One thing that stood out to me: the pointer-tagging approach is interesting, but not one I'd usually reach for in Rust. You could instead rotate the object storage from AoS to SoA and store tagged indexes in the resolver; the handle could then be even smaller if you impose reasonable limits on the number of each kind of identifier a program can have (e.g., 2^21 = 2097152 in 3 bytes, or 2^13 = 8192 in 2 bytes). It would also avoid any unsafe code, though you'd still risk panics if you use the indexes wrong.
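To make the suggestion concrete, here's a minimal sketch of the SoA + tagged-index idea. All the names here (Tag, TaggedIndex, ObjectStore, and the f64 payloads) are illustrative assumptions, not from the original codebase; it just shows a 3-bit tag packed with a 13-bit index into a 2-byte handle, with plain Vec indexing instead of unsafe pointer tricks:

```rust
// Hypothetical object kinds; the 3-bit tag leaves room for up to 8.
#[allow(dead_code)]
#[derive(Copy, Clone)]
enum Tag {
    Local = 0,
    Global = 1,
    Const = 2,
}

// 3-bit tag + 13-bit index packed into a u16: up to 8192 of each kind.
#[derive(Copy, Clone)]
struct TaggedIndex(u16);

impl TaggedIndex {
    const INDEX_BITS: u16 = 13;
    const INDEX_MASK: u16 = (1 << Self::INDEX_BITS) - 1; // 0x1FFF

    fn new(tag: Tag, index: usize) -> Self {
        assert!(index < (1usize << Self::INDEX_BITS), "index overflow");
        TaggedIndex(((tag as u16) << Self::INDEX_BITS) | index as u16)
    }

    fn tag(self) -> u16 {
        self.0 >> Self::INDEX_BITS
    }

    fn index(self) -> usize {
        (self.0 & Self::INDEX_MASK) as usize
    }
}

// Struct-of-arrays storage: one Vec per kind instead of one Vec of enums.
#[derive(Default)]
struct ObjectStore {
    locals: Vec<f64>,
    globals: Vec<f64>,
    consts: Vec<f64>,
}

impl ObjectStore {
    fn push_local(&mut self, v: f64) -> TaggedIndex {
        self.locals.push(v);
        TaggedIndex::new(Tag::Local, self.locals.len() - 1)
    }

    // No unsafe: a stale or mismatched handle panics on out-of-bounds
    // indexing rather than causing UB.
    fn get(&self, h: TaggedIndex) -> f64 {
        match h.tag() {
            0 => self.locals[h.index()],
            1 => self.globals[h.index()],
            _ => self.consts[h.index()],
        }
    }
}
```

The key trade-off is exactly the one described above: the handle shrinks to 2 bytes, but a wrong index turns into a runtime panic instead of a type error.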
Fun! If OP is interested in taking it to the next level, I made a repository comparing different ways to implement progressively faster interpreters in Rust. At the limit, you can start to approach the performance of native code. Somewhere in there there’s a sweet spot that trades off implementation complexity against performance.
closure_continuations seems like a nice place.

That's cool! Do you have charts or numbers on the results?
I was surprised to see such a large performance improvement from shrinking the Object structure. What primarily drives this? My best guess is that you can pack more Objects into a cache line.
They mention in the post that it ends up being able to fit into a register. That possibility is pretty huge, I’d imagine.
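For intuition, here's a toy illustration of the size difference, assuming hypothetical layouts (FatObject and SlimObject are made-up names, not the post's actual types). A two-word tagged enum is 16 bytes on a 64-bit target, while a tag packed into the low bits of a single word is 8 bytes: it travels in one register, and a 64-byte cache line holds twice as many of them:

```rust
// A "fat" layout: explicit discriminant word plus an 8-byte payload.
// (The Int variant uses the full i64 range, so the compiler can't hide
// the discriminant in the Box's non-null niche.)
#[allow(dead_code)]
enum FatObject {
    Int(i64),
    Ptr(Box<i64>),
}

// One machine word with the tag stashed in the low bits, in the spirit
// of the post's pointer tagging.
#[allow(dead_code)]
struct SlimObject(usize);
```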
Makes sense! That possibility was the piece I was missing.