1. 40
  1.  

  2. 3

    Great post! Although it’s just an argument about naming, I think the final compiler idea is legitimately not a virtual machine anymore. It’s just a custom library you’re hooking into in your compiled code.

    For example, I’ve done this a few times where I wrote a basic JavaScript compiler that compiled JavaScript to C++ and imported V8’s libraries and made use of V8 objects and functions. I learned this technique when I built an interpreter for a Scheme in D and then used the interpreter code as a library for a different backend that compiled the Scheme to D. I don’t personally consider the compiled result a VM anymore.

    It’s my impression that this is how compiled implementations of dynamic languages work: they have to hook into the dynamic runtime in the compiled output. But the compiled output isn’t really a VM (anymore).

    I don’t know for sure if this is how Chicken Scheme works but maybe it is.

    1. 6

      I don’t know for sure if this is how Chicken Scheme works but maybe it is.

      CHICKEN is a bit more direct in how it emits code: it doesn’t generate “custom opcodes” or decoding loops. It does have one “main loop” hidden underneath everything that contains a setjmp call which is longjmped to after a GC is triggered. Each generated C function contains an out-of-memory check which saves the continuation (basically the current C function with its arguments) and triggers the GC. The “main loop” then simply invokes the continuation again after GC is done, so it acts as a sort of trampoline.
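      A rough sketch of that trampoline pattern in plain C (invented names and a toy allocation counter, not CHICKEN’s actual generated code):

      ```c
      #include <setjmp.h>
      #include <stdio.h>

      /* Hypothetical sketch of a CHICKEN-style trampoline. */
      typedef void (*cont_fn)(int arg);

      static jmp_buf trampoline;
      static cont_fn saved_cont;   /* continuation saved before GC */
      static int saved_arg;

      static void trigger_gc(cont_fn k, int arg) {
          saved_cont = k;          /* save "continuation": function + argument */
          saved_arg = arg;
          longjmp(trampoline, 1);  /* unwind back to the main loop */
      }

      static int heap_used = 0;
      static const int HEAP_LIMIT = 3;

      static void step(int n) {
          if (heap_used >= HEAP_LIMIT)  /* out-of-memory check in each function */
              trigger_gc(step, n);      /* save self as continuation, run GC */
          heap_used++;                  /* pretend to allocate */
          printf("step %d\n", n);
          if (n > 0)
              step(n - 1);
      }

      int main(void) {
          if (setjmp(trampoline)) {
              heap_used = 0;            /* "GC done": reset the heap */
              saved_cont(saved_arg);    /* re-invoke the saved continuation */
          } else {
              step(5);
          }
          return 0;
      }
      ```

      Real CHICKEN output is in continuation-passing style and never returns normally; the plain recursion here is only to keep the toy self-contained.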

      From what little I know about Gambit, I believe it emits C code that looks a bit more like a VM.

      1. 2

        Thanks!

        I agree, the compiler isn’t a virtual machine anymore. I think it’s an interesting illustration of how one can relatively easily (in principle) convert a VM into a compiler that’s compatible with the same bytecode. I’ll add a caveat that it isn’t a VM anymore.

        1. 2

          As I reread the code I think I’m wrong and you were originally correct. I don’t think you’re doing the same thing I did in the two linked projects.

          My (possibly incorrect) rule of thumb about interpreters and VMs is: who controls the instruction pointer for the execution of the original input program?

          If your code does, you’ve got an interpreter. If the CPU does, you’ve compiled to native code.

          1. 3

            Hmm. My “rule” is something like: If there’s a “thing” which can in principle be fed arbitrary bytecode to execute, it’s a virtual machine. If there’s a “thing” which can in principle be fed arbitrary bytecode but it produces native code where the bytecode can’t be changed anymore even in principle, it’s a compiler.

            Using that definition, if a “compiler” just outputs an executable with an interpreter preconfigured to run your bytecode, you still have an interpreter/VM, because one could in principle change only the bytecode “payload” to input another program. But the compiler in the article produces native code which simply reflects the input bytecode. Once the bytecode is compiled, there is no machinery left to run arbitrary bytecode.

            I don’t know which rule of thumb is correct.

            1. 1

              Yeah, I’m not debating whether to call it a compiler or not. Just about every “interpreted” language out there has a compiler.

              My second comment tried to move away from talking about “compilers” at all (though I wasn’t explicit about that).

              I’m talking about whether there’s a VM or not. If there’s a VM, there’s an interpreter. That’s unrelated to whether there’s a compiler.

              So I think we’re agreeing.

      2. 2

        It would be interesting to see an analysis comparing the computed jump table with the tail-calling implementation, as it’s not at all clear why one would be faster than the other. If handwriting assembly, I would expect to write more or less identical code in each case. Putting everything in one subroutine in principle allows the compiler greater freedom with respect to register allocation and constraint propagation and maintenance (interprocedural optimisation is of course possible, but C compilers tend to suck at it), but this may work to its detriment if it makes a bad register allocation decision somewhere and screws everybody over.

        1. 2

          The computed jump table still has to do the look-up every instruction, essentially goto *jmptable[bytecode[instruction]]. In the tail-call case, we have pre-processed the input, so we have an array of function pointers rather than opcodes; essentially, goto functions[instruction]. I’m thinking removing that level of indirection might make the code clearer to the CPU and help with speculatively executing the next function or something.
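          For concreteness, a toy version of both dispatch styles (invented opcodes; the computed goto relies on the GNU “labels as values” extension, and the tail calls only become true jumps if the compiler optimises them):

          ```c
          #include <stdio.h>

          enum { OP_INC, OP_DEC, OP_HALT };

          /* Computed-goto dispatch: look up a label address per opcode. */
          static int run_goto(const unsigned char *bytecode) {
              static void *jmptable[] = { &&op_inc, &&op_dec, &&op_halt };
              int acc = 0;
              const unsigned char *ip = bytecode;
              goto *jmptable[*ip];              /* indirection: opcode -> label */
          op_inc: acc++; goto *jmptable[*++ip];
          op_dec: acc--; goto *jmptable[*++ip];
          op_halt: return acc;
          }

          /* Tail-call dispatch: the bytecode is preprocessed into an array of
             function pointers, and each handler calls the next one directly. */
          typedef struct ins ins;
          typedef int (*handler)(const ins *ip, int acc);
          struct ins { handler fn; };

          static int h_inc(const ins *ip, int acc) { return ip[1].fn(ip + 1, acc + 1); }
          static int h_dec(const ins *ip, int acc) { return ip[1].fn(ip + 1, acc - 1); }
          static int h_halt(const ins *ip, int acc) { (void)ip; return acc; }

          int main(void) {
              const unsigned char bc[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
              printf("%d\n", run_goto(bc));

              /* "Preprocessed" program: function pointers instead of opcodes. */
              const ins prog[] = { {h_inc}, {h_inc}, {h_dec}, {h_halt} };
              printf("%d\n", prog[0].fn(prog, 0));
              return 0;
          }
          ```

          Both run the same program (two increments, one decrement), but the second loses the jmptable look-up per instruction.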

          We could do similar preprocessing with the custom jump table approach. I actually did implement that at some point, but opted not to include it because it was so similar to the tail call approach with the downside of being GNU-specific.

          I also doubt that the compiler can really do anything super clever with register allocation in the computed goto case. It has to support any ordering of labels, with any value for the stack pointer at any time. The best it can really do is to put the stack pointer and instruction pointer in registers. It just so happens that in modern calling conventions, scalar parameters are passed in registers as well, so the tail call case will also keep the instruction pointer and the stack pointer in registers.

          1. 1

            We could do similar preprocessing with the custom jump table approach. I actually did implement that at some point, but opted not to include it

            Oh—I skimmed, and assumed that was what you had done. Nevermind.

        2. 2

          Isn’t this approach the direct threading approach?

          1. 1

            Yes, that’s precisely what this is. It’s a technique that’s been around for quite some time, but I’m always surprised it’s still effective on modern CISC processors. I suppose reducing your hot path code size is good for any architecture.

          2. 2

            I’ve been building a VM recently for a programmable RTS game I’ve been working on, and I’ve ended up creating a language on top of the flat assembler with the semantics of my VM bytecode. So it ends up being compiled to very efficient native x86_64 code. The VM runs as a coroutine with its own stack, so I could have thousands running, each with its own state, but in a very lightweight manner. I’ve been thinking of writing a blog post about it (first time I feel like I have anything interesting to write).

            1. 1

              Good article. I noticed there are several places where you say million when you meant billion, for IPS.

              1. 1

                I don’t think so? I use million IPS as a unit consistently, and sometimes that number is in the thousands.

                1. 2

                  I think mperham refers to this sentence near the beginning.

                  That means my laptop runs 856 million bytecode instructions per second (“IPS”) on average, and my desktop runs 1.1 million instructions per second.

                  The table right below shows 1059 M IPS.

                  1. 2

                    Oh, they said there were several so I assumed there was a misunderstanding of the numbers in the tables. I corrected the case you pointed out, thanks

                    1. 1

                      I’ve never seen this formatting for numbers: 5'882 million. It’s more standard in English to say 5.882 billion.

                      1. 1

                        The ' is meant to be a thousands separator. I chose it because 1) , and . are the standard thousands separators but are ambiguous, and 2) ' is what C++ and a few other places use and doesn’t already mean “decimal point”. Maybe it wasn’t as unambiguous as I had hoped.

                        1. 2

                          Unfortunately there’s no good solution, but thanks for at least trying.

                          1. 1

                            ' as a thousands separator is fine and unambiguous.

                            1. 1

                              In Spain, ’ is used as a decimal separator in handwriting. It also caused confusion in this very same thread. A space seems better in writing (5 882).

                              1. 2

                                ISO recommends a space as a thousands separator: https://en.wikipedia.org/wiki/Decimal_separator#Current_standards, but a comma or a period is also allowed.

                                Swedish typographical best practice is to use a thin space. It should of course also be non-breaking.

                                1. 1

                                  Rats. I had not heard of that. Thanks. The only argument I knew of concerning thousands separators was between English and Germanic speakers using “.” and “,” in opposite orders.