1. 30
  1. 7

    Haven’t looked at the code, but the README is a nice Sunday-morning read: there’s a really clear overview of how the interpreter actually works. Definitely worth the click, even though I don’t use C++ and probably won’t ever get to write Forth in anger.

    1. 6

      Note: I first submitted Tails a month ago when it was just a little gist. I’ve kept working on it, and it now has its own repo. I’ve added a parser, a stack-effect checker, more data types (strings, arrays, quotations), type-checking, and a dumb but effective garbage collector. There’s even a REPL so you can try it out interactively.

      1. 1

        Is there a particular compiler I should use? GCC is throwing a lot of errors.

        1. 1

          I’ve pushed a fix for the GCC compatibility issues, which were pretty trivial. Thanks for pointing them out.

          1. 1

            I build with Clang. I should install GCC and try it. The most likely cause of build errors is missing standard library includes…

        2. 4

          I’m familiar with tagged pointers, as so many interpreters use them (particularly Ruby, which I’ve hacked on quite a bit). NaN boxing is new to me, though.

          My first thought after reading about the technique is whether signaling NaNs could be used to make arithmetic more efficient. Since using a signaling NaN would generate an exception, generated code could just use the number as-is and catch the exception, falling back to a slow path in that case.

          My second thought is that we’ve had tagged pointers for decades, so why aren’t there hardware instructions for working with them?

          1. 4

            What I do to make arithmetic more efficient in my nan-boxed interpreter is something like this:

            Value add_op(Value a, Value b) {
                double result = a.number + b.number;
                if (result != result) goto slow_path;  // NaN: at least one operand wasn't a plain double
                return Value{result};
            slow_path:
                return add_slowly(a, b);  // handle non-double types, or throw a type error
            }

            I optimistically add the two nan-boxed values using float arithmetic. If the result isn’t a nan, I return it. The overhead for the fast path is a nan test and a conditional branch that isn’t taken. The slow path takes care of adding things that aren’t doubles, and throwing an exception if the arguments have the wrong type.

            Using signaling nans and traps would make the code significantly more complicated, and I don’t see why it would be faster. Setting up and tearing down the trap handler for the add case would probably cost at least as much as the code above, likely more.

            1. 2

              That’s interesting; I very much like that technique.

              I wouldn’t expect the trap handler to be set up and torn down for every evaluated opcode, just once at startup. The exception would certainly be slower than a failed branch prediction. The fast path would save a comparison, which may or may not be faster on a modern processor.

              1. 3

                The tail-threaded interpreter passes around its state in registers. In the case of “tails”, there are two registers containing sp and pc. How does a generic trap handler determine what opcode is being executed, and how does it access the interpreter state? If this information is stored into global variables before executing the add instruction, then that setup code is more expensive than the nan test in my code. Is there a way to use a trap that is cheaper than the nan-test code? Without using assembly language or other CPU and ABI specific code, I mean.

                1. 1

                  Hmm, that’s a good point. I had imagined it walking the stack like a debugger would, but that would definitely be CPU/ABI specific.

              2. 2

                That’s essentially what I do. The constructor that takes a double checks whether it’s already a NaN, which would of course be misinterpreted, and substitutes the value that means “null”. A handy side effect is that arithmetic on non-numbers produces null, which is similar to SQL semantics.

              3. 2

                My second thought is that we’ve had tagged pointers for decades, so why aren’t there hardware instructions for working with them?

                There are some; machines with them are generally called “tagged architectures”. The Burroughs Large Systems and LispMs did this. ARM, PowerPC, and a few others also have support if you opt into it, but it’s usually exposed only to compiler writers, so your compiler has to know about & use it. It’s definitely an interesting area.

                1. 3

                  CHERI is a tagged architecture, with a single one-bit non-addressable tag to differentiate between memory capabilities (which the C/C++ compiler uses to represent pointers and have strong integrity properties) and other data. Arm is providing us with an implementation early next year.

              4. 3

                The links in the README seem broken for instruction.hh etc. Just a heads up.

                1. 2

                  Heh, just saw this after submitting a PR.

                  1. 2

                    Oh right, because I moved source files into subfolders recently. Thanks for letting me know.