1. 21

  2. 4

    Oh hey, this was my Master’s project!

    I worked with Eliot at UMass, as well as the team at ANU, on building a formal semantic model of Mu in HOL4. (HOL4 is a non-Turing-complete total functional language embedded in Standard ML, designed to write proofs.)

    The idea was to formally prove that simple optimizations on Mu bytecode resulted in code with identical semantics. As far as I was aware, this hadn’t been done for any other IR language.

    I wrote a lot of code, but I didn’t manage to prove much before just submitting a report on the code I had written and leaving with a Master’s degree. Turns out, representing an entire VM (even a simple one) in a logic language is hard. And my previous experience with functional programming was Haskell, so I tried to write HOL4 like it was Haskell (lots of monads), and the result was… kind of a tarpit. I feel bad for whoever picked up that project after I left. :/

    1. 4

      A large fraction of today’s software is written in managed languages. These languages increase software productivity by offering rich abstractions for managing memory, executing code concurrently, and hiding the complexity of modern hardware. Examples include JavaScript, PHP, Objective-C …

      Objective-C is not in any way a managed language. It’s a superset of ANSI C (or optionally C++) that adds language constructs for OOP. memset((void*)1, 0xff, UINT_MAX) is perfectly legal Obj-C code :)

      What Objective-C has in common with most of the other languages in that list is a dynamic object model with introspection, but that has nothing to do with being managed.

      1. 1

        Is this really “micro”? It seems of similar scope and complexity to LLVM, or at least wasm.

        1. 1

          Why is a VM the limiting factor for speed? Instead I’d say the big limiting factors are independent of a VM:

          • Inability to control memory management in sufficient detail (because of mandatory GC usually)
          • Inefficient dynamic dispatch via hash map (because of money patching capabilities usually)
          • Inability to exploit hardware capabilities like vector instructions
          1. 1

            I’m generally skeptical of language-independent VMs, but I re-read some of this and it seems well motivated (common infrastructure for GC, concurrency, and native code gen). One thing that’s missing is a C FFI, which is a big deal in practice for all the languages listed (PHP, Python, Ruby, etc.)

            It’s interesting to compare it to WASM which also aimed to be somewhat language independent, but I’d say the initial version was really for Rust and C/C++. It lacked GC and concurrency (and compound data types), but they’re being added I believe. I guess it doesn’t say anything about code gen either. The self-hosting architectures in the papers are pretty interesting.

            1. 1

              I don’t quite get it, is it a virtual machine in the sense of a program that can run an instruction set (e.g. like the JVM), or is it a virtual machine in the abstract sense that the specification describes the language as if it’s running on a standardized piece of hardware no matter what the underlying hardware actually is (e.g. like the C virtual machine).

              1. 2

                The idea was to be both. The VM should be thoroughly specified, and even formally verified, but also implemented on real hardware in such a way that the distinctions between hardware platforms either don’t matter or are accounted for in the spec.