This is incredibly cool. I liked it initially, and then I saw that its deployment model is Smalltalk-style images and liked it even more. For the folks not excited enough by the summary:
MIT licensed
Supports modern JavaScript features.
Runs on the desktop, generates a VM image (compiled bytecode and object state) that is then moved to the MCU
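To make that last point concrete, here’s a hedged sketch of what the MCU side might look like; the names (vm_t, vm_restore, vm_call_export, vm_image) are hypothetical stand-ins, not Microvium’s actual C API:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical embedding API: illustrative stand-ins for the real
       library calls, not Microvium's actual function names. */
    typedef struct vm vm_t;
    int vm_restore(vm_t **vm, const uint8_t *image, size_t len);
    int vm_call_export(vm_t *vm, uint16_t export_id);

    /* The image is built on the desktop (bytecode plus the heap state
       captured at snapshot time) and baked into flash on the MCU. */
    extern const uint8_t vm_image[];
    extern const size_t vm_image_len;

    int main(void)
    {
        vm_t *vm;
        /* Resume the program where the desktop-side run left off:
           objects created before the snapshot are already live. */
        if (vm_restore(&vm, vm_image, vm_image_len) != 0)
            return 1;
        /* Invoke a function the script exported at snapshot time. */
        return vm_call_export(vm, 0);
    }

All parsing and compilation happen on the desktop; the device only ever sees the finished image.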
Agreed, I also love it! I’m tempted to try and run it on the Nintendo 64 for some homebrew hacking.
These engines are so cute with their teeny little feature lists: “mJS lets you write for-loops and switch statements for example” :)
Quite surprisingly, I was able to build it on Morello with no source-code modifications. It looks as if they keep pointers to large malloc-backed regions and use 16-bit integers as offsets into these, which means that it works fine with a strict-provenance architecture.
This puts me in mind of General Magic, which had a system where software agents could travel from one host to another and maintain their state. So you could send a program to a server that could talk to the server, do something on your behalf, and return back to you with results.
Microvium’s hard limit of 64KB of heap space is kind of, uh, limiting, but it would be reasonable for small tasks. And it might make it safer for running untrusted code, since there’s no way for the VM to read or write outside its heap (right?) or cause trouble by growing without bound.
A 64 KiB heap is fine for the intended use case: microcontrollers with tens to hundreds of KiBs of SRAM. The way that it manages the memory is quite interesting. The embedding environment needs to provide a malloc call that will allocate chunks. It then uses these to build a linear address space that it bump-allocates into. When you turn a 16-bit integer (actually a 15-bit integer with a 1-bit tag) into a pointer, you have to search the list of chunks to find the one with the corresponding offset. When you run the GC, it does a semi-space compacting pass, allocating new chunks and copying live objects into them.
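To make the decode step concrete, here’s a minimal sketch of the scheme as I understand it from the above; the types and names are hypothetical, not Microvium’s actual code:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical chunk descriptor: each malloc-backed chunk serves a
       contiguous range of the 16-bit virtual heap, starting at 'start'. */
    typedef struct Chunk {
        struct Chunk *next; /* linked list of live chunks */
        uint16_t start;     /* first virtual offset served by this chunk */
        uint16_t used;      /* bytes bump-allocated into this chunk */
        uint8_t *data;      /* the malloc-backed storage */
    } Chunk;

    /* Assumed encoding (one plausible reading of "15-bit integer with a
       1-bit tag"): the low bit marks a heap reference, and allocations
       are 2-byte aligned, so clearing it yields an even offset into the
       up-to-64 KiB virtual heap. */
    static inline int is_ref(uint16_t v) { return v & 1u; }
    static inline uint16_t ref_offset(uint16_t v) { return (uint16_t)(v & ~1u); }

    /* The slow path described above: walk the chunk list to find the
       chunk whose range covers the offset. */
    static void *decode_ref(const Chunk *chunks, uint16_t v)
    {
        if (!is_ref(v))
            return NULL; /* not a heap reference (e.g. a small integer) */
        uint16_t off = ref_offset(v);
        for (const Chunk *c = chunks; c != NULL; c = c->next) {
            if (off >= c->start && off < c->start + c->used)
                return c->data + (off - c->start);
        }
        return NULL; /* no live chunk covers this offset */
    }

Allocation itself stays cheap (a bump pointer into the newest chunk); the cost is concentrated in decoding references and in the copying GC.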
This works very well with CHERI because each chunk is represented by a valid pointer, so you get a clean derivation chain. It also works very nicely with the temporal safety work: JavaScript objects are live until a GC happens, so if you’ve taken a pointer to one and passed it into C code, that pointer is stable. When a GC happens, all of the existing chunks will be deallocated and the revoker will prevent dangling pointers held by C code from accessing them.
Sounds clever, but also kinda slow to deref a “pointer”; though I recognize speed isn’t a goal here.
If you’re on an 8-bit or 16-bit device (or more accurately, a device with a 16-bit address space) then pointer access is native and should be fast.
For devices with a larger address space (e.g. 32- or 64-bit) but where you know that the VM memory will always be in a 64kB sub-range of the total address space, you can configure a base address, and then the 16-bit references are relative to that base address. In my tests when compiling to ARM, this adds just one or two instructions to each pointer access (1 instruction in full ARM and 2 instructions in Thumb), so it’s still pretty efficient. This case works for 32-bit MCUs where SRAM is mapped to a sub-range of the 32-bit range. It also works on desktop and server-class devices if you implement MVM_MALLOC and MVM_FREE to allocate within a subrange (this is what I do for testing and debugging).

In the worst case, if you’re working with a large address space with no guarantee about the address sub-range, then pointer encoding/decoding is expensive and involves traversing the linked list of allocated buckets. But one consolation is that a GC cycle essentially packs all the active buckets into one.
More info here: https://coder-mike.com/blog/2022/05/20/microvium-updated-memory-model/
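For the cheap case, the base-relative decode amounts to the following (a sketch assuming the tag bit has already been stripped; the real configuration is via the MVM_* port macros mentioned above):

    #include <stdint.h>

    /* Assumption: set once at startup to the bottom of the 64kB window
       that MVM_MALLOC is known to allocate from. */
    static uint8_t *vm_base;

    /* One add once the base is in a register: 1 instruction in full
       ARM, 2 in Thumb, matching the measurements above. */
    static inline void *decode_ref_fast(uint16_t offset)
    {
        return vm_base + offset;
    }

On a device with a native 16-bit address space, the base is effectively zero and the 16-bit reference is the pointer itself, which is the fast case described above.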
The only thing that Elk can do that Microvium can’t do is execute strings of JavaScript text at runtime.

Sounds like a feature to me. This “lack of a feature” would close a fairly significant attack surface. In the internet of things, this seems like a great feature to drop.
It’s like Content-Security-Policy script-src 'self' for microcontrollers. ;)
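For anyone who hasn’t met it, that’s the HTTP response header that tells a browser to execute only scripts served from the page’s own origin:

    Content-Security-Policy: script-src 'self'

The analogy holds up: only code baked into the snapshot at build time can run, and nothing arriving over the wire at runtime is executable.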
The main drawback is that you can’t ever support eval. That said, I agree that, for a lot of use cases, this is a feature and not a bug.

Yes. Although, to be accurate, you can never support eval at runtime. But compile-time code could support eval the same way it supports import.

That’s an interesting thought. If your symbolic execution phase can construct the string then you should be able to generate a function for the result of the eval and rely on the existing late binding to connect it up to the namespace. You can’t grab some text from an I/O channel and eval it, but you probably could eventually support cases where you are constructing a string based on some data-dependent input.