I don’t really agree with the idea that mmap memory is heap memory, in spite of the long argument. On a modern *NIX system, mmap is effectively the only way of getting memory in a userspace process and so is the thing used to get the heap, the stack, and the mappings of the binaries that provide globals and functions. If all of these are heap memory then everything in a process is heap memory and you may as well just say ‘memory’.
It’s also worth noting that you often want to JIT units smaller than a page and so can’t do W^X the way that this article proposes. The canonical way of doing it is to create an anonymous shared memory object and map it read-write (or write-only) in one location and read-execute (or execute-only) in another location. This makes it difficult to automatically find the writeable address from a function pointer. Apple’s iOS JIT goes one step further and JITs a memcpy that copies into the writeable location and then scrubs all other references to the writeable address, leaving only the execute-only mapping. To be able to write, you need to either leak the writeable address via a side channel (unfortunately, this is quite easy) or find the address of the special memcpy (unfortunately, this is also easy because it’s at the start of the region and so easy to guess given a function pointer and easy to check guesses with speculative side channels).
Hm. An idea that comes to mind:
Higher overhead because of the IPC and context switching, but probably easier to hide the authentication key.
This is more or less what the pre-Chromium Edge did on Windows, except that Windows has much nicer APIs for it. The memory mapping APIs take the HANDLE to the target process, so one process would JIT into a shared memory object and then map that object executable in the renderer process.
I’m not sure what the HMAC is for here. You already have control over who can send messages because only the process that has the other end of the IPC channel (socket or pipe) can send messages to the JIT process. If your goal is to privilege-separate within the process that’s running the code, you’re solving an impossible (without CHERI) problem. An attacker who can run JIT’d code can insert a gadget that lets them probe the secret with speculative side channels, so they can trivially extract the key in well under a second and then forge authenticated messages. And if you do have CHERI, then you can put the JIT in a separate compartment in the same address space, map the memory RWX, give the consumer an X / RX capability and the JIT compartment a W / RW capability, and get the same security guarantees with a fraction of the overhead.
Did a similar experiment in Nim if someone is interested: https://github.com/eterps/loader/blob/master/syscall.nim (targets x64 Linux)