I am so looking forward to the point where this kind of thing is easy and straightforward, and not a giant mess.
I want to run pretty much everything in a WASM sandbox. It’s 2025, why run untrusted code using anything else?
Clearly the JVM missed an opportunity here.
I think about this all the time; I’m not sure I want stuff that’s already out there cross compiled for WASM, or just a whole new stack that was purpose built for it, but yes I do think it makes sense, or something like gVisor
This is basically exactly what I am working on, using a sandboxing approach called LFI, which is more similar to NaCl than Wasm. I think the LFI approach is better than Wasm for native code sandboxing because it trades off architectural portability for better performance and higher compatibility (it supports exceptions, setjmp/longjmp, programs with hand-written assembly, multithreading/atomics, etc.). I’m especially excited about this for library sandboxing (for libraries written in unsafe languages), since in-process sandboxing techniques like LFI/Wasm have very low “IPC” costs for calling functions across the sandbox boundary.
Super interesting! How does the memory sandboxing work? Are all reads/writes modified to be offsets from the sandbox base?
Yes exactly – the compiler applies a transformation while assembling that turns loads/stores into a 64-bit base added to a 32-bit offset, guaranteeing that all loads/stores can only access up to 4GiB from the base. For example, on Arm64 it transforms ldr x0, [x1] into ldr x0, [x21, w1, uxtw], where x21 stores the base and is a reserved register. At runtime, a verifier analyzes the machine code and ensures that all loads/stores have this form [1] and that x21 is never modified. Since Arm64 has a fixed-width instruction length, the analysis is sound (it’s impossible to jump into the middle of an instruction). The x86-64 implementation is a bit more complex since it requires a technique known as “instruction bundling” to solve this issue [2, 3].
[1]: Certain loads/stores can’t be encoded with the addressing mode used above. In those cases, a separate reserved register is used as the target, and the verifier ensures that this reserved register is only ever loaded with values computed as x21+wN (64-bit plus 32-bit). Similarly, the stack pointer can only be loaded with sandbox addresses, so accesses to the stack don’t have to be rewritten.
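To make the Arm64 scheme above concrete, here is a rough Python sketch. It is not LFI's code: the real verifier analyzes machine code, while this toy only looks at textual assembly, and the names guarded_address and verify are made up for illustration. It models the address that [x21, wN, uxtw] produces and a deliberately conservative check that every load/store has that shape and that x21 is never the first operand of an instruction. It ignores the reserved-register and stack-pointer cases from footnote [1].

```python
import re

MASK32 = 0xFFFF_FFFF

def guarded_address(base: int, reg: int) -> int:
    """Address used by [x21, wN, uxtw]: base plus the zero-extended low 32 bits of reg."""
    return base + (reg & MASK32)

# Any mnemonic starting with ld/st is treated as a memory access.
MEM_OP = re.compile(r"^(ld|st)\w*\s")
# The only access shape this toy accepts: ldr/str Rt, [x21, wN, uxtw]
GUARDED = re.compile(r"^(ldr|str)\s+[xw]\d+,\s*\[x21,\s*w\d+,\s*uxtw\]$")
# x21 (or its 32-bit view w21) as first operand; conservatively treated as a write.
TOUCHES_X21 = re.compile(r"^\w+\s+[xw]21\b")

def verify(program: list[str]) -> bool:
    """Return True only if every instruction passes the toy checks."""
    for line in program:
        insn = line.strip().lower()
        if TOUCHES_X21.match(insn):
            return False  # the base register might be modified
        if MEM_OP.match(insn) and not GUARDED.match(insn):
            return False  # unguarded (or unsupported) load/store
    return True

# Even a hostile 64-bit register value lands within 4GiB of the base.
base = 0x7000_0000_0000
assert base <= guarded_address(base, 0xFFFF_FFFF_FFFF_FFFF) < base + (1 << 32)
```

With this sketch, verify(["ldr x0, [x21, w1, uxtw]"]) passes, while verify(["ldr x0, [x1]"]) and verify(["mov x21, x0"]) are rejected.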
Thanks for the links, definitely curious about how you dealt with the addressing modes in x86
I’ve been toying with the idea of embeddable workloads on nommu linux + usermode linux, which has interesting characteristics, but a shared address space is one of the main problems. Something like LFI will be interesting to research.
On x86-64 you can get a similar effect (64-bit base+32-bit offset) by using a segment register, for example %gs:(%edi). Often you can get away with just adding a gs segment selector and switching registers to 32-bit versions, but sometimes you have to rewrite into an lea followed by a simple %gs:(%edi) style access.
I think combining this with usermode virtualization approaches like UML would be very cool. Last time I looked at UML though it didn’t support SMP, which was disappointing. NoMMU+UML sounds like Nabla Linux – not sure if it still works though.
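For the %gs:(%edi) style access mentioned above, the arithmetic can be sketched in Python. This reflects my understanding of x86-64 addressing with 32-bit address registers (the offset is computed modulo 2**32, zero-extended, then added to the segment base) and is not taken from the LFI implementation; gs_address and its parameters are hypothetical names.

```python
MASK32 = 0xFFFF_FFFF

def gs_address(gs_base: int, base_reg: int, index_reg: int = 0,
               scale: int = 1, disp: int = 0) -> int:
    """Linear address of %gs:disp(%base,%index,scale) with 32-bit address registers."""
    # The offset wraps modulo 2**32 before the 64-bit segment base is added.
    offset = (base_reg + index_reg * scale + disp) & MASK32
    return gs_base + offset

gs_base = 0x10_0000_0000  # assumed sandbox base held in the gs segment base
# Upper register bits are ignored, so the access stays within 4GiB of gs_base.
assert gs_address(gs_base, 0xFFFF_FFFF_FFFF_FFFF) == gs_base + 0xFFFF_FFFF
# A displacement that overflows 32 bits wraps around instead of escaping.
assert gs_address(gs_base, 0xFFFF_FFFF, disp=1) == gs_base
```

Under these assumptions, whatever the registers contain, the result stays in the range from gs_base up to (but not including) gs_base + 4GiB, which is the same guarantee as the Arm64 form.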
Heroic. However, I’m surprised about the failure to build on recent macOS. That seems like a bug someone would have noticed.
When in doubt, go check the nixpkgs build status. macOS perl builds for 5.40 seem to be doing fine
This reminds me of a suspicion I have that Rust’s downfall will be WASM. The reason being, we will all outsource safety to the environment instead of doing it at the language level.
I like rust more for correctness than for safety. It’s more important to me for my program to fail less often, and lots of rust features help me do that more than many other languages, while having a large-ish ecosystem. Safety is great, but if I replaced rust with some sandboxed language that still crashes often due to mistakes I make, I wouldn’t be interested.
Wasm doesn’t protect you from memory safety bugs, it just limits what an attacker can achieve with them. There’s still a lot that can go wrong though.
Rust works pretty well on both sides of the WASM sandbox boundary. As a sandbox implementation it can compete directly with C and C++ in both speed and portability while providing more tools to verify correctness, and Rust when compiled to WASM produces very small binaries that don’t need special shim code or platform support (compare with, for example, Go).
WASM competes with sandboxing strategies such as seccomp, and at the margins can replace uses of Firecracker or gVisor (when those are being used to sandbox a single process) with infinitely better portability.
Nobody should write Perl in 2025
It’s a nice language, I find it fun to write and it is immensely useful for some text-related tasks.
We still do :-)
Who’s “we”? Just curious.
FinTech company with >600,000 lines of Perl code.
Certain Dutch company related to hotels. :^)
Oh still? I know it was like that 20 years ago, but I’m surprised it still is big there.