I was kind of horrified by the idea that this Brand New Fresh Start was going to get saddled with support for legacy UTF-16 strings, but I wasn’t aware of quite how batshit the situation with JS today is: apparently strings in that language are not even required to be well-formed UTF-16. Yikes. https://simonsapin.github.io/wtf-8/
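For reference, here's a minimal C++ sketch of the check that JS strings are allowed to fail (well-formed UTF-16 means every surrogate code unit is half of a high/low pair):

```cpp
#include <cstddef>
#include <string>

// A sequence of 16-bit code units is well-formed UTF-16 iff every
// surrogate code unit is part of a high/low pair. JS strings are just
// arbitrary 16-bit sequences, so they can fail this check.
bool well_formed_utf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {             // high surrogate...
            if (i + 1 >= s.size()) return false;      // ...unpaired at end
            char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)       // ...no low half follows
                return false;
            ++i;                                      // skip the low half
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return false;                             // lone low surrogate
        }
    }
    return true;
}
```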
But I do like the idea of “err on the side of nothing rather than the wrong thing” mentioned a little further down; if there’s anything we’ve learned from browsers over the past few decades, it’s that mistakes are forever.
It’s the norm for “unicode-ready” languages which either date or trace their ancestry back to the early 90s: Unicode 1.0 was released, it was full of promises, people used and exposed USVs directly, and it took 5 years for Unicode 2.0 to hit like an asteroid with the news that 16 bits wasn’t enough after all, at which point every system that had standardised on 16-bit characters was broken.
Casualties of this are Java and Javascript (following in its path), Windows NT, C# because of Windows (and probably also Java), and probably tons of others. These languages exposed USVs directly, so when they “migrated” to UTF-16 that remained: they couldn’t just hide the surrogates, or assume every surrogate was paired (as that wasn’t the case previously).
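To make the leak concrete, here’s a small C++ sketch of how a 16-bit-era string API sees a character outside the BMP (JS and Java report the same 2 for their respective length properties):

```cpp
#include <iostream>
#include <string>

int main() {
    // U+1F4A9 is outside the BMP, so UTF-16 encodes it as a surrogate
    // pair: one code point, two 16-bit code units. APIs that grew out
    // of UCS-2 count the units, not the character.
    std::u16string s = u"\U0001F4A9";
    std::cout << s.size() << '\n';   // prints 2
}
```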
Then there’s the truly brain-dead like MySQL, who made up a 16-bit (BMP-only) “UTF-8”, only introduced a fixed version in 2010 (utf8mb4, in 5.5.3), and didn’t make it the default until 8.0.
First off, a disclaimer: I know little about either WASM or Scheme; my axe is C++. But what I noticed in reading this is how the optimistic premise “WebAssembly is within 10% of native performance!” is eventually tempered by multiple compromises that need to be made:
1. an embedded hash code in every heap object,
2. extra type-tag checks,
3. passing all parameters in global variables(!) that have to be copied to/from registers (sketched below),
4. copying part of the stack to the heap on every call to call-with-prompt (which I gather is similar to C++ try),
5. a “tailifying” transformation, which I’m not sure I understand, but which sounds like it also involves copying part of the stack to the heap any time a function makes a non-tail call.
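To make number 3 concrete, here’s a hedged C++ sketch of what passing parameters through globals amounts to (invented names, modelled on the description rather than on actual generated code):

```cpp
// "Argument registers" that live in main memory rather than real registers.
int g_arg0, g_arg1, g_ret;

void add_via_globals() {        // callee takes no parameters at all
    g_ret = g_arg0 + g_arg1;    // every access is a memory load/store
}

int add(int a, int b) {         // caller copies to/from the globals
    g_arg0 = a;
    g_arg1 = b;
    add_via_globals();
    return g_ret;
}
```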
This all smells very expensive! And the smell gets stronger farther down the list. Number 3 reminds me of 32-bit x86 calling conventions, whereas modern CPU ABIs put parameters in registers to avoid expensive main-memory accesses. Number 4 is like the setjmp slowdown in the original C++ exception implementations, which were a big performance hit and were quickly replaced by “zero-overhead” exceptions. Number 5 sounds like number 4, but makes most regular function calls slower, not just continuations.
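For comparison, a minimal sketch of that setjmp pattern; the point is that the cost is paid on every entry to the “try”, whether or not anything is ever thrown:

```cpp
#include <csetjmp>
#include <cstdio>

static std::jmp_buf handler;

void may_throw(bool fail) {
    if (fail) std::longjmp(handler, 1);   // the "throw"
}

int main() {
    if (setjmp(handler) == 0) {   // the "try": registers spill to memory
        may_throw(false);         // on every entry, thrown or not
        std::puts("no exception thrown");
    } else {
        std::puts("caught");      // the "catch": longjmp lands here
    }
}
```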
I’m not saying this’ll make Scheme-on-WASM unusable, but I can’t imagine it’ll be anywhere near the performance of a regular Scheme implemented in native code.
(Of course I am not speaking as any sort of expert; for example, it’s possible that sufficiently smart WASM runtimes could detect compromise #3 and optimize it away? And maybe for the kind of stuff people want to do with this, computational performance isn’t a roadblock…)
I think the big advantage of WASM is that the instruction set maps well to native code, and so it can be JITed more trivially. There aren’t a lot of weird virtual instructions that are hard to translate to assembly; it’s just standard machine operations.
It’s also that you can AOT-compile wasm and the sandboxing survives (or should), so e.g. C -> wasm -> native ought to be completely safe (barring toolchain bugs), and assuming an optimising backend it should be within reasonable distance of compiling C directly.
Things get more complicated when WASI gets introduced, of course.
All of the points you are horrified about are about Scheme, not WebAssembly. This is what a proper Scheme implementation must look like under the hood anywhere, since Scheme is a high-level language with powerful but abstract constructs. C++ intuitions are unhelpful and misleading here.
call-with-prompt is a very powerful low-level primitive in Scheme and yes, it is supposed to copy slices of the stack regardless of the underlying machine (unless that machine is an exotic Scheme-tailored thing).
The CPS (continuation-passing style) transformation is a useful generic technique for compiling functional programming languages, akin to the SSA form for intermediate representations of imperative languages. It does not have to rely on copying stack, it’s supposed to avoid unnecessary use of it.
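Here’s a hedged illustration of the idea, with std::function standing in for the continuations a Scheme compiler would build at the IR level:

```cpp
#include <functional>
#include <iostream>

// Every function takes an explicit continuation instead of returning.
using Cont = std::function<void(int)>;

void add_cps(int a, int b, Cont k) { k(a + b); }
void mul_cps(int a, int b, Cont k) { k(a * b); }

// Direct style would be: result = (2 + 3) * 4
void example(Cont k) {
    add_cps(2, 3, [=](int sum) {   // "the rest of the computation"
        mul_cps(sum, 4, k);        // every call is now a tail call
    });
}

int main() {
    example([](int result) { std::cout << result << '\n'; });  // prints 20
}
```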
The things that I listed aren’t problems with either WASM or Scheme alone, but rather impedance mismatches that will hurt the performance of Scheme-on-WASM. I know Scheme needs those features, but they’re a lot cheaper in native code.
Thanks to this blog I finally understood the GC difficulties people always mention.
I am very excited for when we can get good support for suspension.
I was following the stack working group, which is basically going to help with stuff like call/cc or generally “async-y code that has a sync interface” (think blocking IO in Python). Looks like the last 2 meetings were cancelled due to lack of an agenda, which… kinda worries me, but fingers crossed!
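For the shape of “sync interface over async work”, here’s a hedged C++ sketch using a thread and a future as stand-ins for what stack switching would let a wasm runtime do without the extra thread:

```cpp
#include <future>
#include <iostream>
#include <thread>

// The real work completes asynchronously on another thread, but the
// caller sees a plain blocking function, like file reads in Python.
int read_blocking() {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread t([p = std::move(p)]() mutable {
        p.set_value(42);           // stand-in for an async IO completion
    });
    int v = f.get();               // caller suspends until the value arrives
    t.join();
    return v;
}

int main() {
    std::cout << read_blocking() << '\n';   // prints 42
}
```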
Yeah, one interesting thing is the impossibility of stack scanning with a Harvard architecture!! (separate code and data spaces, as opposed to a von Neumann unified architecture). He didn’t use that term in the post, but I remember it being used in the WASM docs.
I’m not a fan of imprecise scanning, in C or C++ at least. So I guess that whole technique is out the window in WASM, and you need the compiler to identify roots statically rather than discovering them at runtime.
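For contrast, here’s a hedged sketch of the imprecise scanning that’s off the table (hypothetical heap bounds and marker; real collectors like Boehm GC are far more careful):

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical heap bounds; a real collector gets these from its allocator.
static std::uintptr_t heap_lo = 0x10000000, heap_hi = 0x20000000;

static void mark(std::uintptr_t candidate) {   // stand-in marker
    std::printf("possible root: %p\n", reinterpret_cast<void*>(candidate));
}

// Walk a native stack region word by word; anything that falls inside the
// heap range is conservatively treated as a pointer, i.e. a GC root. This
// is the imprecise scanning wasm rules out: its execution stack simply
// isn't addressable data you can iterate over.
void scan_stack(const std::uintptr_t* lo, const std::uintptr_t* hi) {
    for (const std::uintptr_t* w = lo; w < hi; ++w) {
        if (*w >= heap_lo && *w < heap_hi)
            mark(*w);
    }
}
```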