Are there any articles that go into more detail on the memory/object model? I know where the specs are, but that’s probably too in-depth for me given that I’m only slightly familiar with WASM.
IIRC Rossberg explains very clearly in his talk how to compile C++ (e.g. with vtables) and Java to WASM GC. And I think he mentions OCaml.
My memory might be clouded by what I’m working on, but I think it was a very simple model, and similar to Oils GC, which is also designed to be simple.
Basically you have “low level” GC structs and arrays, which may have traceable references (Ref type I think). And your language types must be mapped/compiled to those structs and arrays.
I think of it like Oils – we have a tag HeapTag::FixedSize which is a struct with a bit mask to tell the GC how to trace.
And then we have HeapTag::Scanned and HeapTag::Opaque which are arrays with a length. The Scanned type is full of pointers, and is traced. The Opaque type may be integer or character data.
namespace HeapTag {
const int Global = 0;     // Don't mark or sweep.
const int Opaque = 1;     // e.g. List<int>, BigStr
                          // Mark and sweep, but don't trace children
const int FixedSize = 2;  // Consult field_mask for children
const int Scanned = 3;    // Scan a contiguous range of children
}  // namespace HeapTag
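To make the tags concrete, here is a minimal sketch of how a marking pass might decide which children to visit. The `ObjHeader` and `Obj` layouts, the 16-slot limit, and the name `ChildrenToTrace` are assumptions made up for illustration; they are not the actual Oils definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace HeapTag {
const int Global = 0;
const int Opaque = 1;
const int FixedSize = 2;
const int Scanned = 3;
}  // namespace HeapTag

const int kMaxSlots = 16;  // hypothetical limit, matches the 16-bit mask below

// Hypothetical header; the real Oils header is laid out differently.
struct ObjHeader {
  int heap_tag;
  uint16_t field_mask;  // FixedSize only: bit i set => slot i holds a pointer
  int num_slots;        // Scanned only: number of pointer slots in use
};

// Hypothetical object: a header followed by pointer-sized slots.
struct Obj {
  ObjHeader header;
  Obj* slots[kMaxSlots];
};

// Returns the non-null children that a marking pass would visit next.
std::vector<Obj*> ChildrenToTrace(const Obj& obj) {
  std::vector<Obj*> out;
  switch (obj.header.heap_tag) {
    case HeapTag::Global:  // never marked or swept, so nothing to follow
    case HeapTag::Opaque:  // integer or character data, no pointers inside
      break;
    case HeapTag::FixedSize:  // struct: only the slots named by the mask
      for (int i = 0; i < kMaxSlots; ++i) {
        if (((obj.header.field_mask >> i) & 1) && obj.slots[i] != nullptr) {
          out.push_back(obj.slots[i]);
        }
      }
      break;
    case HeapTag::Scanned:  // array: every slot in the contiguous range
      assert(obj.header.num_slots <= kMaxSlots);
      for (int i = 0; i < obj.header.num_slots; ++i) {
        if (obj.slots[i] != nullptr) {
          out.push_back(obj.slots[i]);
        }
      }
      break;
  }
  return out;
}
```

The point of the sketch is that the collector only needs the tag plus one small piece of metadata (a mask or a length) to find every pointer, without knowing anything about the source-language type.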
So our object model is basically:
user-defined classes, which sometimes have a vtable
classes and tuples get mapped to the “struct”, and you create a bit mask that tells the runtime where the pointer fields are, so it can trace
List<int> and Str get mapped to the non-scanned / opaque GC array
List<Str> and List<AnyObj> get mapped to the scanned GC array
Dict<Str, bool> gets mapped to 3 GC arrays – opaque ints for the dict “index”, scanned Str, and opaque bool
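As a rough illustration of the Dict<Str, bool> case, here is a sketch with three parallel arrays: an opaque index, a scanned key array, and an opaque value array. The `Str` stand-in, the linear-probing scheme, the fixed capacity, and all method names are assumptions of mine, not how Oils actually implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a GC-managed string.
struct Str {
  std::string data;
};

struct DictStrBool {
  std::vector<int> index;             // opaque ints: hash slot -> entry, -1 = empty
  std::vector<const Str*> keys;       // scanned: the only array the GC must trace
  std::vector<unsigned char> values;  // opaque: one byte per bool

  explicit DictStrBool(int num_slots) : index(num_slots, -1) {
    assert(num_slots >= 2);
  }

  // Linear probing: returns the slot that holds `key`, or the empty slot
  // where it would be inserted.
  int FindSlot(const std::string& key) const {
    std::size_t n = index.size();
    int slot = static_cast<int>(std::hash<std::string>{}(key) % n);
    while (index[slot] != -1 && keys[index[slot]]->data != key) {
      slot = (slot + 1) % static_cast<int>(n);
    }
    return slot;
  }

  void Set(const Str* key, bool value) {
    int slot = FindSlot(key->data);
    if (index[slot] == -1) {
      // Sketch only: no resizing, so always keep one empty slot for probing.
      assert(keys.size() + 1 < index.size());
      index[slot] = static_cast<int>(keys.size());
      keys.push_back(key);
      values.push_back(value ? 1 : 0);
    } else {
      values[index[slot]] = value ? 1 : 0;
    }
  }

  // Returns true and writes *out if the key is present.
  bool Get(const std::string& key, bool* out) const {
    int slot = FindSlot(key);
    if (index[slot] == -1) {
      return false;
    }
    *out = values[index[slot]] != 0;
    return true;
  }
};
```

Only `keys` contains pointers, so a collector following this layout would scan that one array and treat the other two as raw bytes.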
Again this is from memory, but I was surprised how close the model is to what we did. But obviously this is because there are only so many ways to create traceable objects for a GC. We made a GC for a subset of C++, which is not unlike what they’re targeting.
What they don’t have (currently):
interior pointers, which Go, for example, supports. (After implementing a GC, I feel like this feature pushes a tremendous amount of complexity into the GC, but I guess it works for Go)
If you want to make a FLAT dynamically sized array of tuples of (Str*, int), then I think that doesn’t fit in the model. You have to do some compilation to transpose it into 1 scanned and 1 non-scanned GC array, or you have to accept that each (Str*, int) is a GC object, which is not ideal.
So your compiler can be non-trivial and you have to make some fidelity vs. performance tradeoffs.
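The transposition option can be sketched like this: the source program sees one list of (Str*, int) pairs, while the storage is split into one pointer array and one integer array. `PairList`, `Str`, and the method names are hypothetical, and `std::vector` stands in for the GC arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GC-managed string.
struct Str {
  std::string data;
};

// Logical view: a flat List<Tuple2<Str*, int>>.
// Physical layout: two parallel arrays that always have the same length.
class PairList {
 public:
  void Append(Str* s, int n) {
    strs_.push_back(s);
    ints_.push_back(n);
  }

  // Rebuilds the tuple on access, so no per-element heap object is needed.
  std::pair<Str*, int> Get(std::size_t i) const {
    assert(i < strs_.size());
    return {strs_[i], ints_[i]};
  }

  std::size_t Len() const { return strs_.size(); }

  const std::vector<Str*>& ScannedPart() const { return strs_; }
  const std::vector<int>& OpaquePart() const { return ints_; }

 private:
  std::vector<Str*> strs_;  // would become the scanned GC array
  std::vector<int> ints_;   // would become the opaque GC array
};
```

The cost is in the compiler: every read, write, and slice of the logical list has to be rewritten to touch both arrays, which is the fidelity vs. performance tradeoff mentioned above.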
To summarize, in WASM1 you have to map all your language types to i32 i64 f32 f64. In WASM2, you have to map everything to GC structs and arrays. And some of the fields in the structs/arrays are Reference types, which are traceable by the runtime.
(and I appreciate any corrections because I didn’t actually work with WASM GC after watching this talk, and may have mis-remembered some things)
Various toolchains have support for WasmGC today, including Dart, Java (J2Wasm), Kotlin, OCaml (wasm_of_ocaml), and Scheme (Hoot).
The best resource I’ve found on the WASM GC memory/object model is Rossberg’s talk here:
https://www.youtube.com/watch?v=fMGuQXNqlaE&t=3716s&ab_channel=ACMSIGPLAN
which I quoted here: https://lobste.rs/s/djxada/missing_point_webassembly#c_w6foi5
The Oils heap tags are defined here: https://github.com/oilshell/oil/blob/master/mycpp/gc_obj.h#L8
Besides user-defined classes, the Oils object model has List<T>, Dict<K, V>, Tuple2<A, B>, Tuple3<A, B, C>, …, following statically typed Python - https://www.oilshell.org/blog/2022/05/gc-heap.html
Why not skip wasm and translate programs in the garbage collected language to garbage collected javascript, and let v8 take over?
Then you’re limited to the subset of language features that are expressible in JavaScript. You’re stuck with JavaScript’s weird decisions about numbers, etc.
good point: it’s why i prefer Rust and dotnet over the jvm…in a huge data set a few years back, the Java numeric tower left soooooooooo much performance unattainable.
Compiling to JavaScript would mean the runtime would have to do extra work to recover lost information – like flattening dicts into structs (dynamic members to static members). This basically works in the JITs for JS VMs, but it’s heuristic.
That is, WASM GC can represent structs natively, but JS can’t
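A loose analogy in C++, not a description of any real JS engine: a dynamic object behaves like a map from member names to values, while a struct has fields at fixed positions that the compiler knows up front. The type and function names below are made up for illustration.

```cpp
#include <string>
#include <unordered_map>

// Dynamic shape: members are found by name at run time. A JIT has to guess
// that every such object has the same members before it can do better.
using DynamicPoint = std::unordered_map<std::string, double>;

// Static shape: the field layout is declared, so no guessing is needed.
// This is the kind of information a declared struct type carries.
struct StaticPoint {
  double x;
  double y;
};

double NormSquaredDynamic(const DynamicPoint& p) {
  // Two hash lookups per member access; throws if a member is missing.
  return p.at("x") * p.at("x") + p.at("y") * p.at("y");
}

double NormSquaredStatic(const StaticPoint& p) {
  // Plain loads from fixed positions within the struct.
  return p.x * p.x + p.y * p.y;
}
```

Both functions compute the same value; the difference is how much the runtime has to rediscover about the object's layout before the access becomes cheap.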
Also, parsing WASM is faster than parsing JavaScript. I think in all browsers, WASM started out as just another front end to the JIT. But it has probably gained more custom support over time.
I wish they’d included a comparison of performance using compilation to WasmMVP vs. WasmGC.
It would be interesting but I’m not sure which direct comparisons exist
i.e. which of the toolchains that now target WasmGC ever compiled to WASM 1? e.g. I think Dart compiled to JS, and there was never really a reason to compile Dart to WASM 1 and ship a GC with it
Likewise I am not sure anyone was interested in running Java in WASM 1 … WASM 1 is really more appropriate for C/C++/Rust
The article mentioned possible upsides for even non GC languages. Although the main issue there is LLVM. I wonder if we’ll one day see a wasm GC demo with Rust, which we could use as an interesting comparison.