1. 25
  1.  

    1. 2

      While there’s no denying that eliminating allocations and inlining can bring a speedup, and there might be something here worth exploring, I can’t help but notice that dumping your memory as-is is, at the very least, not very interchangeable/portable, so most general-purpose software probably should not be doing it. But in something like a compiler… maybe? Especially for things that are caches/rebuildable. It also seems extremely fragile to any corruption, so I would probably add a checksum and version to the whole thing.

      Is the string list a single allocation that keeps growing, or does it need to be preallocated? I think it’s a growable buffer? (But I’m not versed in Zig syntax, so I might be wrong.) I would definitely not use 0s for termination, especially given that we’ve already abandoned the idea of dumping the whole thing as-is as a serialization mechanism. But using a single concatenated allocation and referring to strings via (index, len) seems worthwhile.
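
      A minimal sketch of that concatenated-allocation idea (in Rust rather than Zig, with hypothetical names): strings become (offset, len) handles into one growable buffer, with no pointers and no 0-terminators.

      ```rust
      // One growable buffer holds every string; a StrRef is just (offset, len).
      // Since StrRef contains no pointers, the whole table serializes as the
      // byte buffer plus an array of plain integers.

      #[derive(Copy, Clone, Debug)]
      struct StrRef {
          off: u32,
          len: u32,
      }

      #[derive(Default)]
      struct StringTable {
          bytes: Vec<u8>, // single allocation, grows by amortized doubling
      }

      impl StringTable {
          fn add(&mut self, s: &str) -> StrRef {
              let off = self.bytes.len() as u32;
              self.bytes.extend_from_slice(s.as_bytes());
              StrRef { off, len: s.len() as u32 }
          }

          fn get(&self, r: StrRef) -> &str {
              let slice = &self.bytes[r.off as usize..(r.off + r.len) as usize];
              std::str::from_utf8(slice).unwrap() // valid: we only ever insert &str
          }
      }
      ```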

      Overall, I think this style might be somewhat burdensome to work with for general-purpose software, but in specialized software where squeezing out the last bits of performance is important… why not?

      1. 2

        That’s really nice. I often compare this with Apache Arrow serialization; the answers to these questions are the same there.

        I sometimes wonder how far we can take this as a programming paradigm. I’m imagining a compiler which automatically employs DOD during compilation. I’m thinking of a pattern where each object on the stack (and, recursively, the nested fields of structs, pointers to arrays, etc.) is transformed into an index into some array. Recursive (non-tail-call) functions could store their stack data in these arrays. At any point you could serialize your program state to a value as discussed in the video, or to the Arrow format, or whatever.
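
        Hand-written, I imagine the core of that pattern looking something like this sketch (Rust, made-up names): every pointer field becomes a u32 index into one flat array, so the whole structure serializes as a single buffer.

        ```rust
        // A tiny expression tree stored DOD-style: nodes live in one Vec,
        // and child "pointers" are u32 indices into that Vec. Saving the
        // entire tree is just writing out `nodes`.

        #[derive(Copy, Clone)]
        enum Node {
            Literal(i64),
            Add { lhs: u32, rhs: u32 }, // indices into Tree::nodes, not pointers
        }

        #[derive(Default)]
        struct Tree {
            nodes: Vec<Node>,
        }

        impl Tree {
            fn push(&mut self, n: Node) -> u32 {
                self.nodes.push(n);
                (self.nodes.len() - 1) as u32
            }

            fn eval(&self, idx: u32) -> i64 {
                match self.nodes[idx as usize] {
                    Node::Literal(v) => v,
                    Node::Add { lhs, rhs } => self.eval(lhs) + self.eval(rhs),
                }
            }
        }
        ```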

        I’d love to see more about the garbage collection angle. One thing I’ve been meaning to investigate is a generalized LSM tree to make random insertion/deletion into these DOD data structures more convenient. (Compaction might not be great for real-time apps so that’s a difficulty, but it’s a difficulty analogous to tracing GC pauses).

        A different take on the same topic, which I have seen described by e.g. Richard Feldman, is that the gap in memory safety between Rust and Zig shrinks when you are using this pattern, or at the very least spatial safety becomes easier to get right (especially as most of the indexing could either be automated by a compiler or done programmatically at comptime, as in Andrew’s examples). I haven’t really thought through temporal safety of the individual pieces of data, but I imagine we could give the handles lifetimes? Editing data in a multithread-safe way is another interesting topic.
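
        On temporal safety, one existing answer is the generational-arena pattern: each handle carries a generation counter, so a stale handle is detected instead of silently aliasing a reused slot. A rough sketch (hypothetical names; slot reuse via a free list is elided):

        ```rust
        // A handle is an index plus the generation it was issued under.
        #[derive(Copy, Clone, PartialEq, Eq, Debug)]
        struct Handle {
            index: u32,
            gen: u32,
        }

        #[derive(Default)]
        struct Arena<T> {
            slots: Vec<(u32, Option<T>)>, // (generation, value)
        }

        impl<T> Arena<T> {
            fn insert(&mut self, value: T) -> Handle {
                // Always appends for brevity; a real arena would reuse freed
                // slots, bumping the generation on reuse.
                self.slots.push((0, Some(value)));
                Handle { index: (self.slots.len() - 1) as u32, gen: 0 }
            }

            fn remove(&mut self, h: Handle) {
                let slot = &mut self.slots[h.index as usize];
                if slot.0 == h.gen {
                    slot.0 += 1; // invalidates every outstanding handle to this slot
                    slot.1 = None;
                }
            }

            fn get(&self, h: Handle) -> Option<&T> {
                let slot = &self.slots[h.index as usize];
                if slot.0 == h.gen { slot.1.as_ref() } else { None }
            }
        }
        ```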

        1. 3

          I sometimes wonder how far we can take this as a programming paradigm. I’m imagining a compiler which automatically employs DOD during compilation.

          I’d love to see more about the garbage collection angle

          If you have GC, then the model becomes similar to … Java and JavaScript!

          • HotSpot JVM has had an option for compressed pointers (32-bit integers) since 2011 or so:

          https://stackoverflow.com/questions/25120546/trick-behind-jvms-compressed-oops

          https://wiki.openjdk.org/display/HotSpot/CompressedOops

          https://shipilev.net/jvm/anatomy-quarks/23-compressed-references/


          These links are on my wiki page: https://github.com/oils-for-unix/oils/wiki/Compact-AST-Representation

          (There is also a link to the Gibbon compiler - https://iu-parfunc.github.io/gibbon/ - which is more research-y and restrictive than Java or JS, but in exchange tree transformations become linear scans of memory.)


          The main difference is that neither Java nor JS has value types (which Go and C# have). So you don’t have the same control, but you can also just write “normal code” (not juggling integer indices).

          And it is kind of hidden behind a lot of abstraction and other costs (e.g. starting the JVM, starting v8, etc.).

          But conceptually it can be done at the language level in a straightforward way.

          A copying/Cheney GC is a global arena for a program, but you get memory safety in exchange for GC runtime cost, tracing metadata in each object, and 2x memory overhead (to copy back and forth). I think all common implementations of JS and Java use a copying collector for the young generation.
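
          For intuition, here is a toy Cheney-style semispace collector over an index-based heap (a sketch for illustration, not how HotSpot or V8 actually implement it): roots are evacuated into to-space, then to-space is scanned left to right, with forwarding addresses ensuring each live object is copied exactly once.

          ```rust
          // Semispace copying: live objects move from from-space to to-space,
          // leaving a forwarding address behind; garbage is never touched.

          #[derive(Copy, Clone)]
          enum Slot {
              Int(i64),   // immediate value, not traced
              Ref(usize), // "compressed pointer": index of another heap object
          }

          struct Obj {
              fields: Vec<Slot>,
              forwarded: Option<usize>, // set during collection
          }

          struct Heap {
              objs: Vec<Obj>, // from-space
          }

          impl Heap {
              fn collect(&mut self, roots: &mut [usize]) {
                  let mut to: Vec<Obj> = Vec::new();
                  for r in roots.iter_mut() {
                      *r = Self::evacuate(&mut self.objs, &mut to, *r);
                  }
                  // Cheney scan: `scan` chases the copy frontier through to-space.
                  let mut scan = 0;
                  while scan < to.len() {
                      for i in 0..to[scan].fields.len() {
                          if let Slot::Ref(child) = to[scan].fields[i] {
                              let moved = Self::evacuate(&mut self.objs, &mut to, child);
                              to[scan].fields[i] = Slot::Ref(moved);
                          }
                      }
                      scan += 1;
                  }
                  self.objs = to; // garbage stays behind in the old from-space
              }

              fn evacuate(from: &mut [Obj], to: &mut Vec<Obj>, idx: usize) -> usize {
                  if let Some(new_idx) = from[idx].forwarded {
                      return new_idx; // already copied once
                  }
                  let new_idx = to.len();
                  to.push(Obj {
                      fields: std::mem::take(&mut from[idx].fields),
                      forwarded: None,
                  });
                  from[idx].forwarded = Some(new_idx);
                  new_idx
              }
          }
          ```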


          It would be nice to see a language that has value types, compressed pointers without juggling integer indices (i.e. type safety), and is memory safe! As far as I know, that doesn’t exist.

          Actually, I think that would be a good target for, say, https://tinygo.org/. You don’t really need a new language, but you would need a new implementation, and a new or forked GC.

          Of course, the reason the main Go compiler doesn’t do this is that many users want more than 4 GiB of address space. And that’s the reason it won’t be the default on the JVM either.

          1. 2

            It also occurs to me that if we want BOTH compressed pointers and more than 4 GiB of address space … then we’ve re-invented the memory model of MS-DOS :-) It had both 16-bit and 32-bit pointers:

            https://devblogs.microsoft.com/oldnewthing/20200728-00/?p=104012

            I never used them, but supposedly NEAR and FAR pointers were cursed by programmers, probably because they were a kind of hacky extension on top of C, without any type safety.

            So now I propose that Go or Zig should have NEAR and FAR pointers :-)

            MS-DOS had an additional memory model known as Tiny, in which the code and data were all combined into a single 64KB segment. This was the memory model required by programs that ended with the .COM extension, and it existed for backward compatibility with CP/M. CP/M ran on the 8080 processor which supported a maximum of 64KB of memory.

            So “Tiny” mode and .COM files were basically like the WASM memory model – you are limited to either 2^16 or 2^31 (2 GiB) of address space, which was about half of a typical machine at the time.

            1. 1

              Actually I noticed a new paper (2024) by the authors of Gibbon:

              Garbage Collection for Mostly Serialized Heaps

              https://iu-parfunc.github.io/gibbon/public/ismm24.pdf

              We propose a new memory management strategy for language runtimes with mostly serialized heaps. It uses a hybrid, generational collector, where regions are bump-allocated into the young generation and objects are bump-allocated within those regions. Minor collections copy data into larger regions in the old generation, compacting it further. The old generation uses region-level reference counting. The resulting system maintains high performance for data traversal programs, while significantly improving performance on other kinds of allocation patterns.

              Except that I think this is a functional language, which doesn’t have value types. So for people who want control, you probably want something more Go-like, with both values and references.

          2. 2

            Question for Andrew: how do you make this style work in the presence of multithreading? The advantage of having each object be its own memory allocation, accessed through its own pointer, is that you can modify it atomically with a copy + atomic pointer swap, and there are tricks one can do for fast memory reclamation without going all the way to a GC. This is kinda lost when you coalesce all the objects into a single memory allocation, but maybe there’s another way to handle the problem?

            1. 3

              Non-cursed answer: you don’t

              Cursed answer: click if you dare

            2. 1

              I don’t understand how this is really any different from typical serialization. The recursion depth is only 1 deep, but that’s still enough that you need to write code that reflects all the fields and follows the pointers. He says you only need one writev call, but that would be true even with an unlimited number of pointers; the whole point of writev is that it can do an arbitrary amount of work. He says it makes it so the on-disk format can be exactly the same, but then admits it actually can’t be, because of the pointers for the arrays and hash tables, and if your serialization already has to be smart enough to handle that, it’s only a tiny bit more work to handle arbitrary depth.

              Also, how do you do a hash table where the values are variable-length arrays? If you try to express it as indexes, you need a variable-length array of variable-length arrays, and you can’t reduce that to arrays of indexes; somewhere you need an array of pointers. And if this scheme lets you nest hash tables and dynamic-length arrays, then you have the problem of handling arbitrary depth again and you’re back at square one. Am I missing something here?

              1. 1

                that would be true even with an unlimited number of pointers

                Maybe useful context is that this is how the incremental parts of the Zig compiler are being written. It only needs to write/read O(1) chunks of memory to save/load the entire state of the compiler, whereas if it used internal pointers it would be O(amount of code being compiled).

                There’s a real example here https://github.com/ziglang/zig/blob/master/src/link/Wasm.zig#L50-L305

                that would be true even with an unlimited number of pointers

                Zig hashmaps can have custom hash/comparison functions. So you can have one big array of data, a ‘slice’ type which is an offset+length into that array, and a hashmap that knows how to look up the data for those ‘slices’.
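
                In Rust terms the same trick might look like the sketch below (names made up; collision buckets are hand-rolled because std’s HashMap can’t consult the backing array from inside an Eq impl): only spans are stored, and comparison goes through the one big buffer.

                ```rust
                use std::collections::hash_map::DefaultHasher;
                use std::collections::HashMap;
                use std::hash::{Hash, Hasher};

                #[derive(Copy, Clone, PartialEq, Eq, Debug)]
                struct Span { off: u32, len: u32 } // the 'slice' type: offset+length

                #[derive(Default)]
                struct Interner {
                    bytes: Vec<u8>,                 // one big array of string data
                    table: HashMap<u64, Vec<Span>>, // hash -> candidate spans
                }

                impl Interner {
                    fn get(&self, s: Span) -> &[u8] {
                        &self.bytes[s.off as usize..(s.off + s.len) as usize]
                    }

                    fn intern(&mut self, s: &[u8]) -> Span {
                        let mut h = DefaultHasher::new();
                        s.hash(&mut h);
                        let hash = h.finish();
                        // "Custom comparison": candidates are checked against
                        // the backing buffer, so no owned strings are stored.
                        if let Some(bucket) = self.table.get(&hash) {
                            for &span in bucket {
                                if self.get(span) == s {
                                    return span; // already interned
                                }
                            }
                        }
                        let span = Span { off: self.bytes.len() as u32, len: s.len() as u32 };
                        self.bytes.extend_from_slice(s);
                        self.table.entry(hash).or_default().push(span);
                        span
                    }
                }
                ```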

                1. 1

                  But they allow hashmaps and variable-length arrays, so it’s actually still O(n)? Those all point to separate blocks of memory, do they not?

                  1. 1

                    The number of hashmaps and variable-length arrays is constant, and does not increase as you compile more code.

                    1. 1

                      They’re not as soon as you have HashMap<K, Vec<T>> or Vec<Vec<T>>, no?

                      1. 1

                        HashMap<K, Vec<T>> would become (HashMap<K, OffsetAndLen>, Vec<T>) as I explained above, giving a constant 2 regions of memory. Just like the string examples in the talk.
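
                        Concretely, something like this sketch (hypothetical names). It also surfaces the trade-off raised in the reply below: each key’s slice is written once and can’t grow in place afterwards.

                        ```rust
                        use std::collections::HashMap;

                        // The flattening above: HashMap<K, Vec<T>> becomes a map
                        // from K to an (offset, len) span plus one shared Vec<T>.

                        #[derive(Copy, Clone)]
                        struct Span {
                            off: u32,
                            len: u32,
                        }

                        struct FlatMultiMap<K, T> {
                            index: HashMap<K, Span>, // region 1
                            data: Vec<T>,            // region 2: all value arrays, concatenated
                        }

                        impl<K: std::hash::Hash + Eq, T> FlatMultiMap<K, T> {
                            fn new() -> Self {
                                FlatMultiMap { index: HashMap::new(), data: Vec::new() }
                            }

                            // A key's values are appended once; growing them later
                            // would mean re-appending the whole slice.
                            fn insert(&mut self, key: K, values: impl IntoIterator<Item = T>) {
                                let off = self.data.len() as u32;
                                self.data.extend(values);
                                let len = self.data.len() as u32 - off;
                                self.index.insert(key, Span { off, len });
                            }

                            fn get(&self, key: &K) -> Option<&[T]> {
                                let s = *self.index.get(key)?;
                                Some(&self.data[s.off as usize..(s.off + s.len) as usize])
                            }
                        }
                        ```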

                        1. 1

                          That’s not equivalent: you can no longer resize one Vec independently of the others. That only works if their lengths change rarely.