1. 9

  2. 6

    Yeah, I’ve been thinking about this a lot lately, after watching one of the first streams of Handmade Hero. I don’t have any well-founded conclusions, but let me just brain-dump here: perhaps it’ll be interesting, and it might help me clarify my own thinking.

    So one thing is that people often say “of course you do this stuff in games, because games are architecturally simple”. I haven’t written a single real game in my life, but, just thinking about this abstractly, it seems that games are, in fact, quite complex. They are very stateful, and state is the worst. In a game, you have long-lived assets (which might get loaded and unloaded dynamically), you have a game loop, which churns through some complex computation 60 times a second, and you have entities, which span many frames but are created and destroyed. It seems that, e.g., a typical HTTP backend is actually quite a bit simpler, as it consists only of “stateless request” bits.

    Then, it does seem like there’s manual memory management, and then there’s manual memory management. Casey’s “OK, we are going to mmap a single contiguous gig of memory, that’s going to be our sole allocation for the entire game, and we’ll subdivide it manually” is very different from C++/Rust-style RAII, which is a compiler-assisted way to pair mallocs and frees. “Not using GC” doesn’t mean that you manage memory efficiently.
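
    To make the contrast concrete, here is a minimal sketch of the “reserve one block, subdivide manually” style. `Arena` is a made-up type, not a real allocator API, and a `Vec<u8>` stands in for the mmap’ed gigabyte:

    ```rust
    // Sketch: one up-front allocation, subdivided by bumping an offset.
    // "Freeing" is just resetting the offset; no per-object bookkeeping.
    struct Arena {
        buf: Vec<u8>, // stands in for the single mmap'ed block
        used: usize,
    }

    impl Arena {
        fn with_capacity(cap: usize) -> Arena {
            Arena { buf: vec![0; cap], used: 0 }
        }
        // Hand out the next n bytes of the block.
        fn alloc(&mut self, n: usize) -> &mut [u8] {
            let start = self.used;
            self.used += n;
            &mut self.buf[start..start + n]
        }
        fn reset(&mut self) {
            self.used = 0; // reclaim everything at once
        }
    }

    fn main() {
        let mut arena = Arena::with_capacity(1 << 20); // 1 MiB stands in for the gig
        let entity = arena.alloc(64); // carve out a piece for some entity
        entity[0] = 42;
        assert_eq!(arena.used, 64);
        arena.reset(); // end of frame/level: the whole block is free again
        assert_eq!(arena.used, 0);
    }
    ```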

    It’s also kind of painful to think about how RAII code looks, especially if you have unwinding. Basically anywhere in the program you’d have branches like “OK, and if we are unfortunate here, let’s go drop this structure, but also let’s walk it recursively and drop its every child”. Compare this with a hypothetical request/response thing which passes around a scratch space (a bump allocator). As you free all the stuff in a batch at the end of the request, nothing needs drop glue!
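
    A sketch of that hypothetical request/response shape, with a made-up `Scratch` type: intermediate strings live in one shared buffer, handlers pass around ranges instead of owned `String`s, and the end of the request clears everything at once, so no destructors run per object:

    ```rust
    use std::ops::Range;

    // Sketch: a per-request scratch buffer. All the strings a request
    // produces live in one allocation, cleared in a batch at the end.
    struct Scratch {
        buf: String,
    }

    impl Scratch {
        fn new() -> Scratch {
            Scratch { buf: String::new() }
        }
        // Append a string; return its position instead of an owned String.
        fn push(&mut self, s: &str) -> Range<usize> {
            let start = self.buf.len();
            self.buf.push_str(s);
            start..self.buf.len()
        }
        fn get(&self, r: Range<usize>) -> &str {
            &self.buf[r]
        }
        fn clear(&mut self) {
            self.buf.clear() // "free" everything at once; no drop glue per string
        }
    }

    // A hypothetical handler: builds intermediate data in the scratch space.
    fn handle_request(scratch: &mut Scratch, name: &str) -> Range<usize> {
        let _greeting = scratch.push("hello, "); // intermediate, same buffer
        scratch.push(name)
    }

    fn main() {
        let mut scratch = Scratch::new();
        let r = handle_request(&mut scratch, "world");
        assert_eq!(scratch.get(r), "world");
        scratch.clear(); // end of request
        assert!(scratch.buf.is_empty());
    }
    ```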

    If we speak about Rust, it does seem that the language is biased towards owned data. Something like Vec&lt;(String, String)&gt; is trivial to do in Rust. This is exactly the pattern with a pile of individually allocated RAII objects which actually can live and die together. In C, it would be hard to code a dynamic array of dynamic strings. On the other hand, if the data is read-only, it would be natural to allocate a single chunk of memory and stuff it with strings and interior pointers. Doing that in Rust would be challenging. Corollary (?): there are few Rust libraries that need to allocate but let the caller manage memory: https://internals.rust-lang.org/t/why-bring-your-own-allocator-apis-are-not-more-popular-in-rust/14494.
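
    For illustration, here is one way the read-only “single chunk plus interior pointers” layout can be approximated in safe Rust, using offsets instead of actual interior pointers (which would make the struct self-referential). The `FlatPairs` type is made up:

    ```rust
    // Sketch: all key/value strings share one backing allocation; each pair
    // is three offsets into it, instead of two separately owned Strings.
    struct FlatPairs {
        data: String,                      // the single chunk of memory
        spans: Vec<(usize, usize, usize)>, // (key start, key end / value start, value end)
    }

    impl FlatPairs {
        fn new() -> FlatPairs {
            FlatPairs { data: String::new(), spans: Vec::new() }
        }
        fn push(&mut self, key: &str, value: &str) {
            let k = self.data.len();
            self.data.push_str(key);
            let v = self.data.len();
            self.data.push_str(value);
            self.spans.push((k, v, self.data.len()));
        }
        // Readers get slices into the chunk; nothing is allocated here.
        fn get(&self, i: usize) -> (&str, &str) {
            let (k, v, e) = self.spans[i];
            (&self.data[k..v], &self.data[v..e])
        }
    }

    fn main() {
        let mut pairs = FlatPairs::new();
        pairs.push("language", "rust");
        pairs.push("style", "flat");
        assert_eq!(pairs.get(0), ("language", "rust"));
        assert_eq!(pairs.get(1), ("style", "flat"));
    }
    ```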

    Finally, to speak about something I actually do: rust-analyzer is rather game-like. There’s a bunch of “data”, there’s a main loop accepting input from the client, doing a “game update”, and “rendering” the end result (completions, highlighting and such) to the client. It pains me that today this works by having millions of individually allocated things pointing at each other. I envy sorbet’s (the type checker for Ruby) architecture, where they have a GlobalState which holds a bunch of flat vectors, and that’s basically it. Sadly, doing just that in rust-analyzer is not trivial, as we have a pretty intricate incremental computation setup, which, in its current incarnation, pretty much forces a tree of individually owned objects.

    1. 1

      Sadly, doing just that in rust-analyzer is not trivial, as we have a pretty intricate incremental computation setup, which, in its current incarnation, pretty much forces a tree of individually owned objects.

      Yeah, I’d really like a way of doing incremental computation in Rust which could let you BYO context. I could be wrong, but IIUC Salsa depends on global state and a global allocator at the moment? I’ll have to investigate how Sorbet does it - it’s been on my TODO list!

      Edit: Found it!

      1. 2

        Salsa doesn’t depend on global state, but it does use hash maps (with the global allocator) a bunch. What I am getting at, though, is slightly different: the structure of the computation doesn’t lend itself nicely to flat representations.

        Let’s say you are writing declaration-collection code, which walks a module and records which functions it defines. In a non-incremental compiler, you’d write this in a “push” style. First you push a module to the vec of modules, then you push a bunch of functions, then you set up relations between the two (the module will contain a hash map with the ids of its functions, the functions will contain the parent module’s id). For three modules, there’s one Vec of modules and one Vec of functions (or even one Vec of all symbols, if you go weakly typed, like sorbet).
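
        That push style might look something like this sketch (all the names are made up for illustration; sorbet’s actual GlobalState is more elaborate):

        ```rust
        use std::collections::HashMap;

        // Sketch: one flat Vec per entity kind, with relations stored
        // as ids in both directions rather than as owned pointers.
        #[derive(Clone, Copy, PartialEq, Debug)]
        struct ModuleId(usize);
        #[derive(Clone, Copy, PartialEq, Debug)]
        struct FunctionId(usize);

        struct Module {
            name: String,
            functions: HashMap<String, FunctionId>, // name -> id of each function
        }

        struct Function {
            name: String,
            parent: ModuleId, // back-reference to the owning module
        }

        struct GlobalState {
            modules: Vec<Module>,     // all modules, flat
            functions: Vec<Function>, // all functions, flat
        }

        impl GlobalState {
            fn add_module(&mut self, name: &str) -> ModuleId {
                self.modules.push(Module { name: name.into(), functions: HashMap::new() });
                ModuleId(self.modules.len() - 1)
            }
            fn add_function(&mut self, module: ModuleId, name: &str) -> FunctionId {
                self.functions.push(Function { name: name.into(), parent: module });
                let id = FunctionId(self.functions.len() - 1);
                self.modules[module.0].functions.insert(name.into(), id);
                id
            }
        }

        fn main() {
            let mut gs = GlobalState { modules: Vec::new(), functions: Vec::new() };
            let m = gs.add_module("core");
            let f = gs.add_function(m, "main");
            // Relations go through ids, not through ownership.
            assert_eq!(gs.functions[f.0].parent, m);
            assert_eq!(gs.functions[f.0].name, "main");
            assert_eq!(gs.modules[m.0].functions["main"], f);
        }
        ```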

        In salsa, you write this in a “pull” style: there’s a query which computes a module’s functions. This query returns a module object which contains its functions. So, for three modules, you’ll end up with three separate vecs of functions.
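
        A toy illustration of the pull shape (this only mimics the shape of a salsa query, not the real salsa API):

        ```rust
        // Sketch: each "query" returns an owned value, so each of the three
        // modules carries its own separately allocated Vec of functions.
        #[derive(Clone, PartialEq, Debug)]
        struct ModuleData {
            functions: Vec<String>, // owned by this one query result
        }

        // Stand-in for a `module_functions(db, module)` query: "walk" the
        // module source and collect the functions it defines.
        fn module_functions(source: &str) -> ModuleData {
            ModuleData {
                functions: source.split_whitespace().map(str::to_string).collect(),
            }
        }

        fn main() {
            // Three modules -> three independent Vecs, not one flat table.
            let modules: Vec<ModuleData> = ["foo bar", "baz", "quux"]
                .iter()
                .map(|s| module_functions(s))
                .collect();
            assert_eq!(modules.len(), 3);
            assert_eq!(modules[0].functions, ["foo", "bar"]);
            assert_eq!(modules[1].functions, ["baz"]);
        }
        ```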

        That’s also an architecture-level issue. In sorbet, “what are the relations between the entities” is independent of the storage location. In salsa, logical relations dictate storage, so you’d have to use graph-database-like access paths to refer to things.

        1. 1

          Thanks for the additional information and clarification! Definitely can see the issue of queries having to do lots of individual allocations.

          Do you think the non-incremental approach is a viable path for interactive development?

          1. 2

            That depends on the language. Ideally, the language is designed such that non-general, low-tech incrementality is enough (approach 1 from https://rust-analyzer.github.io/blog/2020/07/20/three-architectures-for-responsive-ide.html).

            1. 1

              Neat - that’s a helpful post, lots of food for thought!

              The language I’m making is dependently typed, with a first-class module system (modules are actually just dependent records). I’m guessing this would probably make some of the low-tech approaches more challenging, as I need to run an interpreter interleaved with the type checker. I was also considering adding stuck macros, as in Klister, where macro expansions can be delayed until more information is available. My one saving grace will probably be that the modules form a DAG, so I can figure out the dependencies between everything in parallel beforehand? But yeah, I’ll continue to ponder this to see if there are other ways I can simplify things…

    2. 5

      I mostly see this grouped-elements design come up in game programming, or I guess other graphics-intensive domains where they have to deal with millions of polygons / sprites / physics-objects, which are small and fixed-size and whose lifetimes are “clumped” in big groups. It makes total sense there.

      I’m far from convinced it’s a higher or more advanced level of architecture, or that individual-entity code is “stupid” as this guy literally says. I totally believe it is in his domain of expertise but I’d need to be convinced that it’s better for GUI app development or database implementation or audio coding or whatever.

      Here’s one subdomain where I am a total convert to grouped-elements thinking: JSON (or equivalent) parsing. I know from grim experience that the most expensive part of the usual JSON workflow is not the actual parsing but allocating the tree of string/array/map objects (the DOM). I found that going to a binary-JSON format like BSON didn’t help much. What does help is not creating the objects in the first place. The optimal format is one that’s already laid out like an object tree, so you basically don’t need to parse it ahead of time. You just hand out inner pointers into the data. FlexBuffers is one incarnation of this, and Amazon has a similar one whose name I forget. I made my own called Fleece before I’d seen these (it may predate them.) When you have a database query that has to look inside jillions of JSON-equivalent objects, this is a huge win.
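
      A toy illustration of the “already laid out” idea (this is not the actual FlexBuffers or Fleece encoding): store each string length-prefixed in one buffer, and have the accessor hand out slices into that buffer instead of building a DOM:

      ```rust
      // Sketch: a flat buffer of length-prefixed strings. "Parsing" allocates
      // nothing; reading the i-th string just returns a slice into the data.
      fn get_str(buf: &[u8], mut index: usize) -> &str {
          let mut pos = 0;
          loop {
              let len = buf[pos] as usize; // one-byte length prefix
              if index == 0 {
                  return std::str::from_utf8(&buf[pos + 1..pos + 1 + len]).unwrap();
              }
              pos += 1 + len; // skip to the next entry
              index -= 1;
          }
      }

      fn main() {
          // Two strings laid out flat: [5]"hello" [5]"world".
          let buf = b"\x05hello\x05world";
          assert_eq!(get_str(buf, 0), "hello");
          assert_eq!(get_str(buf, 1), "world");
      }
      ```

      A real format adds offset tables so lookup is O(1) rather than a scan, but the key property is the same: the wire bytes double as the in-memory representation.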

      1. 2

        I linked with YouTube’s timestamp URL, but if that doesn’t work for you, go to 1:13:27

        1. 2

          The point is made somewhat ham-handedly, and with an undue focus on performance, but I think there is something worthwhile here.

          In particular, I think that this is what abstraction is. It’s the ability to free yourself from having to think at an (architecturally) low level about operations performed on individual objects, and instead orchestrate, at a high level, the interactions of those objects.

          1. 1

            Which of these does SQL encourage? How about APL? This entire dichotomy only applies to the special case of game developers using C++, I think. Having said that, there is insight in the idea that an API can either do one thing at a time, or do one batch of many things at a time.