1. 5
    1. 1

      One or two people should be handle it — it’s not a huge industrial project.

      I think you’re missing a word there.

      For example the leaf Token type can be used a variant of both word_t and expr_t.

      And there.

      1. 1

        Thanks, fixed!

        Aside: I just wrote a little Markdown/HTML spell checker with Python/shell after trying some canned Unix tools, which was super useful on the docs. Wondering if they would have caught these non-spelling typos :-/

        1. 3

          My low-tech technique is to read the post out loud. Normally a bunch of times after I post because I’m lazy. ;) Somehow you seem to catch missing words more easily when reading out loud than just reading silently as usual.

    2. 1

      GC is cool. I’ve never implemented a real collector, but long been fascinated by the algorithms.

      Are copying collectors still a good idea, though? I was under the impression that all the memcpy-ing hurts performance because RAM bandwidth is a bottleneck in modern architectures.

      Also, sort of OT, I assume PyPy isn’t fast enough for your needs? Or are you doing this to avoid dependencies on a Python runtime?

      1. 4

        Are copying collectors still a good idea, though? I was under the impression that all the memcpy-ing hurts performance because RAM bandwidth is a bottleneck in modern architectures.

        It depends hugely on the language, for two reasons:

        Copying collectors’ main win for young generation GC is that they allow bump allocation. Allocating an object is two instructions: increment a pointer and branch on overflow. In contrast, snmalloc’s aggressively optimised allocation fast path is around 10-15 instructions and the deallocation is around 30. With a bump allocator, you don’t have to free at all (except for objects with finalisers, but that cost is the same irrespective of the underlying allocator), you just reset the bump pointer to the start. This means that you’re saving on the order of 40 instructions per object. In exchange for this, you pay a bit of a penalty every time that you store a pointer to a young-generation object into the old generation (card marking can make this fast) and extra cost from copying the surviving objects and updating all pointers to them.
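
        The fast path described above can be sketched as follows; this is a hedged illustration (the `Space` type and names are invented here, not Oil's or snmalloc's actual API):

        ```cpp
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Minimal bump allocator: the fast path is just "bump the pointer,
        // branch on overflow", and "freeing" the whole space is one store.
        struct Space {
          std::vector<uint8_t> buf;
          size_t top = 0;
          explicit Space(size_t n) : buf(n) {}

          void* Alloc(size_t n) {
            if (top + n > buf.size()) return nullptr;  // slow path: would trigger GC
            void* p = buf.data() + top;
            top += n;
            return p;
          }

          void Reset() { top = 0; }  // discard everything at once
        };
        ```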

        Copying collectors like this rely on the infant mortality hypothesis for performance. The assumption is that 90% of objects will be dead by the time the GC runs. If copying and updating pointers (or the deferred cost of having to follow forwarding pointers) costs a lot per object but you need to do it only for 10% of the objects then this can still be a win relative to allocate / free costs. For the copying to be fast, you want to size your young generation to fit in the cache. This has a direct impact on whether you meet the 90%-dead goal.
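
        A toy version of the Cheney scheme, with forwarding pointers left behind in from-space, might look like this (a sketch for a single node type, with one-shot collection; not Oil's collector):

        ```cpp
        #include <cstddef>

        struct Node {
          Node* forward = nullptr;  // set once this node has been evacuated
          Node* child = nullptr;
          int value = 0;
        };

        const size_t kCap = 1024;
        Node from_space[kCap];
        Node to_space[kCap];
        size_t free_top = 0;  // next free slot in to_space

        Node* Copy(Node* n) {
          if (n == nullptr) return nullptr;
          if (n->forward) return n->forward;  // already moved: follow the forwarding pointer
          Node* dst = &to_space[free_top++];
          *dst = *n;         // n->forward is null here, so dst->forward is too
          n->forward = dst;  // leave a forwarding pointer behind
          return dst;
        }

        // Copy the root, then scan to_space with a second finger, evacuating children.
        Node* Collect(Node* root) {
          free_top = 0;
          Node* new_root = Copy(root);
          for (size_t scan = 0; scan < free_top; scan++) {
            to_space[scan].child = Copy(to_space[scan].child);
          }
          return new_root;  // survivors now sit compacted at the start of to_space
        }
        ```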

        My intuition is that this will not be the case for a shell. It will run functions that allocate large amounts of memory whose lifetime is bounded to the current shell command invocation, rather than a lot of things that last for (for example) one loop iteration. That may still make a non-generational copying GC a good idea: bump allocate everything that you need for the current shell script statement and then copy the small number of things that remain reachable to the start at the end (or if you hit some memory limit that triggers early collection).

        The second problem is interaction with other languages. C/C++ really encourage data structures that treat the address of an object as a stable identifier. The default std::map and friends with pointer types in C++ use the address of the object for identity comparisons and hashing / ordered comparisons. This means that you can’t expose raw pointers to your objects. You can expose the objects as smart pointers in C++ (as long as you’re really careful about lifetimes - anything that exposes a pointer must prevent GC from running as long as it’s on the stack) but then they will have more expensive comparisons.
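
        To make the identity problem concrete, here's a small generic C++ illustration (nothing Oil-specific): the default comparator for pointer keys is the address itself, so relocating an object silently strands any container keyed on it.

        ```cpp
        #include <set>

        struct Obj { int v; };

        // std::set<Obj*> uses std::less<Obj*>, i.e. it orders entries by the
        // pointer value. The object's address *is* its key, so a collector
        // that moves the object breaks every container that refers to it.
        using ObjSet = std::set<Obj*>;
        ```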

      2. 2

        My understanding is that essentially all high performance collectors (JVMs, JS, and maybe Go) are generational collectors, and they all use Cheney/copying for their young / small generation.

        But yes it’s rare to see it for the whole thing, for both time and space reasons – femtolisp being an exception.

        So my hope is that shells have small enough heaps (10 MB to 100MB) where this doesn’t matter. And if that’s not true, then I think there is a path to doing a more standard generational collector. (Although my instinct is to avoid that complexity)

        Yeah I have tried PyPy and it won’t be fast enough – though memory is an even bigger issue. (The goal is for Oil to use the same or fewer resources than bash – there shouldn’t be a reason NOT to use it!)

        PyPy doesn’t take advantage of static types, and doing the C++ translation drastically reduces memory use as well as improving speed. Python objects are surprisingly large.

        In general I think JIT + shell is a bad idea, which I elaborated on here:


        linked from: https://github.com/oilshell/oil/wiki/FAQ:-Why-Not-Write-Oil-in-X%3F

        Aside: If you are a C++ / performance person (somehow I got that impression) I could definitely use some help!

        I have written C++ in production, but I’m also one of those people who fled to Python in the early 2000s :) So if you find the algorithms cool and want to work with it, that is very welcome. A big thing we haven’t even done is the Dict! It doesn’t even use hashing now!! So if anyone wants to write a garbage-collected Dict, that could be fun.

        Also an aside that took me about 20 years to realize: C and C++ do not have true functions without GC!!! I think Rust really brought that to light (even though I don’t use it) – memory management appears at EVERY function boundary!
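
        A small illustration of that point (invented helpers, not Oil's API): in C++ the signature itself encodes a memory-management decision, so even a trivial "return a list of words" has several inequivalent spellings.

        ```cpp
        #include <sstream>
        #include <string>
        #include <vector>

        // Variant 1: return by value – ownership moves to the caller.
        std::vector<std::string> SplitWords(const std::string& s) {
          std::vector<std::string> out;
          std::istringstream iss(s);
          std::string w;
          while (iss >> w) out.push_back(w);
          return out;
        }

        // Variant 2: caller supplies the storage – no ownership decision inside.
        void SplitWordsInto(const std::string& s, std::vector<std::string>* out) {
          std::istringstream iss(s);
          std::string w;
          while (iss >> w) out->push_back(w);
        }
        ```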

        So I guess I tend to think of software as breaking down things into units, which is why I was never that productive with C++. Large functions are more idiomatic in C and C++ (as espoused by Carmack and others.)

        But if you have GC then you have a lot more composability. And I think Oil’s architecture reflects that – I just added some reflection for parsing Oil in Oil (for config files), and it’s super easy when you don’t have to worry about memory management.

        1. 3

          The Go GC is non-generational and uses a mark-and-sweep collector, no copying involved.

          Go aggressively relies on escape analysis to move many allocations to the stack. The idea here is that the “short-lived” allocations that would benefit from a generational collector are instead allocated on the stack, and thus automatically freed when they go out of scope.

          Go does copy memory around, but only in the context of growing a goroutine stack [1]. Instead of getting a stack overflow, Go will allocate a bigger stack and copy everything from the old stack into it. In contrast, heap-allocated memory is always collected in a mark-and-sweep manner.

          Minor nit for my first point: Java also does escape analysis, but my understanding is that Go has an advantage here in that it is mostly a value-oriented language, while Java uses references everywhere. It’s possible that this makes escape analysis trickier.

          [1]: This is the main reason why Go forbids you from passing pointers to C code. If the object that you’re pointing to lives in the stack, growing it later means that your pointer is now potentially pointing to garbage, or even into the stack of another goroutine. Go’s solution to this is to use a handle, which stores your object on the heap, handing you back a stable pointer to it.
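
          The handle idea generalizes beyond Go. A hedged C++ sketch of the indirection (invented names, not Go's or Oil's actual API): foreign code holds a stable integer key, and when the runtime moves an object only the table entry needs patching.

          ```cpp
          #include <cstdint>
          #include <unordered_map>

          // Toy handle table: hands out stable integer keys instead of raw
          // pointers, so the runtime remains free to relocate objects.
          struct HandleTable {
            std::unordered_map<uint64_t, void*> slots;
            uint64_t next = 1;

            uint64_t NewHandle(void* obj) {
              uint64_t h = next++;
              slots[h] = obj;
              return h;
            }
            void* Resolve(uint64_t h) const { return slots.at(h); }
            // After a move, patch only the table entry; outstanding handles stay valid.
            void Patch(uint64_t h, void* new_addr) { slots.at(h) = new_addr; }
            void Release(uint64_t h) { slots.erase(h); }
          };
          ```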

        2. 2

          Thanks for the detailed answers!

          I’m a C++ perf-oriented type, but shells aren’t really my thing (though I did implement a partial clone of csh in college, in Pascal, for a forgotten OS.) I work on databases nowadays, and I’m building a P2P social network in my spare time.

      3. 2

        Oh and another thing that gives me confidence in Cheney / copying is that the creators of the v8 JS engine used it for a new embedded platform:

        https://github.com/toitlang/toit/tree/master/src/third_party/dartino (see two_space_heap.cc)

        some context: https://blog.toit.io/the-toit-language-is-now-open-source-14bdcb1604d9

        They definitely know what they’re doing as far as performance and VM implementation (I remember going to a tech talk on v8 when it was being developed).

        I think there is a lot of similarity there, and common requirements between a shell and embedded systems.

        On the other hand, the author of MicroPython appears to be a monster programmer, but I looked at his implementation as well, and it seems like GC is one of the admitted weak spots.

        I definitely learned that making a language requires a lot of different skills, and it’s hard to do everything well … I have a lot more respect for say CPython now. People can always pick out something that can be done better, but doing well on every aspect is hard.

      4. 1

        If your heap lives long enough, eventually you’ll have to copy and compact, even if you’re only doing it piecemeal by moving a few things at a time rather than cloning the heap, or you’ll pay too much overhead in fragmentation.

        1. 2

          My layman understanding is that fragmentation is not that much of a problem nowadays. This go-dev thread has some discussion about why Go went with a mark-and-sweep collector instead of a copying collector (their first idea). The paper that is usually cited in this setting is Johnstone & Wilson, The Memory Fragmentation Problem: Solved?

        2. 1

          True, but doing it once in a long while isn’t a performance concern.