1. 18
  1. 6

    Semi-space collectors are often used for young-generation collection, where the object header contains a small count of the number of times the object has survived collection and the collector promotes it to an older generation once that count crosses a threshold. This lets you do very fast allocation of small objects (very large things are normally allocated from another space) and keep them in cache.
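    To make that concrete, here is a minimal sketch in C of the bump-allocation fast path and the survival-count promotion decision. The header layout, names, and threshold are all illustrative, and a real collector also needs root scanning and pointer forwarding, which are omitted:

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <stdalign.h>

    /* Hypothetical layout: the field names and the promotion
     * threshold are illustrative, not any particular collector's. */
    #define PROMOTE_AFTER 3
    #define SPACE_SIZE 1024

    typedef struct {
        uint8_t  survivals; /* times this object survived a young-gen collection */
        uint8_t  live;      /* set by the marker; 1 = reachable */
        uint16_t size;      /* payload size in bytes */
    } Header;

    static alignas(Header) uint8_t from_space[SPACE_SIZE], to_space[SPACE_SIZE];
    static size_t from_top = 0;

    /* Bump allocation: the fast path is a bounds check and a pointer increment. */
    static Header *alloc(uint16_t size) {
        if (from_top + sizeof(Header) + size > SPACE_SIZE) return NULL;
        Header *h = (Header *)(from_space + from_top);
        h->survivals = 0; h->live = 0; h->size = size;
        from_top += sizeof(Header) + size;
        return h;
    }

    /* One young-gen collection: copy live objects into to-space, bumping
     * each survival count; objects past the threshold would be promoted
     * to the old generation (here we only count them, and we skip the
     * pointer forwarding a real collector must also do). */
    static size_t collect(size_t *promoted) {
        size_t to_top = 0, scan = 0;
        *promoted = 0;
        while (scan < from_top) {
            Header *h = (Header *)(from_space + scan);
            size_t total = sizeof(Header) + h->size;
            if (h->live) {
                if (++h->survivals >= PROMOTE_AFTER)
                    (*promoted)++;                 /* would move to the old space */
                else {
                    memcpy(to_space + to_top, h, total);
                    to_top += total;
                }
            }
            scan += total;
        }
        memcpy(from_space, to_space, to_top);      /* swap the roles of the spaces */
        from_top = to_top;
        return to_top;
    }

    int main(void) {
        for (int i = 0; i < 8; i++) {
            Header *h = alloc(16);
            h->live = (i % 2 == 0);                /* pretend half are reachable */
        }
        size_t promoted, survived = collect(&promoted);
        printf("%zu bytes survived, %zu promoted\n", survived, promoted);
        return 0;
    }
    ```

    The point of the sketch is the fast path: allocation is just a pointer bump, and the survival count rides along in the per-object header so the copy loop can decide promotion without any extra lookup.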

    It’s also useful on embedded systems where space is more important than speed. You can separate the collection into a mark phase and a copy phase, and allocate, copy, and deallocate chunks incrementally. This approach then becomes very much like a mark-and-compact approach: the to and from spaces are the same and you just scan the live objects and shuffle them back to the start of the space.

    In languages that don’t have mutable data structures, you get an extra invariant for free: all pointers in your heap point from new objects to old ones. This lets you do a backwards scan and compact as a single pass, so you copy objects to the end of the space and the to space is just the from space in the opposite order.
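    A hedged sketch of that single-pass idea, using array indices in place of real pointers (everything here is illustrative, not any particular collector's implementation): because pointers only go from newer objects to older ones, one backward scan sees every referrer before its referent, so it can propagate liveness and place live objects from the end of the space at the same time, threading a small patch chain through the already-copied fields to fix up references whose target has not been placed yet:

    ```c
    #include <stdio.h>

    #define N 6
    #define NIL (-1)

    typedef struct {
        int live;   /* 1 = reachable from a root */
        int ref;    /* index of an older object (always < own index), or NIL */
    } Obj;

    /* Single backward pass over an immutable heap: by the time we reach an
     * object, every referrer has already been visited, so liveness is fully
     * known; live objects are placed from the end of the space, so the
     * to-space is the from-space in reverse order. */
    static int compact(const Obj *heap, int n, Obj *out) {
        int pending[N];  /* head of the patch chain for each not-yet-placed object */
        int live[N];
        for (int j = 0; j < n; j++) { pending[j] = NIL; live[j] = heap[j].live; }
        int dst = n;     /* live objects are placed from the end of the space */
        for (int i = n - 1; i >= 0; i--) {
            if (!live[i]) continue;
            int slot = --dst;
            out[slot].live = 1;
            out[slot].ref = NIL;
            if (heap[i].ref != NIL) {
                live[heap[i].ref] = 1;                 /* referent is older: not scanned yet */
                out[slot].ref = pending[heap[i].ref];  /* thread this field into its chain */
                pending[heap[i].ref] = slot;
            }
            /* every field waiting on i's new address can now be patched */
            for (int s = pending[i]; s != NIL; ) {
                int next = out[s].ref;
                out[s].ref = slot;
                s = next;
            }
        }
        return dst;   /* out[dst..n-1] holds the live objects, order reversed */
    }

    int main(void) {
        /* index 0 is oldest; refs only point to lower (older) indices */
        Obj heap[N] = { {0, NIL}, {0, NIL}, {0, 1}, {0, 0}, {0, NIL}, {0, 2} };
        heap[5].live = 1;   /* roots */
        heap[3].live = 1;
        Obj out[N];
        int dst = compact(heap, N, out);
        printf("%d live objects, newest object's referent now at %d\n",
               N - dst, out[N - 1].ref);
        return 0;
    }
    ```

    The threading trick is what keeps it to one pass: a referrer is always placed before its referent, so its pointer field temporarily stores a link in the referent's patch chain until the referent's final address is known.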

    1. 2

      In languages that don’t have mutable data structures, you get an extra invariant for free: all pointers in your heap point from new objects to old ones. This lets you do a backwards scan and compact as a single pass, so you copy objects to the end of the space and the to space is just the from space in the opposite order.

      Also nice for generational gc, for the same reason. But this must be balanced against opportunistic implementation-level mutation, and I think the latter wins.

      (Another thing which has been proposed, e.g. by Doligez and Leroy, is to take advantage of the distinction of immutable objects from mutable ones, if it exists, to optimise collection for objects in the former category. This may place somewhat more of a burden on the programmer, but may make for a nice balance.)

      1. 3

        Also nice for generational gc, for the same reason. But this must be balanced against opportunistic implementation-level mutation, and I think the latter wins.

        I also have that intuition, but I’ve heard counterarguments from the folks working on the OCaml GC and I don’t have enough evidence to contradict them. They did have a few anecdotal examples of rewriting software that used ref cells to use immutable data structures instead, and getting a speedup from improved GC performance. With small objects and a local bump allocator, the cost of creating a copy is comparable to the cost of mutation (both are likely to be dominated by cache misses), and so the question is whether doing more GC work with a faster GC is better than doing less GC work with a slower one.

        I don’t know how Erlang does it these days; I think they do plain reference counting (which is also safe in pure-immutable languages because you’re guaranteed not to have cycles). For Verona, we have a model where you can take a region (containing an arbitrary object graph) and freeze it to create an immutable object graph. We can do a single-pass SCC and refcount construction on that and then refcount the individual sub-graphs, rather than needing per-object refcounts, which means that we can do a lot of refcount elision (as long as you hold a reference to an object, you don’t need to manipulate any refcounts on the objects reachable from it, unless you want to hold them for a long time and allow the other objects to be deallocated).

        Another thing which has been proposed, e.g. by Doligez and Leroy, is to take advantage of the distinction of immutable objects from mutable ones, if it exists, to optimise collection for objects in the former category. This may place somewhat more of a burden on the programmer, but may make for a nice balance.

        I believe OCaml was planning on doing something like this, with a separate strategy for things with ref cells. The difficulty is that mutable objects can point to immutable objects, so the throughput of your immutable GC is bounded by the throughput of your mutable GC, which may well offset the wins.

        One of our goals with Verona is to allow multiple memory management models. Each Verona region has strong non-interference guarantees with the rest of the system and so we can use arena allocation with bulk free, reference counting (with cycles leaked, with cycle detection, with saturation or with large refcounts), or tracing on a per-data-structure basis. Again, it’s not clear to me what the programmer burden will be here. I suspect that we’ll default to something like reference counting and let people opt into tracing if they want cycles or opt into arena allocation if they know the lifetimes of everything in a region are about the same.
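        For the arena-with-bulk-free option, a minimal illustrative sketch (not Verona's actual API or implementation) might look like:

        ```c
        #include <stdio.h>
        #include <stdlib.h>

        /* Illustrative region-as-arena: bump allocation inside the region,
         * and a single bulk free when the whole region is done. Viable when
         * everything in the region has roughly the same lifetime. */
        typedef struct {
            char  *base;
            size_t used, cap;
        } Region;

        static Region region_new(size_t cap) {
            Region r = { malloc(cap), 0, cap };
            return r;
        }

        static void *region_alloc(Region *r, size_t n) {
            n = (n + 7) & ~(size_t)7;            /* keep allocations 8-byte aligned */
            if (r->used + n > r->cap) return NULL;
            void *p = r->base + r->used;
            r->used += n;
            return p;
        }

        /* Bulk free: no per-object bookkeeping, the whole region goes at once. */
        static void region_free(Region *r) {
            free(r->base);
            r->base = NULL; r->used = r->cap = 0;
        }

        int main(void) {
            Region r = region_new(1 << 10);
            int *xs = region_alloc(&r, 100 * sizeof(int));
            for (int i = 0; i < 100; i++) xs[i] = i;
            printf("used %zu bytes\n", r.used);
            region_free(&r);
            return 0;
        }
        ```

        The trade-off is the obvious one: allocation and deallocation are nearly free, but nothing inside the region can be reclaimed before everything in it is.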

        1. 2

          the question is whether doing more GC work with a faster GC is better than doing less GC work with a slower one.

          I think that what ‘better’ is obscuring (or revealing?) here is that this is also a question of values. I have a feeling—but one which grows increasingly weak over time—that you should be able to avoid gc overhead by avoiding the gc. But perhaps that’s bogus (especially for large applications).

          (Hardware barriers would also be really helpful here. Just saying…)

          I don’t know how Erlang does it these days, I think they do plain reference counting

          I could swear they had some sort of tracing gc. Plain rc may be safe, but it’s still slow.

          I found this link which suggests semispace.

          The difficulty is that mutable objects can point to immutable objects, so the throughput of your immutable GC is bounded by the throughput of your mutable GC, which may well offset the wins.

          Good point—the thing Doligez and Leroy take advantage of is the ability to move immutable objects without synchronising with the mutator. But if you wanted to take advantage of pointers only going in one direction, you’re right that immutability wouldn’t be much help.

          it’s not clear to me what the programmer burden will be here

          The simple fact of a massively segmented heap already comprises a monotonic loss in expressivity: in a shared-everything language, I can produce arbitrarily connected object graphs at will; but in a shared-nothing language, I cannot. This is not an academic concern: from what I hear, the fact that objects are always copied between actors is a big problem in Erlang, and users must resort to out-of-band methods for communication (and reclamation!) for performance reasons.