1. 27
  1.  

  2. 4

    A good discussion.

    Good points made:

    • reference counting has high overhead all the time: you’re constantly incrementing and decrementing reference counts, and those are not just memory accesses but atomic memory accesses. To some extent the extra instructions don’t matter, because modern wide OoO CPUs seldom have enough instruction-level parallelism to keep them completely busy, so work off to the side of the main computation, such as reference counts and bounds checks, can be close to free (except for increased code size and icache and bandwidth pressure). But atomic memory operations, ugh.

    • reference counting can have long pauses when the last reference to a large data structure goes away, and traditional reference-counting collectors have no mechanism to do that work incrementally, whereas a lot of work has gone into making tracing collectors incremental.

    The one thing every garbage collector has to do is tell whether a piece of memory contains a pointer, because it needs to traverse all the pointers and must not traverse the things that are not pointers. In OCaml that’s done using what’s called tagging. Of each eight-byte word on the heap, the ones that end in a one are integers, and the ones that end in a zero are pointers.

    So it’s very easy to tell in the garbage collector, as you’re walking along, which is which. That’s not the case for all garbage collectors. In a lot of other garbage collectors, there’s a separate data structure off to one side that you have to look up with some tag from each object to work out its layout, to decide where the pointers are.
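    A minimal sketch of that tag test, written in Rust for illustration (the helper names are hypothetical; OCaml’s runtime does the equivalent in C):

    ```rust
    // OCaml-style word tagging: an integer n is stored as 2n + 1, so its
    // low bit is 1; heap pointers are word-aligned, so their low bit is 0.
    fn is_immediate_int(word: usize) -> bool {
        word & 1 == 1
    }

    fn tag_int(n: isize) -> usize {
        ((n << 1) | 1) as usize // one bit of range is lost to the tag
    }

    fn untag_int(word: usize) -> isize {
        (word as isize) >> 1 // arithmetic shift restores the sign
    }

    fn main() {
        let w = tag_int(21);
        assert!(is_immediate_int(w)); // the collector would skip this word
        assert_eq!(untag_int(w), 21);
        println!("tagged value round-trips");
    }
    ```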

    Right. Having to go off and follow a type tag or class pointer or something to find a data structure describing where the pointers are is slow. I’ve found it’s not worth it compared to simply scanning the whole object – at least if obvious things such as buffers and the bodies of strings and arrays known to contain only unboxed integers or floats are allocated or marked in a way that means you know there are no pointers there.

    I don’t much like taking tag bits out of objects. It’s hard to avoid in a completely dynamically-typed language, but if you have some kind of type declarations then most of the time it’s easy for the compiler to know either “this is definitely an integer” (or character, or float, or boolean), or else “this is definitely not an integer or character or float or boolean”. In either case you don’t need the tag bits, which means you can store a full machine-word-sized integer without losing range to the tag bits. That’s important for a lot of algorithms, especially in graphics, cryptography, etc.

    The Boehm–Demers–Weiser garbage collector performs surprisingly well (not many supposedly clever GCs actually manage to beat it), at least if you don’t use the optional struct layout description feature.

    Just as with the OCaml collector, you scan every pointer-sized, pointer-aligned chunk of memory in the object and try to decide whether it’s a pointer. But instead of looking at the LSB, the first test is whether the value lies within the garbage-collected heap. That’s just a range check – a subtraction and an unsigned branch-if-less-than using values held in local variables (registers). That’s very quick – the same speed as the AND with 0x1 and beq/bne OCaml needs. On a 64-bit CPU with current RAM sizes, the chance of a random bit pattern looking like a pointer into the heap by accident is very small. Even programs that use many GB of memory in total (e.g. web browsers) have only a few (or a few dozen, or a few hundred) megabytes of objects that are expected to contain pointers. But the address space is 17,179,869,184 GB.
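    That conservative range check can be sketched in Rust with made-up heap bounds (the real collector does this in C):

    ```rust
    // Conservative "might this word be a heap pointer?" test: one
    // subtraction plus one unsigned comparison against the GC heap bounds.
    fn looks_like_heap_pointer(word: usize, heap_start: usize, heap_end: usize) -> bool {
        // Equivalent to heap_start <= word && word < heap_end,
        // but compiles down to a single compare-and-branch.
        word.wrapping_sub(heap_start) < heap_end.wrapping_sub(heap_start)
    }

    fn main() {
        // Hypothetical heap bounds; a real collector keeps these in registers.
        let (start, end) = (0x7000_0000_usize, 0x7100_0000_usize);
        assert!(looks_like_heap_pointer(0x7080_0000, start, end));
        assert!(!looks_like_heap_pointer(0x1234, start, end)); // a small integer
        println!("range check behaves as expected");
    }
    ```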

    There are some further tests around whether the supposed pointer is pointing to a VM page that is actually in use for storing objects, whether it’s to an object not marked as free etc. Those fall out as a side-effect of finding the start and size of the object.

    Very occasionally a random bit pattern / integer / float will look like a pointer to an actual object AND the last real pointer to that object has disappeared and so the object will be wrongly retained for longer than it should be. This is not a problem in practice. Tracing GCs do not promise to reclaim every object as soon as it could possibly be reclaimed.

    1. 2

      if you have some kind of type declarations then most of the time it’s easy for the compiler to know either “this is definitely an integer” (or character, or float, or boolean)

      Note: OCaml (like many functional languages) has algebraic datatypes (variants, sum types), a generalization of enums whose constructors may or may not carry parameters; for example, a binary search tree is either an empty tree Empty, or a node Node(left, val, right) with three parameters (left subtree, node value, right subtree).

      The representation of algebraic datatypes used by OCaml is a (tagged) integer for parameter-less constructors, and a pointer to a heap block for constructors with parameters. As a consequence, many types heavily used in practice (option/maybe, success-or-error results, lists, sets, maps, various abstract syntax trees…) have values that are either immediate values (represented as integers) or pointers.

      GHC Haskell represents all data constructors as pointers, but uses the low bits of the pointer to store the constructor number when it is small (< 8 I guess?). This gives a representation that is as efficient in practice when there are few constant constructors, but I would still expect it to be slower when there are many (for example, to represent the tokens parsed by a lexer, or the state of a state machine).
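      That pointer-tagging trick can be sketched like this in Rust (hypothetical helpers; GHC does the real thing in its code generator, and the number of tag bits depends on alignment):

      ```rust
      // With 8-byte-aligned allocations the low 3 bits of a pointer are
      // always zero, so a constructor number below 8 can be stored there.
      fn tag_ptr(p: *const u64, tag: usize) -> usize {
          debug_assert!(tag < 8 && (p as usize) & 7 == 0);
          (p as usize) | tag
      }

      fn untag_ptr(word: usize) -> (*const u64, usize) {
          ((word & !7) as *const u64, word & 7)
      }

      fn main() {
          let payload: u64 = 42; // stands in for a heap-allocated constructor
          let w = tag_ptr(&payload, 3); // pretend this is constructor #3
          let (p, t) = untag_ptr(w);
          assert_eq!(t, 3); // the tag is recovered without dereferencing
          assert_eq!(unsafe { *p }, 42);
          println!("constructor tag rides in the pointer");
      }
      ```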

      1. 2

        So single-threaded reference counting, like Rc in Rust, is significantly cheaper?

        I vaguely remember having heard of a mixed approach: counting separately in each thread and occasionally syncing. For that you again need some sort of barrier, but there is less work to do…

        I wonder how such an approach compares in a typical rust program with relatively few Arcs…

        1. 2

          Reference counting in Rust is generally cheaper. You rarely have to touch the counters, because you have the option of lending the contents of an Rc/Arc as a bare reference instead.
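          For example, passing a bare reference into a function touches no counters; only clone() and the final drop do (a minimal sketch):

          ```rust
          use std::rc::Rc;

          // Summing through a borrowed slice: no refcount traffic at all.
          fn sum(v: &[i32]) -> i32 {
              v.iter().sum()
          }

          fn main() {
              let data = Rc::new(vec![1, 2, 3]);
              let s = sum(&data); // lends the contents as &[i32]
              assert_eq!(s, 6);
              assert_eq!(Rc::strong_count(&data), 1); // the count never moved
              println!("sum = {s}");
          }
          ```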

          The unpredictable cost of destruction is also solvable. It’s not even a problem unless you have a large object graph or expensive destructors. When you have an object graph that you know how to drop efficiently, you can write a custom drop implementation (e.g. it’s a good idea for refcount-based trees to use an iterative algorithm to drop their children instead of recursing).
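          A sketch of such an iterative drop for a refcounted list (hypothetical types; the same idea applies to trees):

          ```rust
          use std::rc::Rc;

          struct Node {
              _value: i32,
              next: Option<Rc<Node>>,
          }

          struct List {
              head: Option<Rc<Node>>,
          }

          impl Drop for List {
              fn drop(&mut self) {
                  // Unlink nodes one at a time instead of letting Drop
                  // recurse down the whole chain (which can blow the stack).
                  let mut cur = self.head.take();
                  while let Some(rc) = cur {
                      match Rc::try_unwrap(rc) {
                          // Sole owner: detach the tail and keep walking.
                          Ok(mut node) => cur = node.next.take(),
                          // Shared elsewhere: the rest must stay alive.
                          Err(_) => break,
                      }
                  }
              }
          }

          fn main() {
              let mut head = None;
              for i in 0..200_000 {
                  head = Some(Rc::new(Node { _value: i, next: head }));
              }
              drop(List { head }); // frees iteratively, no deep recursion
              println!("long chain dropped safely");
          }
          ```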

          Arc exposes its refcount and can be sent to another thread, so if all else fails, you can write a destructor that hands the actual teardown off to a queue or another thread:

          if Arc::strong_count(&x) == 1 { rayon::spawn(move || drop(x)) }
          
        2. 1

          From a user perspective, the tagged pointers were never a problem for me: my integers either fit even in 31 bits, or a full-grown bignum library like Zarith is needed.

          On a side note, the work being done on the OCaml GC in the past year got me actually interested in memory management, though I’m still far from being able to contribute to that work…

          1. 1

            reference-counting has high overhead all the time because you’re constantly incrementing and decrementing reference counts, and they’re not only memory accesses, but atomic memory accesses

            The atomics can be avoided if you are clever. And GC has some degree of constant overhead too, with barriers. Regardless, barriers are cheaper, and they pay for themselves anyway through significantly faster memory management.

            Each eight bytes of memory on the heap, the ones that end in a one are integers, and the ones that end in the zero are pointers

            :/ they messed it up – that should be backwards. Additionally: what do they do about floating-point numbers, and packed arrays of small numbers?

            if you have some kind of type declarations then most of the time it’s easy for the compiler to know either “this is definitely an integer” (or character, or float, or boolean), or else “this is definitely not an integer or character or float or boolean”. In either case you don’t need the tag bits

            Hotspot has a cute trick: all the fields of a class are reordered so that the pointers are contiguous. However, the practicality of this depends on language semantics, and any mechanism that requires lookups will probably incur extra overhead. Applications that require specific-width integers are somewhat specialised; is it worth compromising whole-application performance for them? If performance is truly critical, such routines may be written in assembly. And a compiler may optimize routines which do not allocate to use unboxed integers.

            The Boehm–Demers–Weiser garbage collector performs surprisingly well (not many supposedly clever GCs actually manage to beat it)

            Is that still true? Back in the day, it had a really good allocator and could be used as a fast malloc implementation; but it seems that other allocators have caught up (jemalloc, tcmalloc, mimalloc).

            On a 64 bit CPU with current RAM sizes the chance of a random bit pattern looking like a pointer into the heap by accident are very small. Even programs that use many GB of memory in total (e.g. web browsers) have only a few (or few dozen or few hundred) megabytes of objects that are expected to contain pointers. But the address space is 17,179,869,184 GB.

            Address space is 256 TB on a modern CPU, 128 TB in practice. And your pointers are not distributed randomly throughout it. And: precise GC is interesting not so much in itself as because it permits copying and compacting collectors.