1. 10
    1. 3

      I’m a bit confused as to what problem this is solving, as it seems quite overcomplicated for what it does, and this is not ever really explained. If I wanted to be able to intern concurrently, I would use a concurrent hash table (properly concurrent, not just striping locks), and do a straightforward sequence of: try lookup; on miss, allocate a new key and try to insert; on insert failure (raced insertion—should be rare), throw out the allocation and use the entry in the table. Very simple, wait-free, pretty fast, and has locality as good as your memory allocator.

      1. 3

        No particular problem, just “if I were to design a data structure of roughly this shape, how would it look?”! Just using a concurrent hash map would work! The difference would be using pointers, rather than indexes, so values would be larger, it would be harder to serialize the thing to disk, and site tables would have to be hashmaps rather than just arrays.

        I am curious about “properly concurrent hash map” though — what is the state of the art here? Is there some obviously superior design that everyone is using? From my experience, most of the time when I crack open a concurrent hash map, it is an array of mutex-protected shards, e.g. https://docs.rs/dashmap/latest/src/dashmap/lib.rs.html#86-90.

        1. 2

          site tables would have to be hashmaps rather than just arrays

          What is a site table?

          what is the state of the art here?

          NonBlockingHashMap. There is also a video describing its design.

          Most of the complexity comes from 1) the lack of double-word cas, and 2) the desire for lock-free resizing. Both of which are useful properties, but if you are willing to go without both, a very straightforward implementation is possible.

          1. 1

            Typo! A side-table – if I want to associate some extra bit of data with the value without adding it directly to the value, I can store it in a hashmap, if values are pointers, or in an array, if values are indices.

            1. 1

              Ah, yes. In common lisp, I would use stealth mixins to add the data to the value a posteriori, but in most languages you do not have such recourse.

    2. [Comment removed by author]