The Hash Array Mapped Trie uses a similar bit-indexed trie but has a further optimization, one that I guess the persistent vector doesn’t need because its interior nodes are mostly full.
In the HAMT the nodes get populated randomly (the trie index is a hash) so they’re often very sparse. This wastes a lot of room at high branching factors. So instead the node has a bitmap recording which child pointers are non-null, and the array only contains the non-null pointers, which makes it much shorter. Then there is a very clever use of popcount that lets it look up the right child with almost no overhead.
One implementation of this for Rust is in the im crate.
An idea I’d be interested to see done, but can’t imagine having the tuits for, is something kind of like it that’s mutable and has medium-sized gap buffers as the leaves.
The gap buffers can handle a lot of patterns efficiently (like indexing/swapping for a sort, inserting/deleting a chunk of items anywhere, filtering, merging two lists), and because of that, you can probably make the leaves a little larger while still having decent practical performance in many cases. Larger leaves can also reduce the memory overhead, and the cost of common things like iteration. If you’re doing a bunch of totally random inserts and deletes you copy a lot of items around, but it’s “just” a large constant-factor slowdown (proportional to the leaf page size, not the total list size).
For the specific case of strings, someone did something spiritually similar with jumprope (also in Rust as jumprope-rs), with a skip list instead of a tree. For what it was designed for (applying edits quickly) it does perform quite well.
But this is a persistent data structure, so the nodes are immutable, i.e. copy-on-write. The benefit of a gap buffer is fewer bytes to move when you update it in place, but nothing’s ever updated in place here.
But this is a persistent data structure, so the nodes are immutable, i.e. copy-on-write. The benefit of a gap buffer is fewer bytes to move when you update it in place, but nothing’s ever updated in place here.
Yes, I didn’t mean that as an idea for Clojure’s vectors and should have spelled that out. Different use case that just comes to mind because you might use a loosely related data structure.
I’m saying, if you want a mutable vector but could use cheap inserts/deletes in the middle, a mutable RRB tree with weird leaves might be interesting. It’d take an implementation to know if there is any value there, including whether those gap buffer leaves are worth having in practice.
The Hash Array Mapped Trie uses a similar bit-indexed trie but has a further optimization, one that I guess the persistent vector doesn’t need because its interior nodes are mostly full.
In the HAMT the nodes get populated randomly (the trie index is a hash) so they’re often very sparse. This wastes a lot of room at high branching factors. So instead the node has a bitmap recording which child pointers are non-null, and the array only contains the non-null pointers, which makes it much shorter. Then there is a very clever use of popcount that lets it look up the right child with almost no overhead.
I found this talk on Clojure’s persistent data structures pretty illuminating as well. :)
One implementation of this for Rust is in the
im
crate.An idea I’d be interested to see done, but can’t imagine having the tuits for, is something kind of like it that’s mutable and has medium-sized gap buffers as the leaves.
The gap buffers can handle a lot of patterns efficiently (like indexing/swapping for a sort, inserting/deleting a chunk of items anywhere, filtering, merging two lists), and because of that, you can probably make the leaves a little larger while still having decent practical performance in many cases. Larger leaves can also reduce the memory overhead, and the cost of common things like iteration. If you’re doing a bunch of totally random inserts and deletes you copy a lot of items around, but it’s “just” a large constant-factor slowdown (proportional to the leaf page size, not the total list size).
For the specific case of strings, someone did something spiritually similar with jumprope (also in Rust as jumprope-rs), with a skip list instead of a tree. For what it was designed for (applying edits quickly) it does perform quite well.
But this is a persistent data structure, so the nodes are immutable, i.e. copy-on-write. The benefit of a gap buffer is fewer bytes to move when you update it in place, but nothing’s ever updated in place here.
Yes, I didn’t mean that as an idea for Clojure’s vectors and should have spelled that out. Different use case that just comes to mind because you might use a loosely related data structure.
I’m saying, if you want a mutable vector but could use cheap inserts/deletes in the middle, a mutable RRB tree with weird leaves might be interesting. It’d take an implementation to know if there is any value there, including whether those gap buffer leaves are worth having in practice.