      This blog has a series of posts (“lab notes”) about Prolly Trees. Definitely worth reading if they interest you.

        Follow-up: I’m implementing a Prolly tree for the lulz (actually it’s the variant described in “Merklizing The Key-Value Store”). The incremental-update algorithm is a bitch. As with B-trees, the 90% case is easy, but I keep finding more and more edge cases, each of which complicates the code. It’s actually worse than B-trees because merges don’t just occur with immediate neighbors but can cascade sideways. I get the feeling I need to look at the problem from a different angle, one from which it will be simpler.

          Are you using a fixed boundary probability or adjusting it based on the size of the leaves?

            Fixed. This variant doesn’t use a rolling hash; it just starts a new chunk at any leaf node whose digest is less than the cutoff.
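
            For concreteness, a minimal sketch of that cutoff rule in Python, assuming a 256-bit digest; the names and the target chunk size are made up for illustration:

            ```python
            import hashlib

            TARGET_SIZE = 32                    # assumed average number of entries per chunk
            CUTOFF = (1 << 256) // TARGET_SIZE  # P(digest < CUTOFF) ~ 1 / TARGET_SIZE

            def digest(key: bytes, value: bytes) -> int:
                # Content-defined: the boundary decision depends only on this one entry.
                return int.from_bytes(hashlib.sha256(key + value).digest(), "big")

            def chunk_leaves(entries):
                """Split (key, value) pairs into chunks, cutting after any entry whose digest is below CUTOFF."""
                chunks, current = [], []
                for key, value in entries:
                    current.append((key, value))
                    if digest(key, value) < CUTOFF:
                        chunks.append(current)
                        current = []
                if current:
                    chunks.append(current)
                return chunks
            ```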

            I’m thinking of switching to the more common rolling-hash form, because then I can get more uniform chunk sizes. Plus, with the current algorithm, even updating a key’s value can cause rebalancing, which is bad for performance.
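
            A sketch of the generic rolling-hash (gear-style) form, over a flat byte stream; the table seed and mask here are arbitrary, and in a tree you’d only allow cuts at entry boundaries. Real implementations usually also enforce minimum and maximum chunk sizes:

            ```python
            import random

            random.seed(0)                  # fixed table so boundaries are deterministic
            GEAR = [random.getrandbits(64) for _ in range(256)]
            MASK = (1 << 13) - 1            # a boundary roughly once per 8 KiB on average

            def boundaries(data: bytes):
                """Yield cut offsets where the rolling gear hash matches the mask."""
                h = 0
                for i, b in enumerate(data):
                    h = ((h << 1) + GEAR[b]) & ((1 << 64) - 1)
                    if (h & MASK) == 0:
                        yield i + 1
            ```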

              I think you are using the fixed-probability scheme I described. It should not exhibit the bad cascading-merge behaviour. dolthub describe a variant that uses a CDF to encourage boundaries near a particular size, but I suspect that form is very vulnerable to cascading merges.
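
              For illustration only (not dolthub’s actual code), one way to realize that CDF idea is to make the per-entry boundary probability grow with the bytes already in the chunk, following a normal CDF centred on a target size, so cuts cluster near that size. The constants and names here are invented:

              ```python
              import hashlib
              import math

              TARGET = 4096   # assumed target chunk size in bytes
              SPREAD = 1024   # assumed spread around the target

              def normal_cdf(x: float, mean: float, stdev: float) -> float:
                  return 0.5 * (1.0 + math.erf((x - mean) / (stdev * math.sqrt(2.0))))

              def is_boundary(entry: bytes, bytes_in_chunk: int) -> bool:
                  """Cut when the entry's hash falls below a threshold that rises with chunk size."""
                  h = int.from_bytes(hashlib.sha256(entry).digest()[:8], "big")  # 64-bit value
                  threshold = int(normal_cdf(bytes_in_chunk, TARGET, SPREAD) * (1 << 64))
                  return h < threshold
              ```

              Because the decision now depends on how much is already in the chunk, an upstream edit can shift every later boundary in the node, which would explain the cascading vulnerability suspected above.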

                Yesterday I made the final tweak that got the algorithm working correctly, and once I understood why, I was able to dismantle pretty much all of the ugly code and ended up with something fairly elegant.