Here’s the companion talk by Pat Morin: https://www.youtube.com/watch?v=-1qRcDnbihA
The Eytzinger layout and search has got to be the coolest data structure and most elegant algorithm I’ve learned in a long time. Plus, it’s a great “by-the-way” dive into the essential elements of writing high-performance code, and into how mechanical sympathy trumps complexity analysis, all in a single case study that applies to so many computer science problems.
Laying out trees within a static array is a useful trick in general. Pity that the only widely used data structure that does this is the binary heap.
I’ve experimented a little with the Eytzinger layout … the problem is, the data I’m searching isn’t integers but strings, so each array item is really a pointer to a string. I suspect the string comparisons take long enough to swamp the layout differences discussed here.
Could you inline strings that fit in, say, 63 bits, assuming you’re searching things like words in a dictionary? Then the high bit could indicate a pointer jump for the strings that don’t fit.
This would be the hash table equivalent of open-addressing with a little bit of chaining for the bigger stuff.
I might try something like that. But it turns out the biggest performance constraint of all in my code is the space the table takes up, because it lives in a page in a B-tree, and the more keys I can fit on a page, the less disk I/O there is.
I keep trying to make my code fast, but when I look at profiles, they’re all dominated by pread and pwrite. So I really shouldn’t be reading papers like this one!