This looks very similar to the motivation for the Hopscotch hash table, which aims to keep all of the secondary locations in a single cache line. With an out-of-order CPU, if everything is in cache then you’re basically able to do all of the secondary probes in parallel. I can imagine writing the hopscotch lookup code using SIMD to combine both benefits.
Tried to implement hopscotch myself, did quite a lot of benchmarking using real world data, and simple open addressing was better in my testas. By any chance do you know of a good implementation ?
The tsl one is pretty good, but their implementation of a Cuckoo hash table seems to outperform it for most cases. The hopscotch table’s biggest advantage is space saving: it performs well up to around 90% occupancy, whereas I think that number is closer to 70% for Cuckoo.