DRAMHiT is a new hash table designed to work at the speed of DRAM. Architecting for performance, we embrace the fact that modern machines are distributed systems—while the latency of communication between the cores is much lower than in a traditional network, it is still dominant for the hash table workload. We design DRAMHiT to apply a range of optimizations typical for a distributed system: asynchronous interface, fully-prefetched access, batching with out-of-order completion, and partitioned design with a low-overhead, scalable delegation scheme. DRAMHiT never touches unprefetched memory and minimizes the penalty of coherence requests and atomic instructions. These optimizations allow DRAMHiT to operate close to the speed of DRAM.
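To make the "asynchronous interface" and "out-of-order completion" parts concrete, here is a minimal caller-side sketch of what such an interface could look like. The class and method names are my own invention, not DRAMHiT's actual API, and a plain std::unordered_map stands in for the real table:

```cpp
// Caller-side sketch of an asynchronous, batched lookup interface with
// out-of-order completion. Names are hypothetical, not DRAMHiT's API.
#include <cstdint>
#include <cstdio>
#include <deque>
#include <unordered_map>

struct FindResult { uint64_t id; uint64_t value; bool found; };

class AsyncTableSketch {
    std::unordered_map<uint64_t, uint64_t> map_;  // stand-in for the real table
    std::deque<FindResult> completed_;            // results waiting to be polled
public:
    void insert(uint64_t key, uint64_t value) { map_[key] = value; }

    // Enqueue a lookup; the caller tags it with an id so results can be
    // matched up even when they complete out of order.
    void find_async(uint64_t id, uint64_t key) {
        auto it = map_.find(key);
        FindResult r{id, it != map_.end() ? it->second : 0, it != map_.end()};
        // Alternate front/back pushes so completions come back out of order.
        if (id % 2 == 0) completed_.push_front(r); else completed_.push_back(r);
    }

    // Poll one completed result, if any.
    bool poll(FindResult& out) {
        if (completed_.empty()) return false;
        out = completed_.front();
        completed_.pop_front();
        return true;
    }
};

int main() {
    AsyncTableSketch t;
    for (uint64_t k = 0; k < 8; k++) t.insert(k, k * 10);
    for (uint64_t k = 0; k < 8; k++) t.find_async(/*id=*/k, /*key=*/k);
    FindResult r;
    while (t.poll(r))  // results may arrive in a different order than issued
        std::printf("id=%llu found=%d value=%llu\n",
                    (unsigned long long)r.id, r.found, (unsigned long long)r.value);
}
```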
I think the really interesting idea of the paper is batching and prefetching requests in order to avoid cache misses.
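Roughly: hash a request, issue a software prefetch for its bucket, and only probe that bucket several requests later, by which time the cache line should already have arrived. Below is a single-threaded sketch of that pipeline under my own assumptions (linear probing, a fixed prefetch distance kDepth); DRAMHiT's real pipeline is more elaborate and never probes unprefetched memory at all:

```cpp
// Batched lookups with software prefetching: stage 1 prefetches the bucket of
// request i, stage 2 probes the bucket of request i - kDepth. Constants and
// layout are my own choices, not DRAMHiT's.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Slot { uint64_t key = 0; uint64_t value = 0; bool used = false; };

constexpr size_t kCapacity = 1 << 20;  // power of two for cheap masking
constexpr size_t kDepth = 8;           // prefetch distance, in requests

static inline size_t bucket_of(uint64_t key) {
    return (key * 0x9E3779B97F4A7C15ULL) & (kCapacity - 1);
}

uint64_t lookup_batch(const std::vector<Slot>& table, const std::vector<uint64_t>& keys) {
    struct Inflight { uint64_t key; size_t bucket; };
    Inflight window[kDepth];
    uint64_t hits = 0;

    for (size_t i = 0; i < keys.size() + kDepth; i++) {
        // Stage 1: hash the new request and prefetch its bucket.
        if (i < keys.size()) {
            size_t b = bucket_of(keys[i]);
            __builtin_prefetch(&table[b]);
            window[i % kDepth] = {keys[i], b};
        }
        // Stage 2: probe the request issued kDepth iterations ago; its bucket
        // should be in cache by now.
        if (i >= kDepth) {
            const Inflight& req = window[i % kDepth];
            for (size_t p = req.bucket; table[p].used; p = (p + 1) & (kCapacity - 1)) {
                if (table[p].key == req.key) { hits += table[p].value; break; }
            }
        }
    }
    return hits;
}

int main() {
    std::vector<Slot> table(kCapacity);
    std::vector<uint64_t> keys;
    for (uint64_t k = 1; k <= 100000; k++) {  // simple linear-probing inserts
        size_t p = bucket_of(k);
        while (table[p].used) p = (p + 1) & (kCapacity - 1);
        table[p] = {k, k, true};
        keys.push_back(k);
    }
    std::printf("sum of looked-up values: %llu\n",
                (unsigned long long)lookup_batch(table, keys));
}
```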
The partitioning scheme is interesting, but it only seems worth it if you either have a heavily skewed (Zipfian) key distribution or a lot of cores and an insert-heavy workload.
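As a toy, single-threaded illustration of the partitioned/delegation idea (names and structure are my own simplification; in DRAMHiT each partition is owned by a core and the per-partition queues are the low-overhead delegation mechanism):

```cpp
// Keys are routed by hash to the partition that "owns" them, requests queue up
// per partition, and each owner drains its own queue, so only the owner ever
// writes its slice of the table.
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

constexpr size_t kPartitions = 4;

struct Request { uint64_t key; uint64_t value; };

struct Partition {
    std::vector<Request> queue;                    // stand-in for a per-core delegation queue
    std::unordered_map<uint64_t, uint64_t> table;  // stand-in for this partition's slice

    void drain() {                                 // run by the owning core
        for (const Request& r : queue) table[r.key] = r.value;
        queue.clear();
    }
};

size_t owner_of(uint64_t key) {
    return (key * 0x9E3779B97F4A7C15ULL) % kPartitions;
}

int main() {
    std::vector<Partition> parts(kPartitions);
    // A "client" delegates each insert to the partition that owns the key.
    for (uint64_t k = 0; k < 1000; k++)
        parts[owner_of(k)].queue.push_back({k, k * 2});
    // Each owner applies the requests queued for it; no cross-partition writes.
    for (auto& p : parts) p.drain();
    for (size_t i = 0; i < kPartitions; i++)
        std::printf("partition %zu holds %zu keys\n", i, parts[i].table.size());
}
```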
One major limitation of this hash table is that it uses open addressing with tombstones for deletion. Because a tombstone cannot terminate a probe sequence, lookup performance can get rather pessimistic as occupancy (and the number of deletions) grows. There's an interesting follow-up from different authors who apply the prefetching approach from DRAMHiT to the closed-addressing CLHT.
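A small sketch with plain linear probing (my own toy code, not DRAMHiT's actual layout) that makes the effect visible; failed lookups have to walk past every tombstone until they reach a genuinely empty slot:

```cpp
// Fill a linear-probing table to ~90% occupancy, delete half the keys with
// tombstones, then measure how many slots a failed lookup has to probe.
#include <cstdint>
#include <cstdio>
#include <vector>

enum class State : uint8_t { Empty, Full, Tombstone };
struct Slot { uint64_t key = 0; State state = State::Empty; };

constexpr size_t kCap = 1 << 16;
size_t bucket_of(uint64_t k) { return (k * 0x9E3779B97F4A7C15ULL) & (kCap - 1); }

// Number of slots probed; only an Empty slot stops the probe, a Tombstone does not.
size_t probe_count(const std::vector<Slot>& t, uint64_t key) {
    size_t probes = 0;
    for (size_t p = bucket_of(key); t[p].state != State::Empty; p = (p + 1) & (kCap - 1)) {
        probes++;
        if (t[p].state == State::Full && t[p].key == key) return probes;
    }
    return probes + 1;  // stopped at an empty slot
}

int main() {
    std::vector<Slot> t(kCap);
    for (uint64_t k = 1; k <= kCap * 9 / 10; k++) {   // insert to ~90% occupancy
        size_t p = bucket_of(k);
        while (t[p].state == State::Full) p = (p + 1) & (kCap - 1);
        t[p] = {k, State::Full};
    }
    for (uint64_t k = 1; k <= kCap * 9 / 10; k += 2) { // delete every other key
        size_t p = bucket_of(k);
        while (!(t[p].state == State::Full && t[p].key == k)) p = (p + 1) & (kCap - 1);
        t[p].state = State::Tombstone;  // slot stays non-empty, so probes continue past it
    }
    size_t total = 0, n = 10000;
    for (uint64_t k = 1; k <= n; k++) total += probe_count(t, kCap + k);  // keys not present
    std::printf("avg probes per failed lookup: %.1f\n", double(total) / n);
}
```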