1. 12

DRAMHiT is a new hash table designed to work at the speed of DRAM. Architecting for performance, we embrace the fact that modern machines are distributed systems—while the latency of communication between the cores is much lower than in a traditional network, it is still dominant for the hash table workload. We design DRAMHiT to apply a range of optimizations typical for a distributed system: asynchronous interface, fully-prefetched access, batching with out-of-order completion, and partitioned design with a low-overhead, scalable delegation scheme. DRAMHiT never touches unprefetched memory and minimizes the penalty of coherence requests and atomic instructions. These optimizations allow DRAMHiT to operate close to the speed of DRAM.

https://github.com/mars-research/DRAMHiT

    1. 4

      I think the real interesting idea of the paper is to batch/prefetch requests in order to avoid cache misses:

      To avoid memory misses on the critical path, we change the interface of the hash table to support asynchronous sub-mission of requests and out-of-order completion. This allows us to avoid wasting the CPU cycles on accesses to cache lines residing in memory or remote caches. In our design, the hash table never touches unprefetched memory. The application submits a batch of requests. The hash table computes memory addresses corresponding to the keys, and prefetches the memory locations involved in the operations, putting requests on the queue of the prefetch engine, but not touching those addresses. After enough elements are accumulated on the queue and enough time has passed for the prefetched cache-lines to reach the first-level caches of the CPU, the hash table processes the operations, potentially issuing more prefetches for keys that require additional memory accesses to resolve hash conflicts, i.e., reprobes. Pending reprobe requests are put back on the request queue.

      The partitioning method is interesting, but seems only worth it if you either have a very zipfian distribution or a lot of cores and insertions.

      One major limitation of this hash table is that it uses open addressing with tombstones for deletion. This can result in rather pessimistic lookup performance as the occupancy grows. There’s an interesting follow-up from different authors who apply the prefetching approach from DRAMHiT to the closed-addressing CLHT.