This looks neat, but RPC is not the way to efficiently share a database between multiple processes on a single machine; shared memory is. IPC (e.g. Unix domain sockets) can be used to implement the “control plane” (e.g. database connections), but the “data plane” (i.e. actual data access) should be shared memory.
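To illustrate the split I mean, here is a toy sketch (not this project's API): the "data plane" is a file mapped into both processes with mmap, so record bytes never cross a socket; only control traffic would go over IPC. The file path and sizes are arbitrary.

```python
import mmap
import os
import tempfile
from multiprocessing import Process

SIZE = 4096  # arbitrary mapping size for the sketch


def writer(path):
    # A second process writes into the shared mapping directly;
    # no bytes are copied through a socket or pipe.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), SIZE) as m:
            m[:5] = b"hello"


def main():
    # "Control plane": set up the shared region and hand its name
    # to the other process (here via the Process args; in a real
    # system this is where a Unix domain socket would fit).
    fd, path = tempfile.mkstemp()
    os.ftruncate(fd, SIZE)
    os.close(fd)

    p = Process(target=writer, args=(path,))
    p.start()
    p.join()

    # "Data plane": read the record straight out of shared memory.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_READ) as m:
            data = bytes(m[:5])
    os.remove(path)
    return data


if __name__ == "__main__":
    print(main())  # b'hello'
```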
It’s mentioned only in the FAQ, but this appears to be the successor to Kyoto Cabinet, from the same author.

I’m reading through the data format descriptions; it’s kind of weird.
The hash DB has a fixed-size table, so once the data set outgrows that, performance will regress to O(n). Apparently there’s an operation to rebuild the db with a bigger table; IMHO that should be taken into account in the benchmarks.
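The regression is easy to see with a toy model (not this project's actual layout): with a fixed bucket count and chained collisions, the average chain length is n divided by the bucket count, so lookup cost grows linearly once the data set outgrows the table.

```python
# Toy fixed-size hash table with chaining. NUM_BUCKETS never grows,
# so average chain length -- and hence average lookup cost -- is
# n / NUM_BUCKETS, i.e. O(n) for a fixed table.

NUM_BUCKETS = 8  # arbitrary small table for the demonstration


def avg_chain_length(n):
    buckets = [0] * NUM_BUCKETS
    for key in range(n):
        buckets[hash(key) % NUM_BUCKETS] += 1
    return sum(buckets) / NUM_BUCKETS


print(avg_chain_length(8))     # 1.0 -> about one probe per lookup
print(avg_chain_length(8000))  # 1000.0 -> each lookup scans ~1000 records
```

Rebuilding with a bigger table resets the ratio, which is why I think the rebuild cost belongs in the benchmarks.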
The b-tree is layered on the hash table. Nodes are values, with 48-bit integer keys. This does solve the problem of fixing up node pointers when the node is relocated, but it adds a hash lookup into every node indirection.
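My reading of that layering, as a sketch (the dict stands in for the on-disk hash table; names and structure are mine, not the project's): nodes are ordinary records keyed by small integer IDs, so a relocated node keeps its ID and no pointers need fixing, but every descent through a child "pointer" is another hash lookup rather than a direct offset seek.

```python
store = {}   # stands in for the on-disk hash table
next_id = 1


def alloc_node(keys, children=None):
    # Node "pointers" are 48-bit integer IDs, not file offsets, so a
    # node can be rewritten elsewhere without touching its parents.
    global next_id
    node_id = next_id & 0xFFFFFFFFFFFF
    next_id += 1
    store[node_id] = {"keys": keys, "children": children}
    return node_id


def search(node_id, key):
    node = store[node_id]         # one extra hash lookup per level
    if node["children"] is None:  # leaf node holds the actual keys
        return key in node["keys"]
    for i, k in enumerate(node["keys"]):
        if key < k:
            return search(node["children"][i], key)
    return search(node["children"][-1], key)


leaf_a = alloc_node([1, 2, 3])
leaf_b = alloc_node([5, 8, 9])
root = alloc_node([5], [leaf_a, leaf_b])
print(search(root, 8))  # True
print(search(root, 4))  # False
```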
It’s not clear how durable this is. The docs mention an operation to repair the database after a crash; does that have to be run manually? And how well does it survive power failures or kernel panics?
I’d also like to see benchmarks with bigger data than just 8-byte keys and values. I test my db engine with a real-world JSON data set where the keys are 10-20 bytes and the values up to 32KB.