FYI the append-only b-tree described near the end is the file format used by CouchDB and Couchbase Server. It’s very clever, but it does produce quite a lot of write amplification in the long run, because you have to periodically compact the file by copying all the live pages to a new file.
LMDB and libMDBX, while still using copy-on-write, avoid this by writing the modified pages to free space in the file when possible, then marking the old pages as free as soon as no one’s using the old tree they were part of.
The general idea this gives me is that append-only is a small-scale thing. The hardware limits driving it — cache lines, flash storage blocks — are at the kilobyte or perhaps few-megabytes scale. Above that size, I suspect it doesn’t matter if access patterns are linear.
(See also “cache-oblivious” data structures, which use recursive scaling to work well no matter what the scale of caching is.)
FYI the append-only b-tree described near the end is the file format used by CouchDB and Couchbase Server. It’s very clever, but it does produce quite a lot of write amplification in the long run, because you have to periodically compact the file by copying all the live pages to a new file.
LMDB and libMDBX, while still using copy-on-write, avoid this by writing the modified pages to free space in the file when possible, then marking the old pages as free as soon as no one’s using the old tree they were part of.
The general idea this gives me is that append-only is a small-scale thing. The hardware limits driving it — cache lines, flash storage blocks — are at the kilobyte or perhaps few-megabytes scale. Above that size, I suspect it doesn’t matter if access patterns are linear.
(See also “cache-oblivious” data structures, which use recursive scaling to work well no matter what the scale of caching is.)