Not comparing like with like.
SQLite is 35% faster at reading and writing within a large file than the filesystem is at reading and writing small files. Most filesystems I know of are very, very slow at reading and writing small files and much, much faster at reading and writing within large files.
For example, for my iOS/macOS performance book, I measured the difference writing 1GB of data in files of different sizes, ranging from 100 files of 10MB each to 100K files of 10KB each.
Overall, the times span about an order of magnitude, and even the final step, from individual file sizes of 100KB to 10KB, was a factor of 3-4 difference.
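For the curious, here’s a minimal sketch of that kind of measurement (POSIX C; error handling omitted and the file names made up), writing the same 1GB total as a few big files or many small ones:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    /* Write `count` files of `size` bytes each and return the elapsed
       seconds. 100 x 10MB and 100K x 10KB are both 1GB in total. */
    static double write_files(size_t count, size_t size) {
        char *buf = calloc(1, size);
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < count; i++) {
            char name[64];
            snprintf(name, sizeof name, "blob-%zu.bin", i);
            int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            write(fd, buf, size);
            close(fd);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        free(buf);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void) {
        printf("100 x 10MB:  %.2fs\n", write_files(100, 10 * 1000 * 1000));
        printf("100K x 10KB: %.2fs\n", write_files(100000, 10 * 1000));
        return 0;
    }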
You are technically correct, but you’re looking at this the wrong way. The point of the article is that storing small blobs within a single file is faster than storing them in individual files. And SQLite is a good way to store them in a single file.
It’s a bit more than that (though note that the small-vs-large behavior can differ a lot between hard disks and SSDs, and between CoW and conventional filesystems). A write to a complete file is not just a data operation; it is an update to both the file contents and the filesystem metadata, including any directories that now include a reference to that file. A write within a file is not comparable, because it is just an update to the contents (if it is also a resize, it updates a smaller set of metadata; on a CoW filesystem, or one that provides cryptographic integrity guarantees, it may also update more metadata). SQLite also provides updates to metadata (in most cases, richer metadata than a filesystem’s). The filesystem typically provides concurrent updates (though with some quite exciting semantics), which SQLite doesn’t.
The individual-files approach is kind of like the common newbie SQLite mistake of making multiple inserts without an enclosing transaction, wherein you get a disk flush after each individual insert. I’m sure a file update is less expensive, but that’s because the filesystem is only making its own metadata fully durable, not your data. 🤬
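For illustration, a minimal sketch of the fix in SQLite’s C API (the table name is made up): wrap the inserts in one transaction so the durability work happens once, at COMMIT, instead of after every statement.

    #include <sqlite3.h>

    /* Hypothetical sketch: many small inserts inside one enclosing
       transaction. Without BEGIN/COMMIT, each INSERT would be its own
       transaction and force its own flush. Error handling trimmed. */
    int bulk_insert(sqlite3 *db, int n) {
        sqlite3_stmt *stmt;
        char payload[10 * 1000] = {0};             /* a 10KB blob, say */
        sqlite3_exec(db, "BEGIN", 0, 0, 0);
        sqlite3_prepare_v2(db, "INSERT INTO blobs(data) VALUES (?)",
                           -1, &stmt, 0);
        for (int i = 0; i < n; i++) {
            sqlite3_bind_blob(stmt, 1, payload, sizeof payload,
                              SQLITE_TRANSIENT);
            sqlite3_step(stmt);
            sqlite3_reset(stmt);
        }
        sqlite3_finalize(stmt);
        return sqlite3_exec(db, "COMMIT", 0, 0, 0);  /* one flush here */
    }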
As I learn more about filesystems and databases (I’m writing a small b-tree manager as a research project) I’m considering what it is that makes one different from the other. A filesystem is really just a specialized database on raw block storage. (Apple’s former HFS+ is literally built on b-trees.) It would be easy to build a toy filesystem atop a key-value database.
The feature set of a filesystem has been pretty much unchanged for decades; even small advances like Apple’s resource forks and Be’s indexing were abandoned, although we do have file metadata attributes on most filesystems now. We really need to rethink it — I love the ideas behind NewtonOS’s “soup”.
I don’t think I am the one looking at it the wrong way ¯\_(ツ)_/¯
You might notice that there are two parts to your interpretation of the article:
“storing small blobs within a single file is faster than storing them in individual files”
This is true, and fairly trivially so. Note that it has nothing to do with SQLite at all. Pretty much any mechanism will do this.
“And SQLite is a good way to store them in a single file.”
That may also be true, but it has nothing to do with the performance claim.
Neither of these statements supports the claim of the article that “SQLite is 35% faster than the Filesystem”, not even in combination.
In addition, I am pretty dubious about the explanation given (open and close system calls). To me, a much more likely cause is that filesystems will prefetch and coalesce reads and writes within a file, whereas they will not do so across files. So it is actually the filesystem that is giving the speed boost.
It’s not quite the SQLite bit, but wow, Windows 10, get your shit together.
I wonder if part of this is that apparently Windows’ anti-virus blocks on every write call. I’d be interested to see how different these benchmarks are with AV disabled.
I wonder if somebody is going to implement SQLite as an optional storage backend for Git’s refs and commit-graph. It would be interesting to see a benchmark against the current WIP reftable format.
Somewhat related: SQLite itself uses Fossil SCM instead of Git, and Fossil in turn uses an SQLite DB to store its data.
Yeah, I have heard of Fossil, but unfortunately it’s not as common as Git.
Here is the context for what I’m suggesting, for those who might be interested: https://gitlab.com/gitlab-org/git/-/issues/6#note_501523892. Refs in Git are managed either as many individual files or concatenated into a single plain-text file.
It would be very interesting if somebody tried using SQLite for it. I think most of the groundwork is being done with the introduction of the reftable format, which would help the tests in git.git stop accessing the ref files directly and use the API instead.
I totally agree with you. The git storage format is only barely documented and not even standardized. This makes it really hard for alternative git implementations like got(1), git9(1) or OpenGIT to get going. An SQLite DB would offer standardization to at least some degree.
Fossil is not competing against Git, though; it goes hand in hand with it. Notably, there is no limit on the number of imports and exports you can do between Fossil and Git.
So this raises the question: is it possible to implement SQLite-on-bare-metal as a filesystem driver? Would there be any performance benefit? I’m pretty sure there’s already a SQLite FUSE driver, but that still requires a file inside another filesystem to contain the DB.
You could use SQLite’s virtual-filesystem API to wire up a raw block-storage device as a back end for a database. It would be a bit faster than going through the filesystem, dunno how much, and of course it depends on the filesystem and the OS.
But by putting FUSE on top of that you’d just be reintroducing the exact same causes of slowdown that made the filesystem slower in the first place.
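To make the raw-device idea concrete: SQLite’s VFS layer is pluggable at runtime, and the usual starting point is a shim over the default VFS. A minimal sketch, assuming the standard VFS-registration API; the “blockdev” name is made up:

    #include <sqlite3.h>

    /* Hedged sketch of the shim approach: clone the default VFS and
       re-register it under a new name. A real raw-device backend would
       replace xOpen/xRead/xWrite/xSync with routines that address
       sectors on the device directly. */
    static sqlite3_vfs block_vfs;

    int register_block_vfs(void) {
        sqlite3_vfs *base = sqlite3_vfs_find(0);   /* default OS VFS */
        if (!base) return SQLITE_ERROR;
        block_vfs = *base;                         /* start as a copy */
        block_vfs.zName = "blockdev";
        return sqlite3_vfs_register(&block_vfs, 0 /* not the default */);
    }

Opening the database with sqlite3_open_v2(path, &db, flags, "blockdev") would then route its I/O through the custom VFS; the real work is in replacing the I/O methods with ones that talk to the device.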
The reason SQLite is faster at reading blobs is that it doesn’t have to allocate heavyweight data structures like a file descriptor, nor go through extra system calls to open/close them.
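That shows up directly in the C API: the incremental-blob interface reads through the one already-open database handle, with no per-blob open/close. A sketch (the table and column names are hypothetical):

    #include <sqlite3.h>

    /* Hypothetical sketch: read one stored blob by rowid via SQLite's
       incremental-blob API. No new file descriptor is allocated; the
       read goes through the single already-open database file. */
    int read_blob(sqlite3 *db, sqlite3_int64 rowid, void *buf, int len) {
        sqlite3_blob *blob;
        int rc = sqlite3_blob_open(db, "main", "blobs", "data",
                                   rowid, 0 /* read-only */, &blob);
        if (rc != SQLITE_OK) return rc;
        rc = sqlite3_blob_read(blob, buf, len, 0 /* offset */);
        sqlite3_blob_close(blob);
        return rc;
    }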
Except when file systems support multi-threaded writes.
I lost my love for SQLite when I made my apps multi-threaded, and SQLite became the big fat bottleneck all threads were waiting on. And I tried every speed hack from StackOverflow, until I actually lost data.
You generally want batching in that case.
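Batching (one transaction per group of writes, as in the sketch further up) is the big one; WAL mode plus a busy timeout also helps, since readers can then proceed while the single writer is active. A sketch, assuming a reasonably recent SQLite:

    #include <sqlite3.h>

    /* Hedged sketch of settings that often help multi-threaded use.
       WAL lets readers run concurrently with the one active writer;
       the busy timeout makes contending writers wait for the lock
       instead of failing immediately with SQLITE_BUSY. */
    int tune_for_concurrency(sqlite3 *db) {
        int rc = sqlite3_exec(db, "PRAGMA journal_mode=WAL;", 0, 0, 0);
        if (rc != SQLITE_OK) return rc;
        return sqlite3_busy_timeout(db, 5000 /* ms */);
    }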
Yeah, the flaw became apparent when I converted a CLI batch program into a web service.
Why is an old page from the SQLite documentation being posted again?
Because people think that it’s interesting.
Do they though?
They up voted it…
It’s old-ish but still relevant. And the layout of the SQLite website doesn’t make it super easy to find some of the interesting content therein — I don’t think I saw this article until after I’d already written my own filesystem-based small-blob store, several years ago.