1. 32
  2. 8

    If you read the code, it’s doing some rather funky things – multiple directory traversals to read the data, copying into stdio buffers before copying into the final buffer, etc.

    I was curious when I saw this and experimented a little. I got a roughly 20% performance bump on FreeBSD + ZFS just by switching to fstat + fdopen instead of repeatedly walking the path to the file:

    https://eigenstate.org/paste/d41e3b69d258e3b52f7cee7a9921f027

    I’d still expect the directory full of files to be somewhat slower, since there should be more system calls and seeks involved, but I’m not convinced that this is a great benchmark of exactly how much overhead is inherent there.
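For readers who can't open the paste, here is a minimal sketch of the open-once idea described above. It is not the commenter's patch (that is C, and the paste is not reproduced here); it only mirrors the same calls through Python's `os` module. The helper name `read_whole_file` is hypothetical.

```python
import os
import tempfile


def read_whole_file(path):
    """Read a file with a single path lookup, then work only on the descriptor."""
    # The only call that walks the directory tree. O_BINARY exists only on Windows.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        # fstat takes the descriptor, so sizing the file needs no second lookup.
        size = os.fstat(fd).st_size
    except OSError:
        os.close(fd)
        raise
    # fdopen wraps the same descriptor; buffering=0 avoids an extra
    # intermediate buffer between the kernel and our destination buffer.
    with os.fdopen(fd, "rb", buffering=0) as f:
        buf = bytearray(size)
        view = memoryview(buf)
        total = 0
        while total < size:
            n = f.readinto(view[total:])
            if not n:  # 0 means EOF (file shrank after fstat)
                break
            total += n
        return bytes(buf[:total])


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello blob")
    try:
        print(len(read_whole_file(tmp.name)))
    finally:
        os.unlink(tmp.name)
```

The contrast is with a stat-by-path followed by an open-by-path, where the kernel resolves the same path twice per file.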

    1. 7

      > Either way, this article disproves the common assumption that a relational database must be slower than direct filesystem I/O.

      Wait. This is a common assumption? I’ve always heard the opposite. The directory full of files as database is a well-known antipattern.

      1. 5

        Well, the assumption is often made in the context of storing things like images or files inside a database. That is, you store transactionally important data in the database, but use the filesystem for large BLOBs (images, large uploads, etc.). That’s the assumption I’ve had in the back of my head, but this article has given me reason to consider otherwise.

        1. 1

          I always thought that a filesystem is a type of database.

          1. [Comment removed by author]

            1. 1

              Because the filesystem has a particular metadata pattern and checkpoints at certain times with a particular physical layout, which may not work well for your data.

          2. 2

            The graphs are fairly confusing because they compare, for each system, the performance ratio of SQLite versus the filesystem, but they look like they compare performance across systems. In particular, looking at the graphs, it looks like Windows 7 and 10 are much faster at reading files than all other systems, while in fact the result seems to be explained by the fact that (when antivirus protection is enabled, as in these tests) the filesystem access overhead is noticeably larger.

            The author’s point is about the time ratio between filesystem and database, so it is of course reasonable to emphasize that. But I still think that the very easy confusion suggests that this is not the best visualization approach. If the timescales of access on the various systems are comparable enough, I think it would be best to have absolute time charts, plotting one bar for the filesystem and one bar for the database, on each system. Otherwise, no plot needs to be given: a numeric table with one ratio per column would convey the same information and be less confusing.

            The key problem is that bar graphs like that are designed to make it easy to visually compare the various measurements against each other, which is nonsensical here (it is not interesting to know that the ratio between filesystem and database is 1.5X worse on Apple systems than on Ubuntu systems; you want to know that the ratio is surprisingly large on all systems). This is a case of a tool used for the wrong purpose.
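A small sketch of the alternative suggested above: report absolute times per system next to the ratio, so the two are not confused. The timings are hypothetical placeholders, not measurements from the article, and the names `timings`, `summarize`, and `format_table` are made up for illustration.

```python
# Hypothetical timings in milliseconds; NOT measurements from the article.
timings = {
    "system A": {"filesystem": 100.0, "sqlite": 50.0},
    "system B": {"filesystem": 400.0, "sqlite": 200.0},
}


def summarize(timings):
    """Return (system, fs_ms, sqlite_ms, fs/sqlite ratio) for each system."""
    rows = []
    for system, t in timings.items():
        ratio = t["filesystem"] / t["sqlite"]
        rows.append((system, t["filesystem"], t["sqlite"], ratio))
    return rows


def format_table(rows):
    """Render absolute times and the ratio side by side as plain text."""
    lines = [f"{'system':<10} {'fs (ms)':>8} {'sqlite (ms)':>12} {'fs/sqlite':>10}"]
    for system, fs_ms, db_ms, ratio in rows:
        lines.append(f"{system:<10} {fs_ms:>8.1f} {db_ms:>12.1f} {ratio:>10.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(summarize(timings)))
```

In this made-up data both systems have the same ratio even though one is four times slower in absolute terms, which is exactly what a ratio-only bar chart cannot show.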

            1. 1

              I agree that it is somewhat misleading.

              However, don’t you think that the difference in ratio across systems is indicative of OS/filesystem efficiency? That is, assuming that SQLite and direct read/write performance are each optimal on all systems and that the different filesystems are comparable in terms of features.

            2. 1

              For which file system? Also, a database can do lots of clever buffering/caching since it has very specific data, whereas fread/fwrite have to handle the general case, so it’s not entirely a fair comparison IMO.

              1. 1

                Example? File systems and OS kernels do disk caching in RAM too.

              2. 1

                It wasn’t clear which FS they used on Ubuntu. I imagine it was the default (ext3?) but it would be nice to know for sure!

                1. 3

                  The default FS on Ubuntu has been ext4 for a while. The next-gen filesystem people are planning to move to is btrfs, whenever it stops losing people’s data.

                  1. 1

                    At least under btrfs data is all you lose. It’s a lot worse under reiserfs.

                    1. 1

                      AFAIK openSUSE and SLES use btrfs for root volumes (they use snapshots) and XFS for data volumes.