1. 37

  2. [Comment removed by author]

    1. 2

      I’m a fan as well. My colleagues switch to PostgreSQL now, because SQLite has limitations with ALTER TABLE. The Play framework uses database evolutions written in SQL so that is easier with PostgreSQL.

    2. 5

      I really love Sqlite, and reading accounts like this is great. BUT, note this all reads with no inserts/updates/deletes. Sqlites achilles heel for being really useful?

      Additionally, though this test was focused on read performance, I should mention that SQLite has fantastic write performance as well. By default SQLite uses database-level locking (minimal concurrency), and there is an “out of the box” option to enable WAL mode to get fantastic read concurrency — as shown by this test.

      1. 9

        You’d be surprised. Serializing writes on hardware with 10ms latency is pretty disasterous, giving parallel write databases a huge advantage over SQLite on hard drives. But even consumer solid state drives are more like 30us write latency, over 300 times faster than a conventional hard drive.

        Combine with batching writes in transactions and WAL logging and you’ve got a pretty fast little database. Remember, loads of people loved MongoDB’s performance, even though it had a global exclusive write lock until 2013 or something.

        People really overestimate the cost of write locking. You need a surprising amount of concurrency to warrant a sophisticated parallel write datastructure. And if you don’t actually need it, the overhead of using a complex structure will probably slow your code down.

        1. 3

          Sounds like you might like the “COST” metric…. https://lobste.rs/s/dyo11t/scalability_at_what_cost

        2. 6

          Given that they run all of expensify.com on a single (replicated) Bedrock database, that would pass my “really useful” test, at least. :)

          1. 2

            The project page itself warns about that. When toying with ideas, I thought about a front-end that sort of acted as a load balancer and cache that basically could feed writes to SQLite at the pace it could take with excesses held in a cache of sorts. It would also serve those from its cache directly. Reads it could just pass onto SQLite.

            This may be what they do in the one or two DB’s Ive seen submitted that use SQLite as a backend. I didnt dig deep into them, though. I just know anything aiming for rugged database should consider it because the level of QA work that’s gone into SQLlite is so high most projects will never match it. That’s kind of building block I like.

            Now I’ll read the article to see how they do the reads.

            1. 1

              Devil’s advocate. If you are going to give up on easy durability guarantees, you could also try just disabling fsync and letting the kernel do the job you are describing.

              1. 1

                I’ve been trying to make posts shorter where possible. That’s twice in days someone’s mentioned something I deleted: original version mentioned strongly-consistent with a cluster. I deleted it thinking people would realize I wanted to keep the properties that made me select SQLite in first place. Perhaps, it’s worth being explicit there. I’ll note I’m brainstorming way out of my area of expertise: databases are black boxes I never developed myself.

                After this unforgettable paper, I’d be more likely to do extra operations or whatever for durability given I don’t like unpredictability or data corruption. It’s why I like SQLite to begin with. It does help that a clean-slate front-end would let me work around such problems with it more true for memory-based… depending on implementation. Again, I’m speculating out of my element a bit since I didn’t build databases. Does your line of thinking still have something that might apply or was it just for a non-durable, front end?

          2. 2

            “Make RANDOM() deterministic (for some reason it’s non-deterministic by default)”

            Perhaps they wrote this tongue-in-cheek but I would have thought that you’d generally want it to be non-deterministic.

            1. 2

              I guess if you’re feeding generated data to your database when doing unit/integration tests, you’ll probably want two runs of the same script to produce the exact same dataset.