1. 41
  1.  

  2. 41

    Building a database is often one of the most humbling and educational experiences you can have as an engineer. What can feel like a weekend hack can easily turn into a 30-year obsession. It’s like playing diablo II on hardcore mode where any bug could kill your user’s businesses forever. You know you’re doing something right when Howard Chu starts publicly talking about how terrible your db is :) I strongly recommend skimming the notes, papers, slides, and video recordings of Andy Pavlo’s DB course if you go down this route, as he discusses a wide range of design choices and real-world examples of them, as well as perspectives on why some techniques may or may not have been explored or chosen recently. It’s a goldmine that I make sure to follow along to every year as a way of staying up-to-date.

    A note on testing: stateful systems have significantly higher risks associated with bugs. Data loss is forever. Any time you open a file, you are essentially writing lock-free code with every process that may have opened the file in the past, and may have crashed at any time.

    One of the simplest techniques that has caused the most bugs to jump out in my systems is to insert some conditionally compiled logic before every operation that either writes something to disk or a socket or performs an operation whose effects may be witnessed by another thread. In this conditional “fail point”, allow the system to randomly add delays, return an IO error from the surrounding function (if the following code actually performs IO), or crash the process while running a task that runs a concurrent workload where threads are iterating forward through all the data, iterating backwards through all of it, inserting and deleting random keys, and constantly asserting that an entire set of non-deletable keys are present in all iterations. It’s really easy to mess up index integrity if you crash at some random unexpected point, or you let threads operate on shared structures in some unanticipated way that was teased out by adding some random sleeps. This has caught a huge number of absolutely critical data loss bugs that probably would have never shown up in any unit or integration test. Those tests are no longer an acceptable stopping point. Now you need fuzzers, property tests, crash tests, concurrency tests, and fault injection before you can begin to feel safe cutting a release candidate that may still cause data loss :)

    If you use any networking at all, build the db on top of a simulator if you want to find the race conditions you will certainly create as a human before they destroy somebody’s business.

    1. 12

      If you use any networking at all, build the db on top of a simulator if you want to find the race conditions you will certainly create as a human before they destroy somebody’s business

      You should submit that as a full article. It’s great.

      1. 3

        This. Even building a database on top of a lower-level database (as I do) is quite difficult. I would never want to write one from the bare metal, because I have other things I’d like to create in this lifetime. I had that urge a few years ago, but sublimated it by messing around with LMDB and libMDBX, which are low-level key-value stores on which you can build fancier features like indexes, structured records and query languages.

        1. 2

          Much like how PouchDB has different adapters and can work on top of IndexedDB and LevelDB, I was considering having a disk storage, IndexedDB storage and an SQLite storage, and check the correctness of the first two with the third.

          At least I would have something solid as a reference.

        2. 1

          I agree with you that Andy Pavlo’s DB course is great. I’m 70% finished and I couldn’t find anything nearly as rich as that.

          I had fuzzers, property tests and concurrency tests on my radar, but I’m yet to start fiddling with IO bugs, crashes while writing an index, etc.

          I agree that stateful systems are harder, and I wonder how much I can avoid by mimicking Datomic’s “durable persistent” data structures even for indexes.

          Thanks for the simulator link :)

        3. 11

          I can’t reconcile “embedded” with “immutable”. Whatever your definition of “embedded” is, storage space is limited. (Yes, current phones and tablets come with a lot of storage, but users always fill it up with photos or gigantic games.) If your database grows constantly, then it’s only a matter of time before it exhausts free storage. Then what? Your actually-embedded box crashes and can’t be fixed without a factory reset. Or your mobile app frustrates users, who either spam your support channel with complaints that it’s using too much space, or simply delete the app.

          I’m also feeling cranky because when people bring up client-side databases they never mention Couchbase Lite (of which I’m the architect). We do a thriving enterprise business but seem to be unknown in the OSS world despite being Apache-licensed. Couchbase Lite meets a lot of this guy’s needs, aside from running in a browser. (IMHO in-browser DBs are of limited use because the browser doesn’t provide much persistent storage and likes to nuke it whenever it feels like it.)

          1. 3

            I think of immutability in this context as “immutable until compacted and GCed” so tombstones and partial updates get folded periodically into a new snapshot. (Like building a tarball from a git tag.)

            As for Couch vs. something new: there’s space for many models and engines here. Not everything is JSON-over-HTTP, and that’s okay. (Getting reliable storage in a browser is indeed hard, but compacted snapshots can help with recovering from a storage/cache miss too.)

            1. 3

              Well, I took the author at their word: “Being immutable means that only new information is added, no in-place update ever happens, and nothing is ever deleted.”

              What you’re describing is what I’d call append-only or log-based. That’s how CouchDB’s and Couchbase Server’s databases work. It has the downside of lots of write amplification, since all the live data gets rewritten every time you compact.

              On devices with limited storage, append-only has the worse problem that you can’t free up any storage unless you have enough free space to copy all the live data first. Which basically means it’s dangerous to fill up more than half of your storage, unless you carefully keep track of how much live data you have. On a device like a phone where you don’t control all the storage, you can find yourself, through no fault if your own, unable to compact because some other apps used up too much space!

              1. 1

                In fact, git itself implements this internally with “packfiles”, which is an internal binary format to represent diffs efficiently regarding storage. However every bit of data is still there, the “compaction” is transparent to the user.

                Having snapshots is an possibility, but this would be better if left to the application to do, if the database could expose the snapshot and build more data on top of it, and let the application delete old data when convenient.

              2. 3

                Storage efficiency would be the most challenging place to optimize for, and I agree that storage space is limited.

                But I also think that this is actually viable. A quick find ~/ -type d -name '.git' | wc -l returns me 260 entries, and all of those are instances where things grow “forever”, and that’s fine. My laptop has more storage than my phone, but only ~8 times more. So a simple translation would mean that I could have ~30 databases and be also fine.

                Sure, this doesn’t work for all types of applications. That’s what tools like Git Annex and Git LFS are trying to solve, and there are many instances that growing “forever” isn’t a good idea. However I do think that there are many other places that growing “forever” is a good idea, and that’s where I’m focusing at.

                Sorry about not mentioning Couchbase Lite. I do know it and I have actually used it in the past on a personal project, actually.

                1. 3

                  I think the reason nobody mentions Couchbase Lite is because they probably assumed (as I did, before Googling) that it’s just a renamed CouchDB. In my experience, many people in OSS companies tried CouchDB at some point in the past and had bad experiences. Thus, anything with “Couch” in the name is going to be limited by those preconceptions from the start.

                  1. 2

                    Yeah, we [Couchbase] have been fighting that misconception since 2012. The company started as a merger of two tiny startups, one around memcached and the other around CouchDB (including creator Damien Katz), but found that using CouchDB to persist memcached’s data didn’t work because CouchDB was too slow to keep up. Fixing this involved tearing apart CouchDB and replacing big parts of it, like the b-tree storage engine, with optimized C code, as well as removing functionality like replication. So by the time the first “real” version of Couchbase Server shipped, it was far enough diverged from CouchDB to be a different species entirely. Today we do still have Erlang in the code base but I don’t think any original CouchDB code remains — even query is entirely different, using a SQL-derived query language called N1QL.

                  2. 3

                    Whatever your definition of “embedded” is, storage space is limited.

                    That’s interesting, because that’s not what my interpretation of the word “embedded” means in an “embedded database.” My understanding is that an embedded database is one that is tightly coupled with an application, or rather, where the database itself is hidden from the end user. Compare that with something like PostgreSQL for example, where the end user typically has to maintain a running instance of PostgreSQL in order for the application to work. Contrast that with SQLite, and the end user might never know that SQLite is used at all.

                    It may be common association that an “embedded” database is typically used in environments with constrained storage space, but it is also common to use embedded databases outside said environments. So I don’t think “embedded” in this context actually has anything to do with available storage. (SQLite, for example, just increased its maximum database size to 281 TB because one of its customers was getting close to the previous 140 TB maximum.)

                    Also, with respect to immutability, I suppose it’s hard to know whether they mean truly immutable in that there is never any kind of garbage collection, or whether they’re using “immutable” in the data structure sense. It’s commonly understood, for example, that an “immutable data structure” will have some kind of garbage collection process that actually allows unused memory to be reused.

                    1. 1

                      My understanding is that an embedded database is one that is tightly coupled with an application

                      Crap, you’re right. And I know this, I’ve used “embedded” to describe the difference between SQLite and MySQL before. My brain just went the wrong direction yesterday … I blame the fact I’ve been trying to get a Seeed microcontroller to talk to MIDI so I’ve got embedded devices on the brain. 😣

                      1. 1

                        Haha no worries! I’m writing a kind of database myself and just recently looked up what precisely qualified as an “embedded” database, so it stuck out to me here. :-)

                  3. 4

                    Doesn’t https://irmin.org/ fit the bill?

                    Embedding might be complicated outside the OCaml-verse - I haven’t tried - but as Irmin can even be compiled to a Unikernel, I guess it should be pretty portable.

                    1. 1

                      I just received the link for irmin somewhere else, but I haven’t looked at it yet. At a first glance, it looks like a great place to get inspirations from.

                      Even though I could compile it, say, for iOS, it would be a tough sell to an iOS programmer to actually add any dependency that’s no either Swift, Objective-C or C. Bringing the Ocaml runtime with the executable binary would be a red flag.

                      Another way to say “embedded” could be “cross plaftform”, as in POSIX, Windows, iOS, WASM, etc.

                      1. 1

                        If it’s able to compile, I guess one could possibly interface c (and the others) by using OCamls FFI, not sure how it would work with the constraints of iOS. You should theoretically be able to link it statically, but I have never tried that, and don’t know how that works for Irmin specifically.

                    2. 2

                      I’m currently on a very similar endeavor and we had a great discussion here on lobste.rs about that a few months ago with many good hints and prior work. I sent a somewhat longer email to the author’s public inbox with my current perpective on things in case anyone is interested.

                      1. 2

                        Do we know why he wants/needs this?

                        In my experience this looks like a functionality “wish list”, rather than something that anyone needs.

                        1. 2

                          What I mentioned on the article on my need was:

                          I want a database that allows me to create decentralized client-side applications that can sync data.

                          I’m not trying to solve anyone’s problem but my own, and more than a wish list it’s a hand picked collection of trade offs, like on how to deal with complexities of conflicts from uncoordinated writes.

                          1. 2

                            The sync protocol, and conflict handling, are IMHO a lot more interesting problems than the implementation of the local DB. Take a look at Dat and Scuttlebutt … neither is perfect, but they have interesting designs. (And they make good use of append-only data structures at the sharing/protocol level.)

                            For a lower level approach to syncing, the CouchDB replication protocol, at a very high level, is a decent design. We still use the same architecture in Couchbase Lite, although the details of the protocol are completely different because sending zillions of HTTP requests is too expensive.

                            1. 1

                              This is a flyby Comment so I apologize if this isn’t what you mean but that sounds a bit like https://realm.io/

                              1. 1

                                Thanks for confirming, but like snej says below this is a different set of problems, many of them unrelated to a database.

                                And not that cannot be done, but I seriously doubt anyone will work on it again in the short/mid term. This kind of problem was nicely solved by Lotus Notes and even if today the solution looks old, they created their own industry in distributed data/apps before internet become commonplace. Obviously it will not cover all the items in your checklist, but I don’t think they’re really needed (like immutability, that is only needed for some specific regulatory needs, or SQL compatibility).

                                Anyway, check Lotus Notes architecture for pointers.

                              2. 1

                                Nothing wrong with that though, is there?

                              3. 1

                                I want this database too, though not badly enough to write it myself.

                                Since people are asking “why would you want this?” - I want to have (and to write) applications that store data locally - because as a user and as a developer I don’t want to have to set up and maintain a persistent server, and it would be nice to have control over my own data - but which can be used from multiple devices. Right now, the solutions for that suck.

                                • You could write a file and sync it with Dropbox, but then you’re stuck writing your own reconciliation layer (or worse, forcing that on the user) when they get out of sync, and you’re exposed to all of the jankiness of file systems. Don’t forget an fsync!
                                • You could use a sqlite database local to each device, but then you’re stuck writing your own reconciliation layer and your own syncing layer.
                                • I guess you could try using Sqlite inside Dropbox, but that seems… unsupported and risky. Who knows whether Dropbox’s syncing might mangle your database or something.

                                These options suck even if you’re only concerned about full Windows/Linux/Mac PCs, but if you want to have an Android or iPhone app too you’re really stuck with - at best - a self-hosted server that you have to maintain.

                                1. 1

                                  Matrix is not relational, but it’s a syncable and immutable database. And there’s a proof of concept of it running on WASM.

                                  1. 1

                                    For a decentralised git-like database, there are exiting projects: dolt and noms being a couple I’m aware of.