1. 86
  1.  

  2. 36

    The transactions page states that MongoDB is “the only database that fully combines the power of the document model and a distributed systems architecture with ACID guarantees,” a combination also claimed by CosmosDB, DynamoDB, FaunaDB, Oracle NoSQL, OrientDB, RavenDB, SAP HANA, YugaByte DB, et al.

    Or, how to make me laugh out loud with the driest, most academic prose possible.

    1. 3

      The only DBMS I have lost data with is Mongo.

    2. 14

      Another (over my head) but nice, thoughtful job!

      @aphyr, knowing that you spend a non-trivial amount of time on your analyses, what prompted the uncompensated analysis?

      1. 46

        A few things came together! One is that people are always mentioning Jepsen and MongoDB together, and asking what it does now, and I keep fumbling the ball. Another is that the MongoDB Jepsen test suite is one that people frequently try to run, and when it fails, they file GitHub issues asking what’s wrong, so it’s been in the back of my head like “yeah, I need to go dig into that at some point.”

        About… maybe a month ago, Evan Weaver, from FaunaDB, sent me a link to the MongoDB Jepsen page which accidentally forgot to talk about default behavior. I was busy and forgot about it until Jepsen got tagged in a Twitter thread where a MongoDB developer advocate said “We are passing the Jepsen test suite and it was back in 2017 already. So, no, MongoDB is not losing anything if you know what you are doing.”, and linked to the page again! THAT was like oh, yeah, I REALLY gotta do this.

        https://twitter.com/MBeugnet/status/1253622755049734150

        I’d just finished a full rewrite of Jepsen’s generator system and I needed a project to use as a proving ground, to make sure it was actually usable before release. Dug into the MongoDB test suite code and realized I couldn’t get it to run either: it’d accrued a bunch of code for other environments, and I think somewhere along the way it stopped working with a standard Jepsen environment. I started a rewrite expecting I’d basically just confirm transactions were SI, but then… I found the defaults weren’t, and then even when I fixed the test suite to use the correct safety levels THOSE looked broken, and it just kinda snowballed from there.

      2. 13

        > Sometimes, Programs That Use Transactions… Are Worse

        You ever see a reference that is such a deep cut you feel like it’s written for you, specifically?

        1. 9

          I am personally delighted every time someone catches these. :)

          1. 2

            Haha, amazing! Thank you for introducing me to this.

          2. 5

            I liked the analysis, but the part about ACID and Snapshot Isolation reads a bit off:

            > Snapshot isolation is a reasonably strong consistency model, but claiming that snapshot isolation is “full ACID” is questionable.

            Marketing usage might be deceptive, but to me this reads like you’re saying that only Serializable transactions are ACID. But even Serializable transactions exhibit real-time anomalies. Then you’re left with strict serializability, which afaik only Spanner guarantees for distributed settings. Can Postgres still claim to have ACID transactions? Is Spanner then the only player with “full ACID” transactions?

            Although the “I” in ACID means Isolation, there are degrees. Both Read Committed and Strict Serializable transactions are, to me, “fully ACID”.

            1. 13

              I’d disagree! ACID isn’t… really a well-defined property, which is why I couch it carefully in this report, but in general ACID “isolation” is understood to have something to do with transactional interleaving and equivalence to a serial history. Realtime-only anomalies in serializability don’t violate isolation, because they don’t create (visible) interleavings of operations across transactions. In this sense, serializability is the weakest of several consistency models which provides ACID “I”.

              Consistency is a bit of a different beast, because there are application-level invariants which could be violated by serializability, but preserved under, say, strong session or strict-1SR. For those cases, yeah, you could argue serializable isn’t ACID either. That’s part of why I don’t like “ACID” as a descriptor, but people keep using it, so… here we are.

              1. 2

                Fair enough! It doesn’t help that the literature can call Snapshot Isolation (or any other) both an “Isolation level” and a “Consistency criterion”, although I see the first one less and less nowadays.

                > That’s part of why I don’t like “ACID” as a descriptor

                It’s true that ACID transactions can mean different things to different people, which I guess is why it’s the perfect marketing material. With that… it’s on us to come up with a new hip term to sell to clients!

            2. 2

              This sounds like MongoDB is a mess that works most of the time, right enough to sell. But don’t try to debug why exactly something happens. That reminds me of those online poll systems, sometimes used in online streams, where the result bounces back and forth by an order of magnitude too big to be reasonable. Maybe it’s a MongoDB thing…

              1. 1

                Thank you @aphyr! I’ve been a really big fan of the Jepsen writeups! They’re what introduced me to linearizability and causal consistency, which has made me very skeptical of clustered database safety in general, and for that I’m happy. Having been slammed with MongoDB’s aggressive sales pitches in the past, I’m snickering at their insane API defaults for transactions. What were they thinking??? It’s a pretty massive footgun they’ve got there and I’m amazed it got to production without someone going “Uh, transactions are usually expected to be safe. Is this safe?”

                I get that there’s a use case for lossy systems, but for my use cases, databases should not lose data, ever. Guess that’s why I’m still using a single Postgres server and plan to shard by tenant ID: in the business I’m in, a single company will rarely exceed the capacity of one Postgres server, even in the worst case. Sometimes the new hotness is really just a source of new burns.