This needs to happen. If it does, they should work in this bit of character building about Elle:
Like a clever lawyer, Elle looks for a sequence of events in a story which couldn’t possibly have happened in that order, and uses that inference to prove the story can’t be consistent.
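Kyle really nails a particular tone I don’t think anyone else can do.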
As database builders and users, we’ve made talking about systems a lot harder on ourselves by conflating the ideas of replication, active-active, atomic commitment, and concurrency control.
Replication is a technique used to achieve higher availability and durability than a single node can offer, by making multiple copies of the data. Techniques include Paxos, Raft, chain replication, quorum protocols, etc.
Active-active means that transactions can run against multiple different replicas at the same time, while still achieving the desired level of isolation and consistency.
Atomic commitment is a technique used in sharded/partitioned databases (which themselves exist to scale throughput or size beyond the capabilities of a single machine) to allow transactions to be atomically (“all or nothing”) committed across multiple shards (and allow one or more shards to vote “nah, let’s not commit this”). Two-phase commit (2PC) is the classic technique.
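To make the atomic-commitment piece concrete, here is a minimal sketch of 2PC in Python. The Shard class and coordinator_commit function are made up purely for illustration; a real implementation also has to durably log the prepare decision and handle coordinator failure and recovery, which is where most of the difficulty lives.

    class Shard:
        """One participant in the commit protocol (illustrative only)."""
        def __init__(self, name, will_vote_yes=True):
            self.name = name
            self.will_vote_yes = will_vote_yes
            self.state = "idle"

        def prepare(self, txn):
            # Phase 1: the shard decides whether it can commit and votes.
            if self.will_vote_yes:
                self.state = "prepared"
                return True   # "yes, I can commit"
            return False      # "nah, let's not commit this"

        def commit(self, txn):
            self.state = "committed"

        def abort(self, txn):
            self.state = "aborted"

    def coordinator_commit(txn, shards):
        # Phase 1: collect votes from every shard the transaction touched.
        votes = [shard.prepare(txn) for shard in shards]
        # Phase 2: commit everywhere only if every shard voted yes;
        # otherwise abort everywhere, giving all-or-nothing behaviour.
        if all(votes):
            for shard in shards:
                shard.commit(txn)
            return "committed"
        for shard in shards:
            shard.abort(txn)
        return "aborted"

    print(coordinator_commit("txn-1", [Shard("users"), Shard("orders")]))
    # -> committed: every shard voted yes
    print(coordinator_commit("txn-2", [Shard("users"), Shard("orders", will_vote_yes=False)]))
    # -> aborted: one shard voted no, so no shard commits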
Concurrency control is a set of techniques to implement isolation, which is needed in any database that allows concurrent sessions (single node or multi-node). Classic techniques include two-phase locking (2PL) and optimistic concurrency control (OCC), but many others exist.
When vendors or projects answer concurrency control questions with replication answers (which appears to be the case here), it’s worth diving deeper into those answers. There are cases where “Paxos” or “Raft” might be answers to atomic commitment or even concurrency control questions, but at best they are very partial answers and building blocks of a larger protocol. Databases that only support “single shot”/predeclared transactions can get away without a lot of concurrency control, for example, and might be able to do the required work as part of their state machine replication protocol.
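That last point is easy to miss, so here is a toy sketch of it: when every transaction is a predeclared, deterministic procedure, the total order the replication log imposes is also the serialization order, and each replica that applies the log ends up in the same state. The Python list below just stands in for the output of something like Raft or Paxos; nothing here is a real replication protocol.

    def transfer(state, src, dst, amount):
        # A predeclared ("single shot") transaction: its reads and writes
        # are known up front, and it runs deterministically to completion.
        if state.get(src, 0) >= amount:
            state[src] -= amount
            state[dst] = state.get(dst, 0) + amount

    # The agreed-upon log of transactions, as produced by the replication layer.
    replicated_log = [
        (transfer, ("a", "b", 30)),
        (transfer, ("b", "c", 10)),
    ]

    state = {"a": 100, "b": 0, "c": 0}
    for txn, args in replicated_log:   # every replica applies the same order
        txn(state, *args)

    print(state)  # {'a': 70, 'b': 20, 'c': 10} on every replica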
To be extra clear, I’m not criticizing Aphyr here (the article clearly doesn’t conflate these concepts), but more pointing out what I think lies at the bottom of a lot of the issues we see with distributed database claims.
I strongly disagree with your assessment, based on the intro and conclusion of Aphyr’s document.
I don’t think it’s unreasonable to expect distributed database vendors to understand distributed database terminology and use it correctly. Moreover, the issues Aphyr found have little to do with replication per se and more to do with claims of ACID transaction support that are unsupported by testing.
I feel like your apparently quite deep understanding of concurrency control here might be blinding you to the obvious problem that you can’t claim ACID compliant transactions and lose updates or permit fractured reads, and this system puts that claim in the <title> on their homepage.
I don’t think it’s unreasonable to expect distributed database vendors to understand distributed database terminology and use it correctly.
Agreed! My point is that vendors talking about implementation details (Raft!) is something of a negative sign, because those details (while important) are only a small part of the picture of implementing database semantics correctly. Instead, I prefer to see vendors talk about client-observable promises (e.g. “interactive transactions have repeatable read isolation, with linearizability”) rather than implementation details.
the obvious problem that you can’t claim ACID compliant transactions and lose updates or permit fractured reads
The “ACID means strict serializability” ship has, unfortunately, sailed. Or sunk to the bottom. Instead, I think the most reasonable thing to hope for is “ACID, and here’s what we mean by Isolation”. Then talk about that isolation either using a standard term (e.g. “snapshot”) or describe it crisply in terms of the anomalies that developers need to worry about.
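As an example of spelling out an anomaly, here is the “lost update” everyone in this thread keeps coming back to, reduced to a toy Python interleaving (plain dicts, no real database or vendor API): two transactions read the same counter, both increment their stale copy, and the second write silently wipes out the first.

    counter = {"value": 100}

    # Both transactions read before either one writes.
    t1_read = counter["value"]        # T1 sees 100
    t2_read = counter["value"]        # T2 also sees 100

    counter["value"] = t1_read + 10   # T1 writes 110
    counter["value"] = t2_read + 10   # T2 writes 110, clobbering T1's increment

    print(counter["value"])  # 110, not the 120 a serial execution would give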
Totally agree that “isolation” can mean different things in different contexts, or for different implementations. But ACID is a term of art that establishes a context in which “isolation” is pretty precisely defined. I don’t think a system can claim ACID compliance with their own bespoke definition of “isolation”.
The “ACID means strict serializability” ship has, unfortunately, sailed.
Maybe this is the disconnect.
Folks like Aphyr definitely (and correctly) believe that ACID is well-defined, and requires strict serializability, or something reasonably close to strict serializability. If other folks are trying to use ACID to mean something weaker than what it actually means, well, that results in articles like this one :) And, of course, much sadness and Sturm und Drang in production.
I think it’s reasonable for terms like ACID to evolve.
For example, the “D” originally meant a definition of durability that nearly nobody would be OK with today (based on Gray’s “A Transaction Model” from 1980): something like “on durable storage on one machine”. Nearly every production workload runs read replicas (at least!), and most production systems support features like PITR. A vendor saying “eh, we lost writes because one machine failed” would stick out like a sore thumb in today’s market.
Same with serializability. It’s easy to say that weaker things are bad, but that’s not nearly the entire picture. For example, strict serializability requires coordination across readers and writers that weaker levels (like snapshot) don’t. That avoids some anomalies (write skew, in particular), but requires that developers are much more careful about how much their transactions read if they want decent scalability. I tend to believe that snapshot is probably the sweet spot here (write skew is relatively easy to reason about), and weaker levels don’t win as much on performance as they lose on correctness, but it’s a much more subtle debate than you’re implying.
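To make the write-skew point concrete, here is the standard “on-call doctors” interleaving as a toy Python sketch. Snapshot isolation is modeled by giving each transaction its own copy of the data; the invariant is that at least one doctor stays on call. Each transaction preserves the invariant against its own snapshot, neither write conflicts with the other, so both commit and the invariant is broken. None of this is any particular vendor’s API.

    on_call = {"alice": True, "bob": True}

    # Both transactions take their snapshot before either one commits.
    t1_snapshot = dict(on_call)
    t2_snapshot = dict(on_call)

    if sum(t1_snapshot.values()) >= 2:   # T1: "someone else is still on call"
        on_call["alice"] = False
    if sum(t2_snapshot.values()) >= 2:   # T2: same check, against its own snapshot
        on_call["bob"] = False

    print(on_call)  # {'alice': False, 'bob': False}: nobody is on call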
I have been reading the Jepsen analyses since the beginning, and the overwhelming conclusion I have gained from them is: database vendors will say all sorts of things in documentation that are not true about the properties of the actual software.
If you want to make the case that we should be using a loose, informal definition of ACID, well… I think you’re wrong, but you should at least be upfront about it. A database vendor is asking you to entrust them with your data. Making promises you can’t keep, even by implication, is wrong. I won’t trust a system from people who do that. I won’t use it.
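Are there any modern DB vendors whose software does meet the claims?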
I’d suggest the implication of your question (that it doesn’t matter what vendors claim because they are all lying) is the wrong framing. But if you read even a few of the other analyses on jepsen.io, you can see the difference between honest mistakes and subtle bugs on the one hand, and outright lying and failure of due diligence on the other. Just looking at a couple of these analyses, we see, about CockroachDB:
In the latest betas, CockroachDB now passes all the Jepsen tests we’ve discussed above: sets, monotonic, g2, bank, register, and so on—under node crashes, node pauses, partitions, and clock offset up to 250 milliseconds, as well as random interleavings of those failure modes. The one exception is the comments test, which verifies a property (strict serializability) which CockroachDB is not intended to provide. Cockroach Labs has merged many of these test cases back into their internal test suite, and has also integrated Jepsen into their nightly tests to validate their work.
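About Cassandra: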
DataStax has adapted some of these Jepsen tests for use in their internal testing process, and, like Basho, may use Jepsen directly to help test future releases. I’m optimistic that they’ll notify users that the transactional features are unsafe in the current release, and clarify their documentation and marketing. Again, there’s nothing technically wrong with many of the behaviors I’ve discussed above–they’re simply subtle, and deserve clear exposition so that users can interpret them correctly.
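Or MySQL: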
The bad news: MySQL’s “Repeatable Read” does not satisfy PL-2.99 Repeatable Read: it exhibits G2-item anomalies including write skew. It does not satisfy Snapshot Isolation: it exhibits G-single, including read skew and lost update. Lost update rules out cursor stability. Reads in MySQL “Repeatable Read” are not repeatable, even under the ambiguous definitions of the ANSI SQL standard. Its transactions violate internal consistency, which rules out Read Atomic, Causal, Consistent View, Prefix, and Parallel snapshot isolation. Kleppmann’s 2014 Hermitage suggested MySQL Repeatable Read might be Monotonic Atomic View, but this cannot be true: we found monotonicity violations.
It’s quite easy, if you peruse these analyses, to separate databases that take their claims seriously from those that do not. So rather than reducing it to a simple binary of “do they meet their claims or not”, it’s better to ask: how do the vendors respond to the analysis? Good vendors take the analysis and run with it, at least patching obvious bugs, hopefully integrating Jepsen tests, and updating documentation to reflect the subtleties. Bad vendors truck on without making meaningful changes, or ignore the results entirely.
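Woah there, Nelly. There are no implications of anything not written, let alone that “it doesn’t matter what vendors claim because they’re all lying”.
“Which if any modern db vendors are selling software that does what it says on the tin?” simply seems to me like a valuable question to ask.
Sorry for imputing the wrong sense.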
I believe Postgres does what it says. Aphyr’s tests indicate that replication has interesting edge cases. I believe they are well-documented. But it is mostly a classic RDBMS that puts consistency over availability. As for modern hyperscale AP systems like RavenDB, I don’t use one so I can’t recommend one.
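For sure! Look at Jepsen’s past analyses to get a sense of the landscape. IIRC FoundationDB does well, as an example.
I don’t think FoundationDB was ever Jepsen tested. There used to be this tweet by aphyr saying:
which supports your claim though.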
A vendor saying “eh, we lost writes because one machine failed” would stick out like a sore thumb in today’s market.
And yet apparently RavenDB could say “eh, we lost writes but actually no machines failed,” and we have a conversation about whether the term ACID itself is the problem?
Same with serializability. It’s easy to say that weaker things are bad, but that’s not nearly the entire picture. For example, strict serializability requires coordination across readers and writers that weaker levels (like snapshot) don’t. That avoids some anomalies (write skew, in particular), but requires that developers are much more careful about how much their transactions read if they want decent scalability.
We totally agree on this stuff! Weaker semantics doesn’t mean the system is worse. (In fact, I think it usually means the system is better!)
The issue is just that these weaker systems don’t get to claim that they’re ACID-compliant. That’s all.
Atomic commitment is a technique used in sharded/partitioned databases (which themselves exist to scale throughput or size beyond the capabilities of a single machine) to allow transactions to be atomically (“all or nothing”) committed across multiple shards (and allow one or more shards to vote “nah, let’s not commit this”). 2 phase commit (2PC) is the classic technique.
Atomic commitment seems to describe a property of a transaction, not an implementation, doesn’t it?
I’d say “atomicity” is a property of a transaction, and “atomic commitment” is a protocol-level problem that must be solved to achieve atomicity in sharded databases (which does not need to be solved in replicated single-primary databases, or single machine databases, to achieve atomicity).
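Terms are hard, though.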
atomic commitment” is a protocol-level problem that must be solved to achieve atomicity in sharded databases … which does not need to be solved in … single machine databases
When you say “single machine” do you mean “single threaded”? If not, can you say a bit more about why a single-machine and multi-threaded (i.e. concurrent) database doesn’t need to solve “atomic commitment”?
If not, can you say a bit more about why a single-machine and multi-threaded (i.e. concurrent) database doesn’t need to solve “atomic commitment”?
I’m not the original commenter obviously, but I read the comment as being about a problem to be solved at the protocol level between multiple writers that can experience network partitions. Replicated single-primary databases don’t have this problem because only one node can write and the readers will trivially always be, at minimum, eventually consistent in the event of a network partition. A multi-threaded database doesn’t have this protocol problem because threads on a single node cannot experience a network partition barring some bizarro supercomputer hardware architecture or something. Once you shard the database, you’ve introduced multiple writers and that creates this problem.
I read the comment as being about a problem to be solved at the protocol level between multiple writers that can experience network partitions.
Yep, but I think this is the issue. I don’t think “atomic commitment” only applies to distributed systems subject to network partitions. I think it applies to any system that has concurrent readers and writers. Which includes single-node databases that are multi-threaded!
I enjoy when these analyses are written with the restrained and decorous loathing of an angry federal judge.
“the restrained and decorous loathing of an angry federal judge” is a fantastic turn of phrase, bravo!
I also love that Kyle’s job is basically to say “well, actually…” in the face of marketing that actually deserves the scrutiny.
I think someone should make a Phoenix Wright version of Jepsen, like what was done for the Debian systemd acrimony.