Examples of databases that use eventual consistency are
DynamoDB
Sort-of. DynamoDB is interesting here: it’s actually a strongly consistent database (in that it’s fundamentally “CP”) but will allow readers to opt in to eventually consistent reads as a latency optimization. Writes are always synchronously replicated to a quorum of replicas, and strong consistency is always available to readers. In much the same way, Aurora PostgreSQL allows read replicas which are eventually consistent, while still ensuring that all writes are strongly consistent and ordered and written to a quorum.
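For example, with the AWS SDK for Go v2 a reader opts into a strongly consistent read per request; leaving `ConsistentRead` unset gives the default eventually consistent read. (The table and key names below are made up, just to show the knob.)

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		log.Fatal(err)
	}
	client := dynamodb.NewFromConfig(cfg)

	// Reads are eventually consistent by default; ConsistentRead=true opts
	// this one request into a strongly consistent read (at higher latency/cost).
	// "example-table" and "example-id" are placeholders.
	out, err := client.GetItem(context.TODO(), &dynamodb.GetItemInput{
		TableName:      aws.String("example-table"),
		Key:            map[string]types.AttributeValue{"pk": &types.AttributeValueMemberS{Value: "example-id"}},
		ConsistentRead: aws.Bool(true),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println(out.Item)
}
```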
Similarly to Raft, Paxos protocols are (in terms of CAP theorem) “CP”, which means the cluster can also become unavailable.
The cluster can always become unavailable in some conditions, no matter what your software does. The only difference, really, between “CP” and “AP” systems is whether it is available on the minority side for writes and strongly consistent reading during a network partition. Nothing is stopping a “CP” system from being available and consistent on the majority side during a partition, but it has to be unavailable on the minority side.
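To make that concrete, here is a toy illustration (not from any real implementation) of why only the majority side of a partition can keep accepting writes:

```go
package main

import "fmt"

// canServeWrites reports whether one side of a partition still reaches a
// majority of the cluster; hypothetical helper, purely for illustration.
func canServeWrites(reachable, clusterSize int) bool {
	return reachable > clusterSize/2
}

func main() {
	// A 5-node cluster split 3/2 by a partition: the majority side stays
	// available and consistent, the minority side has to refuse writes.
	fmt.Println(canServeWrites(3, 5)) // true
	fmt.Println(canServeWrites(2, 5)) // false
}
```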
since all writes must have a quorum to be accepted (which implies back-and-forth between the leader and followers), performance can be degraded
You don’t mention durability here, which is a key part of the picture if you’re going to decide whether or not to involve multiple servers in a write. In fact, if you weren’t worried about durability, you could invent a modified version of Raft with significantly better average-case latency properties, but with some probability of data loss on single-machine failure.
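A rough sketch of that trade-off, with hypothetical interfaces rather than Raft or FlowG code: ack after the local append only (fast, but a single-machine failure can lose the entry), versus ack after a majority of replicas have it (slower, but durable across any single failure).

```go
package main

import (
	"errors"
	"fmt"
)

type Entry []byte

// Replica is a stand-in for a node that can durably append an entry.
type Replica interface {
	Append(e Entry) error
}

// memReplica is a toy in-memory replica; a real one would fsync to disk.
type memReplica struct{ log []Entry }

func (r *memReplica) Append(e Entry) error { r.log = append(r.log, e); return nil }

// writeLocalThenReplicate acks once the local node has the entry and replicates
// in the background: low latency, but the entry is lost if that one machine
// dies before the followers catch up.
func writeLocalThenReplicate(local Replica, followers []Replica, e Entry) error {
	if err := local.Append(e); err != nil {
		return err
	}
	for _, f := range followers {
		go f.Append(e) // best effort; errors ignored in this sketch
	}
	return nil
}

// writeQuorum acks only once a majority of replicas have the entry: higher
// latency (one round-trip per follower here), but the entry survives any
// single-machine failure.
func writeQuorum(replicas []Replica, e Entry) error {
	acks := 0
	for _, r := range replicas {
		if r.Append(e) == nil {
			acks++
		}
	}
	if acks <= len(replicas)/2 {
		return errors.New("no quorum: write rejected")
	}
	return nil
}

func main() {
	nodes := []Replica{&memReplica{}, &memReplica{}, &memReplica{}}
	fmt.Println(writeQuorum(nodes, Entry("durable")))                         // <nil>
	fmt.Println(writeLocalThenReplicate(nodes[0], nodes[1:], Entry("fast"))) // <nil>
}
```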
DynamoDB is interesting here […] will allow readers to opt in to eventually consistent reads
Indeed, I missed the part about “read consistency”.
The only difference, really, between “CP” and “AP” systems is whether it is available on the minority side for writes and strongly consistent reading during a network partition.
That is what I actually wanted to say, though my formulation was inaccurate. In FlowG’s use case, we write more often than we read, so “write availability” is more important than “read consistency”. We want to ingest logs as fast as possible, and we trust the pipeline to store them in the right place and/or trigger the correct webhooks. There are better tools than FlowG to actually view the data (Kibana, for instance), which is why I use FlowG to forward logs to other systems (Datadog, Splunk, ElasticSearch, Zapier, you name it) and only store the minimal amount of data in FlowG itself.
You don’t mention durability here, which is a key part of the picture
Indeed, that’s a miss on my part, thanks.
Thanks a lot for the corrections! I’ll add them as notes to the article tomorrow evening :)
How does the compression compare to openobserve? I found that the compression in the parquet files is quite amazing. It doesn’t seem like badgerdb does anything other than native compression. Does the columnar nature of parquet make things smaller?
We enabled compression in BadgerDB using its native feature with the ZSTD compression algorithm. The data model (described here) contains only textual data, which compresses well.
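Concretely, this is roughly how BadgerDB’s built-in ZSTD compression gets turned on (the import assumes badger v4, and the path is a placeholder, not FlowG’s actual data directory):

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v4"
	"github.com/dgraph-io/badger/v4/options"
)

func main() {
	// Enable BadgerDB's native block compression with the ZSTD algorithm.
	// "/tmp/flowg-data" is a placeholder path.
	opts := badger.DefaultOptions("/tmp/flowg-data").WithCompression(options.ZSTD)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```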
Though, we did not run enough benchmarks to have a definitive answer, nor a comparative one (the project is still in its infancy). The (very dumb) benchmark I did was to generate tens of millions of logs against a single FlowG instance. It ingested logs at a rate of ~4000 logs/s; during ingestion the database compressed a few times (and we noticed a drop in performance during this step), and the final result was in the range of 200-300 MB. But I’m not entirely sure about the accuracy of those numbers: the generated logs were very similar in content, and the test was run on my personal computer (with Chrome and VS Code taking their big fat chunk of CPU time and RAM).
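Something along these lines would reproduce that kind of load; the endpoint URL and payload shape below are placeholders, not FlowG’s actual API, and note that the near-identical payloads are exactly why the compression numbers above should be taken with a grain of salt.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// Placeholder ingestion endpoint; FlowG's real API path and payload
	// format may differ.
	const endpoint = "http://localhost:5080/api/v1/ingest"
	client := &http.Client{Timeout: 5 * time.Second}

	for i := 0; i < 10_000_000; i++ {
		payload := fmt.Sprintf(`{"level":"info","msg":"synthetic log entry %d"}`, i)
		resp, err := client.Post(endpoint, "application/json", bytes.NewBufferString(payload))
		if err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
	}
}
```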
I plan to improve our benchmarking once the few really important features are well implemented (replication being one of the last).
See also: Enough With All The Raft and the Data Replication Design Spectrum.
Edits are done, thank you again :)