Looks pretty interesting. Some of the claims (e.g. adding nodes -> increased speed) seem eyebrow-raising under Paxos, but that’s what independent benchmarks are for, right?
It reads to me like data is sharded out via hashing, and then updates are paxos’d around. Transactions appear to be OCC. I think the claim is that if you add more servers, your shards get smaller, and you could process more transactions per second on smaller shards.
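To make that scaling claim concrete, here’s a minimal sketch (my own illustration, not GoshawkDB’s actual code) of hash-based shard assignment: each key hashes to a server, so doubling the server count roughly halves each shard’s share of the keys, and with it the per-shard transaction load.

```python
# Illustrative sketch of hash-based sharding (assumed design, not GoshawkDB's code):
# keys are hashed to servers, so adding servers shrinks each shard.
import hashlib

def shard_for(key: str, num_servers: int) -> int:
    """Map a key to a server index by hashing its bytes."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

# Count how many keys land on each server for two cluster sizes.
keys = [f"object-{i}" for i in range(10_000)]
for n in (3, 6):
    counts = [0] * n
    for k in keys:
        counts[shard_for(k, n)] += 1
    print(f"{n} servers: largest shard holds {max(counts)} keys")
```

Running this shows the largest shard roughly halving when the server count doubles, which is presumably where the “more nodes, more throughput” claim comes from, at least for workloads that don’t cross shards.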
That said, I would have loved to see a paragraph or two that actually talked in more detail about the design and what tradeoffs were taken.
Did anyone else have a negative reaction to the way the writing seems to be overly boastful?
“Unlike many other distributed CP data stores, GoshawkDB’s performance improves as you add more nodes.”
So if I add an infinite number of nodes, it’ll get infinitely faster? I guess it depends on how the word “performance” is interpreted. Presumably time to convergence will increase, so not specifying what aspect of performance is increasing is not very useful.
“[retry] is incredibly powerful and creates a substantial step-change in the power of GoshawkDB.”
Show me, don’t tell me.
There’s also a lot of anti-SQL rhetoric on the Rationale page, which seems strawman-like: why so much complaining about SQL? Tell me why this key-value/object store over the many others.
Also, the author writes:
“Early in 2015 I spent some time looking around several popular data stores and came to the conclusion that I didn’t want to use any of them. I wanted a data store that was distributed, fault tolerant, sharded, transactional and that puts semantics and guarantees front and centre.”
I’d love to see the author write up this evaluation: which stores were looked at? In what ways did they miss the mark?
“Looking around at the websites and documentation of many data stores, I couldn’t find anything that had the features I wanted or spelled out clearly their semantics: what bad things happen when failures occur? How do I code around those issues?”
Unless I totally missed it, the author doesn’t seem to have documented these things for GoshawkDB either.
What I want to know is how it stands up to Jepsen.
Welcome to modern database development!