1. 3

  2. 2

    What I want to know is how it stands up to Jepsen.

    1. 1

      Looks pretty interesting. Some of the claims (e.g. adding nodes -> increased speed) seem eyebrow-raising under Paxos, but that’s what independent benchmarks are for, right?

      1. 4

        It reads to me like data is sharded out via hashing, and then updates are Paxos’d around. Transactions appear to use OCC (optimistic concurrency control). I think the claim is that if you add more servers, your shards get smaller, and you can process more transactions per second on smaller shards.
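        A minimal sketch of the sharding argument above, in Python. It assumes plain modulo-hash placement, and the names `shard_for`, `keys_per_shard` and `shards_touched` are hypothetical, not GoshawkDB's API or its actual placement scheme. It ignores Paxos replication and OCC entirely: it only shows that each shard holds fewer keys as nodes are added, while a transaction over several keys may still need to coordinate across several shards.

        ```python
        import hashlib
        from collections import Counter


        def shard_for(key: str, num_nodes: int) -> int:
            """Pick a shard for a key by hashing it (hypothetical modulo placement)."""
            digest = hashlib.sha256(key.encode("utf-8")).digest()
            return int.from_bytes(digest[:8], "big") % num_nodes


        def keys_per_shard(keys, num_nodes):
            """Count how many keys land on each of num_nodes shards."""
            counts = Counter(shard_for(k, num_nodes) for k in keys)
            return [counts[i] for i in range(num_nodes)]


        def shards_touched(txn_keys, num_nodes):
            """Set of shards (and so replica groups) a transaction over txn_keys involves."""
            return {shard_for(k, num_nodes) for k in txn_keys}


        keys = [f"object-{i}" for i in range(10_000)]
        for n in (3, 6, 12):
            sizes = keys_per_shard(keys, n)
            print(f"{n:2d} nodes -> largest shard holds {max(sizes)} keys")
        ```

        Plain modulo placement also reshuffles most keys whenever the node count changes; real systems commonly use consistent hashing or explicit shard maps to avoid that, which is exactly the kind of design tradeoff worth documenting.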

        That said, I would have loved to have seen a paragraph or two that actually talked in more detail about the design and what tradeoffs were taken.

      2. 1

        Did anyone else have a negative reaction to the way the writing seems to be overly boastful?

        > Unlike many other distributed CP data stores, GoshawkDB’s performance improves as you add more nodes.

        So if I add an infinite number of nodes, it’ll get infinitely faster? I guess it depends on how the word “performance” is interpreted. Presumably time to convergence will increase, so a claim that doesn’t say which aspect of performance improves is not very useful.

        > [retry] is incredibly powerful and creates a substantial step-change in the power of GoshawkDB.

        Show me, don’t tell me.

        There’s also a lot of anti-SQL rhetoric on the Rationale page, which reads like a strawman: why so much complaining about SQL? Tell me why I should pick this key-value/object store over the many others.

        Also, the author writes:

        > Early in 2015 I spent some time looking around several popular data stores and came to the conclusion that I didn’t want to use any of them. I wanted a data store that was distributed, fault tolerant, sharded, transactional and that puts semantics and guarantees front and centre.

        I’d love to see the author write up this evaluation: which stores were looked at? In what ways did they miss the mark?

        > Looking around at the websites and documentation of many data stores, I couldn’t find anything that had the features I wanted or spelled out clearly their semantics: what bad things happen when failures occur? How do I code around those issues?

        Unless I totally missed it, the author doesn’t seem to have documented these things for GoshawkDB either.

        1. 1

          > Did anyone else have a negative reaction to the way the writing seems to be overly boastful?

          Welcome to modern database development!