Threads for russelldb

  1. 4

    Amazing how the author can talk at length on CRDTs, and even have a section called “What are CRDTs”, yet never actually spell out “conflict-free replicated data type”.

    1. 4

      Yeah! An oversight (I link to CRDT.tech though!) Part of it is that the acronym has been overloaded over time. Conflict Free, Convergent, Concurrent, Commutative, etc. I have just opened a PR that addresses this comment.

      I don’t think it is that amazing, when you consider I say that the article is not about CRDTs, and won’t be explaining them, and links to sources. But I appreciate 1. that you read it (thanks!) and 2. took the time to comment. I have addressed that comment in a PR, and I hope our marketting team can merge+publish when America comes online.

      EDIT: to say, we have fixed that oversight. Many thanks!

    1. 2

      I really wish someone would port Hypothesis’ stateful testing to Rust’s proptest or quickcheck (https://hypothesis.works/articles/rule-based-stateful-testing/).

      Consider a map/dictionary.

      With the style of testing presented here in the linked article (https://medium.com/@tylerneely/reliable-systems-series-model-based-property-testing-e89a433b360), you basically say “here’s a random series of inserts and deletes; hopefully some deletes line up with the inserts”. With Hypothesis style testing, you can generate deletes knowing what entries the map has.

      1. 5

        In the article I point out that there is a PR open on proptest (https://github.com/AltSysrq/proptest/pull/257) that adds something very like the EQC Statem, stateful property based testing. This means you can infact generate deletes that are in the model.

        In the case in the article, testing sets, we instead use low cardinality that ensures collisions over time. Not included in the article is that the test also gathers statistics about which operations were run, what percentage of removes were meaningless etc.

        I have since ported the test to use proptest stateful testing, and it was equally effective, and much nicer to write (especially for someone who still prefers EQC Statem!)

        1. 2

          Oh cool I missed that. I’m excited!

        2. 1

          It took me a while to realize that what you and the articles are talking about is called Simulation Testing in other circles. Agreed, those are very worthwhile tests to have.

        1. 2

          Are CRDT counters idempotent? (Or, do CRDTs always have to be idempotent?)

          Because, I don’t understand how they can be. If they are and anyone can point me towards some writing or any implementation then please do.

          1. 9

            It depends. CRDT counters as implemented in Riak are not idempotent, but for good reason.

            If every client was an actor in a PN-Counter, and you wanted every game player of your mobile ‘phone game to increment the count, you would have a very wide PN-Counter (an entry per actor.) If these clients all read a CRDT from a database with a RYOW consistency level (or store their own writes locally, durably), then you can have idempotent counters, by dint of the fact that the merge function (the LUB) of a CRDT is idempotent.

            Carlos Baquero et al did some work around this limitation in this paper: http://arxiv.org/abs/1307.3207

            In Riak, we bound the number of actors to be the replica vnodes for a data item. This means that many, many, clients send operations to Riak (inc/dec.) without changing the size of the counter. The downside is “counter drift.”

            If Riak receives an operation, and executes it on a vnode, but fails to replicate to W-1 vnodes, or if Riak succeeds, but the client never receives an “OK!” Then the operation is a partial success. This is indistinguishable from an actual failure from the client’s perspective. If the client resends the operation and there was a failure, OK! But if there was a partial success then we double counted. This is positive counter drift. If the user choses not to resend the op and there was a real failure, this is negative drift.

            How to make this system model idempotent is the subject of current work. Trivially, you can make an idempotent counter CRDT like this:

            • Client creates a unique ID for an operation (say {id: 100, op=12} means id=100, increment by 12)
            • A Set on the database stores this
            • The client can resend as many times as it wants, that id, op pair will only be stored once.
            • Set Union is idempotent.
            • Set insertion is idempotent.
            • Value of the counter is sum of all ops.

            If the client never re-uses the id, it can replay the op safely. Downside? That’s going to be a big set.

            I’ve seen implementations where an ID lasts for some period of time, and is then dropped. And so you keep a set of ids + some roll up integer per actor, and you’re idempotent for failures that last for some period of time, after which retrying will lead to drift.

            In general the update/mutate function on state based CRDTs are not idempotent, but the merge function is. If you send CRDTs between nodes in your system (including the clients) and you can always RYOW (either client local storage, or from a database) then you have idempotent operations.

            1. 2

              Usually they are not. At least the ones implemented by Basho for Riak are not.

              1. 7

                Yes, the operations performed over CRDTs are always idempotent. Riak’s counters (implemented here AFAIK) are state-based PN-counters, described in Section 3.1.3 of the Shapiro et al. CRDT techreport.

                Each actor (or client, in the case of Riak) maintains its own count of increment and decrement operations; upon read, these are merged to yield the current value of the counter. The idempotency isn’t that an actor can issue an increment multiple times, and only the first will be acknowledged; it’s that one actor’s impact on the counter is not applied multiple times despite repeated replications / merging of its state.

                If you do want actors to only be able to increment once, then you can build a CRDT counter comprised of two sets, one to track increments, the other to track decrements; getting the counter’s value requires taking the difference of the sets' cardinalities. An increment in “userland” would add an element to the increment set with a tag uniquely identifying the actor, likewise for decrements.

                (I don’t work for Basho, so perhaps someone who does can correct me if I’m wrong somehow.)