1. 14
  1. 4

    See also http://dbmsmusings.blogspot.com/2012_05_01_archive.html by the same authors, Abadi and Thomson. It explains the motivation and main ideas of Calvin. The principle that ACID scales better if it is strengthened (deterministic transaction ordering) rather than weakened (as in so-called NoSQL systems) is also discussed in an earlier post: http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html.

    There are several interesting ideas here: the unique global sequence of txns, deterministic execution of txns on replicas, pluggable storage back-ends, and prefetching relevant records from disk. These design choices reduce the time spent inside transaction execution (e.g., by avoiding coordination between replicas via two-phase commit); see the sketch below.

    Reading the paper rang some bells in my head because I’ve been working on a distributed tuplespace (replacing Ruby’s Rinda) that uses the same replication techniques (except for prefetch, so far): https://github.com/vjoel/tupelo.
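
    To make the first two ideas concrete, here is a minimal sketch (my own illustration, not code from the paper or from tupelo) of why a globally agreed, deterministic transaction order removes the need for commit-time coordination: every replica applies the same log and necessarily ends up in the same state. The names here (apply_log, transfer) are made up for the example.

    ```python
    # Sketch: deterministic replay of a globally ordered transaction log.
    # Each transaction is a pure function of (state, arguments): no clocks,
    # randomness, or external I/O. Because every replica applies the same log
    # in the same order, all replicas converge without 2PC-style coordination.

    def transfer(state, src, dst, amount):
        if state.get(src, 0) >= amount:
            state[src] = state.get(src, 0) - amount
            state[dst] = state.get(dst, 0) + amount

    def apply_log(initial_state, log):
        state = dict(initial_state)
        for txn, args in log:           # order fixed by the global sequencer
            txn(state, *args)
        return state

    log = [(transfer, ("alice", "bob", 10)), (transfer, ("bob", "chuck", 5))]
    replica_1 = apply_log({"alice": 100, "bob": 0, "chuck": 0}, log)
    replica_2 = apply_log({"alice": 100, "bob": 0, "chuck": 0}, log)
    assert replica_1 == replica_2       # identical by construction
    ```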

    1. 2

      One of the advantages of partitioning is that other partitions can make progress while a single replica set is down. It seems like this loses that property to a large extent. Imagine replica sets Alice, Bob, and Chuck. We have a transaction that modifies Alice and Bob, and many other transactions that modify Bob and Chuck. The Alice and Bob transaction wins, and then Alice becomes unavailable before Bob can be written to. It’s easy to imagine a backend with a transactional substrate that supports rollback, so that the remaining blocked transactions that don’t depend on Alice could be reordered. Is there a way of building this into Calvin so that we don’t restrict the underlying db more? Right now it seems like the underlying db also needs to be strongly consistent.
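
      A toy illustration of the blocking I mean (my own sketch, not Calvin’s actual lock manager): once the Alice+Bob transaction is first in the global order and holds Bob’s lock, the later Bob+Chuck transactions are stuck behind it even though Chuck is fully available.

      ```python
      # Toy model of the blocking concern (not Calvin's actual lock manager).
      # Locks are granted in global-sequence order; a transaction that touches
      # an unavailable partition stalls while holding its locks, blocking
      # later transactions that need any of those locks.

      unavailable = {"alice"}                       # Alice's replica set is down
      schedule = [
          ("T1", {"alice", "bob"}),                 # sequenced first, wins bob's lock
          ("T2", {"bob", "chuck"}),
          ("T3", {"bob", "chuck"}),
      ]

      held = {}                                     # partition -> holder
      for txn, parts in schedule:
          if parts & set(held):
              print(f"{txn} blocked behind {sorted(parts & set(held))}")
          elif parts & unavailable:
              held.update((p, txn) for p in parts)  # stalls, keeps all its locks
              print(f"{txn} stalls holding {sorted(parts)}")
          else:
              print(f"{txn} executes on {sorted(parts)}")
      ```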

      1. 1

        I haven’t read the paper, just the blog posts. Maybe the idea is to have n replicas of each partition: nA, nB, nC. So when one A goes down, the A+B transaction still executes on (n-1)A and nB.

        What if all nA go down at the same time? It would be incorrect to reject the txn at the B nodes, because they have to work independently of A (txn execution has to be deterministic for this scheme to work). Maybe there’s a write-ahead log, which gets applied to the A nodes when they come back up. Edit: yes, there’s a log. Second sentence of http://sites.computer.org/debull/A13june/calvin1.pdf.
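
        Here’s a rough sketch of how I imagine the log-based catch-up working (my guess at the mechanism, not taken from the paper): the B nodes execute normally, the input log is durable, and a recovering A replica simply replays the entries it missed; deterministic execution guarantees it reaches the correct state.

        ```python
        # Sketch of log-based catch-up for a recovering replica (a guess at the
        # mechanism, not taken from the paper). The durable input log is the
        # source of truth; a replica tracks how far it has applied and replays
        # the remainder on recovery.

        class Replica:
            def __init__(self):
                self.state = {}
                self.applied_upto = 0                 # log entries applied so far

            def replay(self, log):
                for key, value in log[self.applied_upto:]:
                    self.state[key] = value           # deterministic apply
                    self.applied_upto += 1

        log = [("x", 1), ("y", 2), ("z", 3)]          # global, durable input log

        a = Replica()
        a.replay(log[:1])                             # A applies one entry, then crashes
        a.replay(log)                                 # on recovery, A replays what it missed
        assert a.state == {"x": 1, "y": 2, "z": 3}
        ```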

        1. 1

          Yes, if one replica goes down, the replica set can still make progress. All nA going down is exactly the case I’m wondering about. Unless I’m misreading the paper, it seems like progress stops until an A replica can come back up. It’s fine to continue the B part of the transaction, but the lock isn’t released until an A replica recovers.

          1. 1

            I was thinking about this more, and it seems like it would be possible to modify the protocol in the way you’re suggesting, and it would also solve my concern about transactions. If A is down, keep holding the lock just for A: you can continue to make progress elsewhere as long as you discard the old synchronization plan and remember which inputs must be sent to A. The main concern would be that the dependency graph stops being a DAG, but that can’t happen as long as the replica set is down. When the node comes back up, you can finish the original transaction it missed without problems. Or at least that’s how it seems to me. Good call!
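
            For what it’s worth, here’s how I picture that modification (purely a sketch of the idea we’re discussing, not anything from the paper): keep holding locks only on the down partition, remember which inputs still have to be forwarded to it, and let later transactions that avoid it run.

            ```python
            # Sketch of the modification discussed above (not part of Calvin as
            # published): if a partition is down, keep its lock and queue the
            # inputs it still owes, but release locks on live partitions so
            # unrelated transactions can proceed.

            down = {"alice"}
            owed = {"alice": []}                  # inputs to forward on recovery

            def schedule(txn, parts, inputs, locks):
                missing = parts & down
                if missing:
                    for p in missing:
                        locks[p] = txn            # hold only the down partition's lock
                        owed[p].append((txn, inputs))
                    return f"{txn} deferred on {sorted(missing)}"
                if parts & set(locks):
                    return f"{txn} blocked"
                return f"{txn} executes on {sorted(parts)}"

            locks = {}
            print(schedule("T1", {"alice", "bob"}, {"amount": 10}, locks))   # deferred
            print(schedule("T2", {"bob", "chuck"}, {"amount": 5}, locks))    # executes
            # When alice recovers, replay owed["alice"] before admitting new work on alice.
            ```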