Weird end to the article. Using a single leader to “own” a Lamport clock as a solution to concurrent updates defeats ~most of the purpose of the pattern. (If you can have a leader responsible for clock ticks, you can just as easily have that leader serialize all the write operations in the first place!)
The fix for that is version vectors — essentially, tracking a set of (node ID, integer) tuples instead of just the integer.
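To make the distinction concrete, here is a minimal sketch of a version vector, represented as a dict of node ID → counter. All names here are illustrative, not from any particular library; the key point is that comparison can return "concurrent", which a single Lamport integer can never express.

```python
# A version vector as a dict of node ID -> counter.
# Missing entries are treated as 0.

def compare(a: dict, b: dict) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for vectors a vs b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # a happened-before b
    if b_le_a:
        return "after"        # b happened-before a
    return "concurrent"       # neither dominates: a genuine conflict

def merge(a: dict, b: dict) -> dict:
    """Pointwise max: the smallest vector that dominates both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

For example, `compare({"n1": 2}, {"n1": 1, "n2": 1})` reports the two updates as concurrent, since neither vector dominates the other.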
The pattern with a single primary owning the clock has practical uses though. It is one way you can scale read load while maintaining causality.
Write operations would go to the primary, and in the response you get the clock time at/after your write. When you then go to read - anywhere else in the system - you include that clock time, and can be sure that you’re getting a view that happens-after your write.
But if you have a single write primary, there’s no need to use Lamport clocks at all: causality is established as a natural consequence of serializing writes.
In the described scenario clients are writing to a single primary and reading from multiple secondaries. The secondaries replicate the primary asynchronously, so at any given point a secondary may be arbitrarily far behind the primary. e.g.

1. Value (A, 1) is written to the primary and replicated to a secondary.
2. The client writes value B to the primary.
3. The primary acknowledges the client's write as (B, 2).
4. The client reads from the secondary, which hasn't yet received (B, 2), and gets back (A, 1).
Note that causality was violated between steps 3 and 4: the client read an earlier value than it wrote.
If a client knows it’s only interested in values with clock>=2 then at step 4 it can choose to ignore the (A, 1) it read and try some other strategy: wait briefly and retry the same secondary; try a different secondary; or read from the primary. (The goal here being to have most reads satisfied by the secondaries instead of the primary.)
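A rough sketch of that client-side logic, with made-up names (`Replica`, `read_at_least`) standing in for whatever the real client API would be: the client carries the clock value its write returned and rejects anything older, falling through the strategies listed above.

```python
import time

class Replica:
    """Stand-in for a replica connection; a real one would do a network call."""
    def __init__(self, value, clock):
        self.value, self.clock = value, clock

    def read(self, key):
        return self.value, self.clock

def read_at_least(secondaries, key, min_clock, primary, retries=3, delay=0.05):
    """Prefer the secondaries, but only accept values with clock >= min_clock;
    after a few brief retries, fall back to reading from the primary."""
    for _ in range(retries):
        for replica in secondaries:
            value, clock = replica.read(key)
            if clock >= min_clock:
                return value, clock      # guaranteed happens-after our write
        time.sleep(delay)                # wait briefly, then try again
    return primary.read(key)             # last resort: read from the primary
```

With one secondary still serving (A, 1) and `min_clock=2`, the stale value is skipped and the read is satisfied by whichever node has caught up.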
Another causality violation:

1. Secondary S1 has caught up with the primary; the client reads (B, 2) from it.
2. The client’s next read goes to secondary S2, which is still lagging, and returns (A, 1).
In this scenario the client sees an earlier state after having already seen a later state (because it’s round-robining or something). Again the clock lets it detect that and possibly do something about it.
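The detection part can be sketched as a session-level high-water mark (hypothetical names again): the client remembers the highest clock it has observed and flags any read that goes backwards.

```python
class Session:
    """Tracks the highest clock this client has seen across reads."""
    def __init__(self):
        self.high_water = 0

    def observe(self, value, clock):
        """Return (value, stale?) and advance the high-water mark.
        stale=True means we saw an earlier state after a later one."""
        stale = clock < self.high_water
        self.high_water = max(self.high_water, clock)
        return value, stale
```

What to do when `stale` comes back `True` is then the same menu as before: retry, switch replicas, or go to the primary.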
Well, sure: switching arbitrarily between replicas that have no synchronization guarantees can easily produce causality violations. But if you cared about causality, you [probably] wouldn’t let that happen in the first place.
You’re going to, because the whole point is to scale up reads.
The client may not have the option of sticking to the same secondary (e.g. the one it was reading from just went down for maintenance).
The client doesn’t get read-your-writes consistency even with only one secondary, as described in the first example I gave.
The client may want to issue requests in parallel to multiple secondaries.
All these things are possible, sure, but they make a lot of assumptions about use case and architecture which are far from universal, or even particularly common.
i.e. if you need read-your-writes, there’s many better options for scaling than this one — and little reason to use a single write primary! ;)
The thing that is bugging me is that I don’t know precisely what your software would do when you say it “wouldn’t let that happen in the first place” about switching replicas. Are you alluding to a particular documented protocol?
This conversation feels a little bit goalpost shift-y, which I’m sure is not your intention. But one minute we’re talking about maintaining “causality”, the next it’s “if you need read-your-writes…”.
edit to add: Is the word “causality” getting used with a highly precise technical definition that I’m not aware of or something, perhaps?
Causality is not precisely defined, but read-your-writes is, and it is both a stronger and more specific property of a system. You can have causality without read-your-writes.
I think the “Partial Order” paragraph is at least misleading. A Lamport clock does provide a total order on all events (provided a process ID is concatenated to the local timestamp at each process). This total order is compatible with the causal partial order of events, but the implication only goes one way: a pair of causally ordered events are guaranteed to have Lamport clock values which respect that ordering, but two events with different Lamport clock values are not necessarily causally ordered at all. In fact, there is no way to infer from Lamport clock values whether two events are causally ordered or concurrent. (That is the whole point of vector clocks.)
Concatenating a replica ID with the Lamport clock turns it into a vector clock, a related but different thing.
Not quite sure what you mean by that, but Lamport originally proposed concatenating a process ID (not quite the same as a replica ID) to the Lamport timestamp in order to provide a total order across events with the same local timestamp but generated by different processes. Vector clocks, on the other hand (and again I’m distinguishing between process IDs and replica IDs), use the ordinal position of a timestamp within a vector to indicate its process ID.
Quoting from “Time, Clocks, and the Ordering of Events in a Distributed System”: “We simply order the events by the times at which they occur. To break ties, we use any arbitrary total ordering < of the processes”. Concatenating the process ID to the process’s local timestamp is precisely such a tie-breaking mechanism based on an arbitrary total ordering of the processes.
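That tie-break amounts to comparing (timestamp, process ID) pairs lexicographically. A minimal sketch (the function name is mine, not Lamport's):

```python
def lamport_order(event_a, event_b):
    """Each event is a (timestamp, process_id) pair.
    Returns -1, 0, or 1: a total order on events that is compatible with
    the causal partial order. Ties in timestamp are broken by an arbitrary
    but fixed total order on process IDs."""
    if event_a < event_b:    # lexicographic: timestamp first, then process ID
        return -1
    if event_a > event_b:
        return 1
    return 0
```

Note the one-way implication discussed above still applies: this comparator ordering two events tells you nothing about whether they are causally related or concurrent.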