Hmm…the lock-free-ness is aesthetically nice I suppose, but unless there are some heavy performance constraints (which I don’t think would apply for the initially-cited use-case of incrementing a counter on each service start), I’d be more inclined to skip compiling and installing a special-purpose binary and just hack it together in shell with a lock file:
while ! (set -C; echo "$$" > "$1.lock") 2>/dev/null; do
sleep $interval # tune to taste
echo "$newval" > "$1"
rm -f "$1.lock"
I suspect this approach is also much more likely to work on NFS.
Thanks for sharing. I did not know the “set -C” trick yet. I am not sure though if it will work over NFS because of caching timings, the default NFS client setting is to check only every 3 seconds for new files (you need the “noac” parameter for mounting to avoid caching). I’m not sure though if it applies to it.
In my experience nowadays proper file locking with lockf() works very well with NFS.
Otherwise just discovered the flock command, part of util-linux package on Ubuntu.
I don’t think client metadata cache timeout parameters should enter the picture at all – it’s a matter of the client specifying the mode argument of the CREATE call as EXCLUSIVE (see https://www.ietf.org/rfc/rfc1813.txt, section 3.3.8).
EDIT: and incidentally, the matter of the content of the counter file (not the existence of the lock file) being consistent between clients (which is also important in this scenario) should be handled by close-to-open consistency semantics, which I think should be provided by any modern NFS implementation unless you’ve explicitly configured it to behave otherwise.
The author was going for a lock free design, but doesn’t the infinite loop in increment counter look very similar to a mutex implementation?
It’s not a rhetorical question. I’m not an expert in this area but it seems like odo might be reimplementing a lock on top of lock free atomic operations which seems to be how people actually implement mutexes.
Lock-freedom requires that the system as a whole makes progress, even if an evil scheduler is trying its hardest (permanently descheduling threads etc) to prevent that. In that loop, the CAS either suceeds, or it fails because another thread has already incremented the counter. In either case, the system has made progress.
There is also wait-freedom, which requires that every operation terminates in a bounded number of steps, no matter how evil the scheduler is. There is an algorithm for turning non-concurrent data structures into concurrent wait-free data structures, but obviously it’s worthless. Sadly, producing usable wait-free algorithms requires at least deep wizardry from the author.
(I am currently taking a concurrent algorithms class)
Yes, it’s a kind of spinlock, but one that doesn’t need a context switch to the OS. My thinking was that will rarely need more than a few retries, so the costs of busy-waiting should be pretty low compared to a mutex.
Yes, the worst case of n iterations is when there are n other writers trying to increment at the same time. All waiting threads are guaranteed to never iterate more than once without some other thread completing its compare-and-swap. Unlike a spinlock where the lock is independent of the actual work done.
Sort of, but no. Notably, the worst case of n iterations is when there are n other writers trying to increment at the same time. All waiting threads are guaranteed to never iterate more than once without some other thread completing its compare-and-swap. Unlike a spinlock where the lock is independent of the actual work done.
And so on. In theory, other waiting threads could spin an arbitrary number of times before A is scheduled to do work. In a lock-free system, a second attempt by ANY thread involved is guaranteed to make progress, because the only thing barring progress is the progress of another thread (i.e. the value has changed.), much like mikejsavage said.
The only time where this can be slower than mutexes is when the cost of computing the new result to compare-and-swap is large. Namely, that a read -> compute -> CAS cycle incurs a context switch such that more (or all) of the threads in the system can read and compute a new value only to have it invalidated. Otherwise, the number of threads that can waste a computation at any given time is limited to the number of cores on the system.
I think it’s possible to build this in pure sh so that it even works across the network.
ln counter lock
echo ((‘cat counter’ + 1)) > new
mv new counter
If I run this as-is, it fails with “ln: lock: File exists” if the lock is currently claimed and then attempts to increment it anyway. Adding ‘set -e’ makes it exit at the error, rather than waiting and updating the counter after it’s released. Am I missing something?
Not missing anything except the code I didn’t write. It was too hard putting a loop around it while typing on my phone. :) You’d want something like while ! ln counter lock I think. cat lock may also be safer with NFS to prevent stale reads; I forget the exact semantics. ln and mv are required to be atomic though.
Gotcha. My C code has a bit more polish (optionally printing the atomically incremented value, checking that it’s not clobbering a non-counter file, etc.) but the shell script covers the main use cases. Doing it in C was a learning exercise, though.