I’m the author of that post. I would be happy to answer questions, and would appreciate any feedback people have to offer.
Is this assuming you can’t control when the clients start? What triggers many clients to attempt writes at the same time?
That depends on the application. The “everybody starts at the same time” case is interesting because it’s the worst case. Typical internet-facing service traffic is very bursty, so while this case is degenerate, it isn’t far from what real services see.
I’ve seen a thundering herd effect from smoke tests (run to verify that the live environment was working correctly) that caused several caches to expire at the same time, producing a small but regular latency spike every 5 minutes in production. We solved it by adding jitter, though we referred to it as “fuzz”.
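For anyone wanting to try the same thing, here is a minimal sketch of that kind of TTL jitter in Python. The function name `jittered_ttl`, the `fuzz_fraction` parameter, and the 10% default are illustrative assumptions, not a description of what was actually deployed.

```python
import random


def jittered_ttl(base_ttl, fuzz_fraction=0.1, rng=random):
    """Return base_ttl shifted by a uniform random amount.

    The shift is drawn from [-base_ttl * fuzz_fraction, +base_ttl * fuzz_fraction],
    so entries written at the same moment no longer expire at the same moment.
    All names and defaults here are hypothetical.
    """
    if not 0.0 <= fuzz_fraction < 1.0:
        raise ValueError("fuzz_fraction must be in [0, 1)")
    spread = base_ttl * fuzz_fraction
    return base_ttl + rng.uniform(-spread, spread)


if __name__ == "__main__":
    # Five entries cached at the same instant get five different lifetimes.
    for _ in range(5):
        print(round(jittered_ttl(300.0), 1))
```

With a 300-second base TTL and 10% fuzz, each entry lives for roughly 270 to 330 seconds, which spreads the refreshes out instead of stacking them on the 5-minute mark.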
You can also do soft expiry using an exponential function so early expiry is rare but you still don’t get contention with very large numbers of clients. Something like 10 ** (t - timeout - rand()).
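One possible reading of that expression, sketched in Python: treat an entry as expired early with probability 10 ** (t - timeout), which is tiny well before the timeout and reaches 1 at the timeout. This is an interpretation rather than necessarily what the comment intended, and the names `early_refresh_probability`, `should_refresh`, and the `scale` parameter (which sets how many time units correspond to a factor of 10) are assumptions added for illustration.

```python
import random


def early_refresh_probability(now, hard_expiry, scale=1.0):
    """Probability of treating the entry as expired at time `now`.

    Grows by a factor of 10 for every `scale` time units closer to
    hard_expiry, and is capped at 1.0 once the hard expiry has passed.
    """
    if now >= hard_expiry:
        return 1.0
    return 10.0 ** ((now - hard_expiry) / scale)


def should_refresh(now, hard_expiry, scale=1.0, rng=random):
    """Randomly decide whether this client should refresh the entry now."""
    return rng.random() < early_refresh_probability(now, hard_expiry, scale)


if __name__ == "__main__":
    # With scale=1.0: one unit before expiry about 10% of clients refresh,
    # two units before about 1%, and so on.
    for remaining in (3.0, 2.0, 1.0, 0.0):
        print(remaining, early_refresh_probability(10.0 - remaining, 10.0))
```

The point of the exponential shape is that only a small fraction of a large client population refreshes ahead of the timeout, so the load on the backing store ramps up rather than arriving all at once.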