You mention approximate LFU as a category. TinyLFU is fun because it manages to avoid retaining keys you aren't going to use much while keeping memory use small with a probabilistic sketch, and it doesn't have patent issues.
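The "probabilistic sketch" is typically a count-min-style frequency sketch (TinyLFU papers use 4-bit counters; full bytes here for simplicity). A minimal sketch of the idea in Go, with names of my own choosing, might look like:

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// cmSketch is a tiny count-min sketch: a few rows of small counters,
// each indexed by an independent hash of the key. The estimate for a
// key is the minimum counter across rows, which bounds the
// over-counting caused by hash collisions (it can never under-count).
type cmSketch struct {
	rows  [4][]uint8
	seeds [4]maphash.Seed
	mask  uint64
}

func newCMSketch(width uint64) *cmSketch {
	s := &cmSketch{mask: width - 1} // width must be a power of two
	for i := range s.rows {
		s.rows[i] = make([]uint8, width)
		s.seeds[i] = maphash.MakeSeed()
	}
	return s
}

func (s *cmSketch) index(row int, key string) uint64 {
	return maphash.String(s.seeds[row], key) & s.mask
}

// Increment bumps the key's counter in every row, saturating at 255.
func (s *cmSketch) Increment(key string) {
	for i := range s.rows {
		if c := &s.rows[i][s.index(i, key)]; *c < 255 {
			*c++
		}
	}
}

// Estimate returns the minimum counter across rows.
func (s *cmSketch) Estimate(key string) uint8 {
	min := uint8(255)
	for i := range s.rows {
		if c := s.rows[i][s.index(i, key)]; c < min {
			min = c
		}
	}
	return min
}

func main() {
	s := newCMSketch(1024)
	for i := 0; i < 5; i++ {
		s.Increment("hot")
	}
	s.Increment("cold")
	// Estimates may over-count but never under-count.
	fmt.Println(s.Estimate("hot") >= 5, s.Estimate("cold") >= 1)
}
```

A real TinyLFU implementation adds a periodic "reset" that halves all counters so frequency estimates age out, plus a doorkeeper bloom filter in front; this sketch omits both.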
Couple interesting things related to that: Ristretto, a Go TinyLFU cache library inspired by the popular Caffeine cache in Java, and a standalone implementation by Damian Gryski. (He implements a lot of algorithms from papers; interesting to follow if you like that.)
Ristretto’s authors were going for production-worthy performance; they’re at DGraph and they’d like to use it in their database product. Their blog post shows how their policy performs vs. some others on traces of real traffic, which is neat. They also had to deal with practical things like high concurrency, and did some interesting or odd things (depending on your perspective) with sync.Pool to hit their targets.
Couple other interesting links on TinyLFU are a page on the Caffeine wiki (which, like the Ristretto post, has results of simulations using trace data) and a blog post with pictures.
I’m glad you brought this up! I’m not into self-promotion, but I’m the primary author of Ristretto and it seems relevant. I spent 6-8 months or so talking with Ben Manes (of Caffeine) and working out the best design for a modern Go cache.
OP’s article is pretty good; it’s a solid overview of the current state of cache replacement policies. The only thing I would add is the throughput/hit-ratio balance: sometimes throughput matters more than hit ratio, and vice versa.
Also, once you exhaust replacement policies there are admission policies, and that’s where TinyLFU comes in. We actually use TinyLFU as both the admission policy and the replacement policy in Ristretto (sampled LFU replacement using TinyLFU counters). The LFU/LRU hit ratios are very dependent on the datasets; check out the Benchmarks.
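The admission side is conceptually simple: on a miss that would force an eviction, compare the incoming key's estimated frequency to the candidate victim's, and only admit the newcomer if it wins. A hedged sketch of that rule, assuming some frequency estimator (the names here are illustrative, not Ristretto's API):

```go
package main

import "fmt"

// admit is the core TinyLFU admission rule: a new key may only
// displace a victim if the new key is estimated to be used more.
// estimate stands in for any frequency estimator, such as a
// count-min sketch; this is a sketch of the idea, not Ristretto code.
func admit(estimate func(key string) int, incoming, victim string) bool {
	return estimate(incoming) > estimate(victim)
}

func main() {
	// Toy frequency table standing in for a real sketch.
	freq := map[string]int{"hot": 9, "cold": 1}
	est := func(k string) int { return freq[k] }

	fmt.Println(admit(est, "hot", "cold")) // frequent key displaces a rare one: true
	fmt.Println(admit(est, "cold", "hot")) // rare key is rejected: false
}
```

This is what makes TinyLFU an admission policy rather than a replacement policy: one-hit-wonder keys never get to evict anything that is actually being reused.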
You’re spot on with the sync.Pool being interesting/odd - I think that’s probably one of my favorite discoveries of that project. We really had a hard time dealing with Go’s concurrency model while trying to build concurrent data structures. I would find all these cool concurrent hashmap implementations in academic papers only to realize they relied on processor/thread-local storage (which makes sense, in hindsight; I was looking up wait-free and lock-free hashmap implementations). It was disappointing that the Go maintainers refuse to expose anything like that while using processor-local storage in their own sync.Pool implementation. So, I essentially figured out how to abuse sync.Pool to get the kind of throughput performance we desired. I’m going to explore that pattern some more; I think there are other applications that could benefit from it.
I think I left the project sort of jaded, though. I don’t think a fully concurrent cache is the way to go, at least in Go. It might be if we had the same atomic controls as in C/C++, but we likely never will. You end up spending more time dealing with concurrency issues than with the cache itself, and the throughput will never be able to compete with C++/Java. That said, projects like Doppio are promising and achieve 7-10x throughput with Ristretto compared to Redis. It just requires a bit more orchestration to scale single-threaded services like Redis to take advantage of the hardware.
Hey, thanks for reading the post and thanks for all the resources.
I mean to look into Ristretto next!