At one of my first real jobs, my first project was to determine how much disk space we could save by implementing filesystem deduplication in our product. You guessed it – Bloom filters! At that time I had to implement the whole thing myself, it was fun.
This post is great, although I would emphasize cuckoo filters much more. Cuckoo filters and derivatives have mostly replaced Bloom filters in many applications over the last couple years.
Wow, never imagined what a rabbit hole this topic is. I was starting with Bloom, went over to Ribbon, then to Cuckoo and now to Vacuum and Morton filters.
I would have liked to tag this article with datastructures but there is no such tag.
compsci
tends to be the correct goto for that stuff. :)On top of it, using CompSci neans that clicking the tag will teach them data structures, algorithms, optimizations, and more!
Related story: Ribbon filter: practically smaller than Bloom and Xor
At one of my first real jobs, my first project was to determine how much disk space we could save by implementing filesystem deduplication in our product. You guessed it – Bloom filters! At that time I had to implement the whole thing myself, it was fun.
This post is great, although I would emphasize cuckoo filters much more. Cuckoo filters and derivatives have mostly replaced Bloom filters in many applications over the last couple years.
If you think blooms are cool, check out Vacuum Filters[1]
[1] https://www.vldb.org/pvldb/vol13/p197-wang.pdf
Thanks! From your link I ended up finding yet-another-filter, the Morton Filter
Wow, never imagined what a rabbit hole this topic is. I was starting with Bloom, went over to Ribbon, then to Cuckoo and now to Vacuum and Morton filters.
Ooh, do count min sketch next!
Where does this 10-bits figure come from?