1. 37
  1. 6

    I would have liked to tag this article with datastructures but there is no such tag.

    1. 6

      compsci tends to be the correct goto for that stuff. :)

      1. 1

        On top of it, using CompSci neans that clicking the tag will teach them data structures, algorithms, optimizations, and more!

      1. 4

        At one of my first real jobs, my first project was to determine how much disk space we could save by implementing filesystem deduplication in our product. You guessed it – Bloom filters! At that time I had to implement the whole thing myself, it was fun.

        This post is great, although I would emphasize cuckoo filters much more. Cuckoo filters and derivatives have mostly replaced Bloom filters in many applications over the last couple years.

        1. 3

          If you think blooms are cool, check out Vacuum Filters[1]

          [1] https://www.vldb.org/pvldb/vol13/p197-wang.pdf

          1. 4

            Thanks! From your link I ended up finding yet-another-filter, the Morton Filter

            1. 5

              Wow, never imagined what a rabbit hole this topic is. I was starting with Bloom, went over to Ribbon, then to Cuckoo and now to Vacuum and Morton filters.

          2. 1

            Ooh, do count min sketch next!

            1. 1

              Because each key you need to store can fit into ~10 bits of data you can trade a small amount of memory to avoid overloading your backend systems.

              Where does this 10-bits figure come from?