1. 13

A Bloom filter is a simple yet powerful data structure. It allows you to answer the question have you seen this before? in a very fast and memory efficient way. Unlike a regular set, a Bloom filter only needs to store a few bits per item instead of storing the whole thing. The trick is, a Bloom filter will be able to tell you if something is not present in the set with 100% certainty, but if you ask it if something is present in the set, you might get a false positive. That means the response could be true, even if the item was never stored in the set.

    1. 2

      If you use k separate bitmaps, then h(1) cannot collide with h(2).

      That is h(i) should be (i * (N/k)) + ((f(x) + i * g(x)) % (N/k))

      pybloom and dablooms get this right, though a lot of other bloom filter implementations also get this wrong.

      1. 1

        Partitioned bloom filters have a little worse false positive rate, no?

        1. 1

          Nobody uses a perfect hash function with a bloom filter.