1. 13

A Bloom filter is a simple yet powerful data structure. It allows you to answer the question have you seen this before? in a very fast and memory efficient way. Unlike a regular set, a Bloom filter only needs to store a few bits per item instead of storing the whole thing. The trick is, a Bloom filter will be able to tell you if something is not present in the set with 100% certainty, but if you ask it if something is present in the set, you might get a false positive. That means the response could be true, even if the item was never stored in the set.

  1.  

  2. 2

    If you use k separate bitmaps, then h(1) cannot collide with h(2).

    That is h(i) should be (i * (N/k)) + ((f(x) + i * g(x)) % (N/k))

    pybloom and dablooms get this right, though a lot of other bloom filter implementations also get this wrong.

    1. 1

      Partitioned bloom filters have a little worse false positive rate, no?

      1. 1

        Nobody uses a perfect hash function with a bloom filter.