A Bloom filter is a simple yet powerful data structure. It allows you to answer the question have you seen this before? in a very fast and memory efficient way. Unlike a regular set, a Bloom filter only needs to store a few bits per item instead of storing the whole thing. The trick is, a Bloom filter will be able to tell you if something is not present in the set with 100% certainty, but if you ask it if something is present in the set, you might get a false positive. That means the response could be true, even if the item was never stored in the set.
If you use
k
separate bitmaps, thenh(1)
cannot collide withh(2)
.That is
h(i)
should be(i * (N/k)) + ((f(x) + i * g(x)) % (N/k))
pybloom and dablooms get this right, though a lot of other bloom filter implementations also get this wrong.
Partitioned bloom filters have a little worse false positive rate, no?
Nobody uses a perfect hash function with a bloom filter.