1. 11

This project isn’t brand new, but I recently discovered it and even deployed it in production for a difficult-to-satisfy use case.

Bloom filters are a handy data structure, and implementations exist for almost every language. But many use cases require scalable (resizable) bloom filters, persistence, and an efficient server protocol – think clusters that do web crawling or analytics.
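For anyone unsure what "scalable" means here, below is a minimal Python sketch of the general idea: when one filter fills up, add a larger one with a tighter error rate and check all of them on lookup. It is not bloomd's actual code; the class names, the growth factor of 2, and the tightening ratio of 0.5 are assumptions chosen for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size bloom filter sized for `capacity` items at `error_rate`."""

    def __init__(self, capacity, error_rate):
        self.capacity = capacity
        self.error_rate = error_rate
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0

    def _positions(self, key):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def __contains__(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)
        self.count += 1


class ScalableBloomFilter:
    """Chain of bloom filters that grows instead of overfilling."""

    def __init__(self, initial_capacity=1000, error_rate=0.01, growth=2, tightening=0.5):
        self.growth = growth
        self.tightening = tightening
        # Per-filter error rates form a geometric series, so starting at
        # error_rate * (1 - tightening) keeps their sum below error_rate.
        self.filters = [BloomFilter(initial_capacity, error_rate * (1 - tightening))]

    def __contains__(self, key):
        return any(key in f for f in self.filters)

    def add(self, key):
        """Add `key`; return False if it already appeared to be present."""
        if key in self:
            return False
        last = self.filters[-1]
        if last.count >= last.capacity:
            # Newest filter is full: append a bigger, stricter one.
            last = BloomFilter(last.capacity * self.growth, last.error_rate * self.tightening)
            self.filters.append(last)
        last.add(key)
        return True
```

A crawler would call `add(url)` and skip the URL whenever it returns `False`. Memory grows with the data while the overall false-positive rate stays under the configured bound.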

bloomd can make these applications very simple. The C code is clearly written, and the Python driver includes a naive round-robin sharding scheme and multi-server support. Definitely worth looking at!

  1.  

    1. 3

      The daemon is Redis-like in its speed and wire protocol, but not in its overall design. For example, bloomd persists each bloom filter as a file on disk. It can use mmap'ed files or flush on an interval (60 seconds, or whatever you like). It also implements the scaling and resizing logic automatically – similar to the Lua scripts you linked to, but this is not how the pyreBloom implementation works.

      It took me two hours to go from “I need a bloom filter server” to “it’s running in production and doing 3,000 lookups per second, with simple persistence and monitoring built-in.”

      That said, I agree that Redis should actually get a first-class bloom filter – that’d be nice!

      1. 1

        Can't compete with two hours. ;)

        I poked around a little bit, and it looks like Redis lacks some features that would make it a fit for your application, like being able to replicate at a per-database level instead of for the whole Redis instance. As it currently stands, one would need a separate Redis instance per bloom filter. Totally doable, but kinda a pain.

        If your application were read-heavy, I'd set up a single write master (with a save slave) and multiple read-only slaves.
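The file-per-filter persistence mentioned in the thread (mmap'ed files, or an explicit flush on an interval) can be illustrated with a small Python sketch. This is only a guess at the general technique, not bloomd's C implementation; `MmapBitArray` and its methods are hypothetical names.

```python
import mmap
import os


class MmapBitArray:
    """Bit array stored in a file and accessed through a shared memory map."""

    def __init__(self, path, num_bits):
        self.num_bits = num_bits
        size = (num_bits + 7) // 8
        # Create the file on first use; reuse the existing bits on restart.
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        if os.fstat(self._fd).st_size != size:
            # Sketch only: a real server would validate the size rather than resize.
            os.ftruncate(self._fd, size)
        self._map = mmap.mmap(self._fd, size)

    def set(self, index):
        self._map[index >> 3] |= 1 << (index & 7)

    def get(self, index):
        return bool(self._map[index >> 3] & (1 << (index & 7)))

    def flush(self):
        # Ask the OS to write dirty pages back to the file now.
        self._map.flush()

    def close(self):
        self.flush()
        self._map.close()
        os.close(self._fd)
```

With the mapping in place, the OS writes pages back on its own schedule. Calling `flush()` from a timer (every 60 seconds, say) bounds how much a crash can lose, and reopening the same path after a restart picks the bits back up.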