1. 37
  1.  

  2. 19

    The last footnote includes the conclusion for practical use:

    “To be fair, the asymptotic behaviour of Bloom’s original bound is consistent with this updated definition, so the impact is more on an issue of pedantry rather than for practical applications.”

    1. 15

      Whilst the conclusion isn’t as groundbreaking as the title would suggest, I have to commend this article for how well written it is.

      Academic writing is often quite hard to follow without giving a very deep reading. The fact this article manages to explain the core of some novel research in a blog post format and remain understandable is impressive to me.

      1. 3

        That precise formula/assumptions is nice and all, but Boom’s original 1970 paper presented a different data structure/technique using vanilla linear probing and B-bit fingerprints. That uses approximately (1 + log{1/p}(N)) times more space, where p is false pos rate, but saves a lot of time on modern memory architectures by being only 1 main memory load. That same original paper even includes a footnote about how its (asymptotic) analysis depends upon what would be viewed today as very locality/cache-naive assumptions. So, the real “bunk”, if that is even fair since the paper itself acknowledges it, is probably using the many-hash function idea at all.

        1. 3

          I agree with @sdhand but I do decry this compunction academics feel to click-bait all their work. I suppose we got here by having a dog-eat-dog grant, winner take all application process, instead of giving everyone a little bit of money so they could all just go and research something dear to their hearts.

          1. 8

            I agree. What we need is Universal Basic Grant.

          2. 2

            This proxy technique, which safe-guards the privacy of millions of users ever day

            Far from the only thing protecting privacy there. Some browsers (GNOME Web) actually just download the whole database (which is very fun on a metered mobile connection hehehe), but the mainstream solution is to use the v4 update API which is based on hash prefix matches, kinda similar to pwned passwords.

            I wonder if all of this could be replaced with a CRLite style cascade of bloom filters..