1. 27
  2. 6

    HyperLogLog is a very cool algorithm, glad to see Redis implement it. We recently added HLL to Elasticsearch as well.

    If you are unfamiliar with HLL, I highly encourage you to read more about it. Aggregate Knowledge has an excellent series of articles on it (and on many other sketches), including fun tricks such as HLL intersections and merging HLLs of different register sizes.

    I honestly think the future of “big data” will be approximation engines. At a large enough scale, it stops mattering whether your analytics are off by 0.5% (assuming you aren’t a financial institution or something). Approximations are fast, memory-friendly, and give you an answer that is entirely usable for 99% of tasks. Even better, most approximation algorithms can give you an estimate of their error, so you can gauge how far off the approximation is likely to be.

    1. 3

      Excellent application of “massage the data until it’s pretty” at the end. :)
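
For anyone who wants to see the ideas from the HLL comment above in code, here is a rough, toy HyperLogLog in Python. It is only a sketch under stated assumptions and is not how Redis or Elasticsearch implement it: the class name `HyperLogLog`, the parameter `p`, the use of SHA-1 as the hash, and the simplified small-range correction are all choices made for illustration. It shows the three things the comment mentions: a cardinality estimate, merging two sketches (here only for equal register counts), and the textbook relative standard error of roughly 1.04 / sqrt(m).

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with m = 2**p registers (illustrative, not production code)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (hypothetical parameter name)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash (an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw

    def merge(self, other):
        # Merging is a register-wise max; this sketch requires equal register counts.
        assert self.p == other.p, "register counts must match in this sketch"
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def relative_error(self):
        # Textbook relative standard error of the estimator.
        return 1.04 / math.sqrt(self.m)
```

A typical use would be to build one sketch per shard or per day, call `merge` to combine them, and report `count()` together with `relative_error()`. With the default `p=12` the sketch holds 4096 registers and the expected relative error is about 1.6%.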