1. 5
  1. 3

    Awesome work! Thanks for sharing. We have a similar infrastructure at NewsCred. We export data collected by Google Analytics out of BigQuery, aggregate it in Spark, and store the resulting HLLs in Redis, which an HTTP API service queries to return results. The downside is that every additional key uses more memory, so Redis becomes expensive. Spark now has an interoperable HLL format in a library they’ve published. A side project of mine that I haven’t gotten around to yet is getting Spark to output this to Postgres, which supposedly should work with this HLL format.
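
To put rough numbers on the memory concern above, here is a minimal back-of-the-envelope sketch. It assumes the worst-case size of about 12 KB per HyperLogLog key that Redis documents for its dense encoding; sparse HLLs with few elements use less, so this is an upper bound, not a measurement. The function names and the key count in the usage lines are hypothetical and not taken from the setup described in the comment.

```python
# Rough upper bound on Redis memory used by HyperLogLog keys.
# Assumption: every key is a dense HLL of about 12 KB (the worst case
# Redis documents); small HLLs use a more compact sparse encoding.
DENSE_HLL_BYTES = 12 * 1024


def hll_memory_upper_bound(num_keys: int, bytes_per_hll: int = DENSE_HLL_BYTES) -> int:
    """Return the worst-case number of bytes needed to hold num_keys HLLs."""
    if num_keys < 0:
        raise ValueError("num_keys must be non-negative")
    return num_keys * bytes_per_hll


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return num_bytes / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical key count, for illustration only.
    keys = 1_000_000
    total = hll_memory_upper_bound(keys)
    print(f"{keys} keys: up to {total} bytes ({to_gib(total):.2f} GiB)")
```

Because the bound grows linearly with the number of keys, each new dimension or time bucket that gets its own HLL adds directly to the Redis bill, which is the trade-off the comment points at.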