1. 6

Related VLDB paper: http://www.vldb.org/pvldb/vol11/p1576-palkar.pdf


  2. 1

    Hm, I used to convert web logs to JSON records, one per line, and then use grep as a pre-filter! It can discard 90% or 99% of the lines that need to be filtered out, and then you parse the JSON on what's left to apply the exact filter.

    grep is amazingly fast! This seems like the same idea taken a little further. I’ll have to look at how they do it in more detail.

    1. 2

      Section 7.2 of the paper actually uses grep/ripgrep as a basis of comparison. It seems both perform as well as or better than Sparser, except on the most selective queries, where Sparser still wins by a small margin.

      1. 2

        Yes. Always use grep first, even if a single awk command would do. This special-purpose tool really cuts the time down, especially when you want to work on fewer than 100 million lines out of a billion.
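
For anyone who wants to try the pre-filter idea from this thread without a shell pipeline, here is a minimal Python sketch. It is not how Sparser or grep work internally; it only illustrates the two-stage pattern: a cheap substring test on the raw line, then a full JSON parse for the exact check. The function name `filter_records`, the sample records, and the `status` field are all made up for illustration. It also assumes the search string appears literally in the raw text (no JSON escaping in the way), otherwise the pre-filter could wrongly drop a matching line.

```python
import json


def filter_records(lines, needle, predicate):
    """Yield parsed records matching `predicate`, parsing only lines that contain `needle`."""
    for line in lines:
        # Stage 1: cheap substring test on the raw text (the "grep" step).
        # It may let false positives through, which stage 2 removes.
        if needle not in line:
            continue
        # Stage 2: full parse plus exact check, only for surviving lines.
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if predicate(record):
            yield record


# Hypothetical sample log: one JSON record per line.
sample = [
    '{"path": "/index.html", "status": 200}',
    '{"path": "/status/404-help", "status": 200}',
    '{"path": "/missing", "status": 404}',
]

hits = list(filter_records(sample, "404", lambda r: r["status"] == 404))
print(hits)
```

The second sample line passes the substring test because "404" shows up in its path, but the exact check rejects it. That is the whole trade: the pre-filter is allowed to be sloppy in one direction only, as long as the expensive parse runs on far fewer lines.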