1. 16

  2. 2

    I’m surprised the author doesn’t bring up loose index scans, which are potentially more efficient for count distinct queries. The idea is you look up the smallest (or largest value) and use a recursive CTE to lookup the next largest value. This gives a huge speedup when the number of distinct values is small, since you skip over all of duplicate values, and regular COUNT(DISTINCT ...) reads every single row.

    1. 1

      Is there a reason why this isn’t the preferred strategy that Postgres query planner employs when performing a COUNT(DISTINCT ...) on a btree-indexed numerical column? Seems to be better in the vast majority of cases (except when the table has high cardinality relative to rows).