The WHERE NOT EXISTS anti-join has always been my go-to approach to do this kind of thing (although in some cases I’ve had to use WHERE id NOT IN (SELECT ...) - not ideal but sometimes the only way if your ORM doesn’t cooperate). The LEFT JOIN .. WHERE NULL I’ve used a few times here and there, but it always felt a little bit kludgy. Good to know that it’s also considered as the right approach.
WHERE NOT EXISTS
WHERE id NOT IN (SELECT ...)
LEFT JOIN .. WHERE NULL
Enticing title… oh shit you’re doing spatial queries? cool… I’ve never heard of “Anti-Join” before, that’s a neat term… oh. Oh it’s just a Left Join. Oh, well, at least you provide a definitive answer with benchmarks but that was unexpectedly anticlimactic!
Anyone who’s done more than a little querying will know an anti-join by its results but fewer know the name, and it’s easy to reach instinctively for not in if you aren’t aware that the planner can do much better with not exists or the left join. Not exactly the Copernican revolution, but I’m pretty sure I’ve fallen into that trap myself within the past few months, so the reminder’s useful.
Ramsey’s part of the PostGIS core team, hence the spatial angle. It’d have been good to touch on spatial indexing in this imo but maybe that’s next on the agenda.
the LEFT OUTER JOIN … WHERE … IS [NOT] NULL (same as LEFT JOIN … WHERE … IS [NOT] NULL) can produce duplicate rows for the table on the left side, depending on the schema.
LEFT OUTER JOIN … WHERE … IS [NOT] NULL
LEFT JOIN … WHERE … IS [NOT] NULL
otoh, the WHERE [NOT] EXISTS ( [correlated subquery] ) guarantees exactly 1 row for each match on the left side.
WHERE [NOT] EXISTS ( [correlated subquery] )
in practice this can make a difference, at least in postgresql: the LEFT JOIN uses one of the various ‘join’ plans, whitle the EXISTS version use a ‘semi-join’ or ‘anti-join’ plans, which are similar but can be faster.
this is the SQL equivalent of breaking out of a ‘for’ loop using ‘break’ once you found a match.