The WHERE NOT EXISTS anti-join has always been my go-to approach to do this kind of thing (although in some cases I’ve had to use WHERE id NOT IN (SELECT ...) - not ideal but sometimes the only way if your ORM doesn’t cooperate). The LEFT JOIN .. WHERE NULL I’ve used a few times here and there, but it always felt a little bit kludgy. Good to know that it’s also considered as the right approach.
Enticing title… oh shit you’re doing spatial queries? cool… I’ve never heard of “Anti-Join” before, that’s a neat term… oh. Oh it’s just a Left Join. Oh, well, at least you provide a definitive answer with benchmarks but that was unexpectedly anticlimactic!
Anyone who’s done more than a little querying will know an anti-join by its results but fewer know the name, and it’s easy to reach instinctively for not in if you aren’t aware that the planner can do much better with not exists or the left join. Not exactly the Copernican revolution, but I’m pretty sure I’ve fallen into that trap myself within the past few months, so the reminder’s useful.
Ramsey’s part of the PostGIS core team, hence the spatial angle. It’d have been good to touch on spatial indexing in this imo but maybe that’s next on the agenda.
the LEFT OUTER JOIN … WHERE … IS [NOT] NULL (same as LEFT JOIN … WHERE … IS [NOT] NULL) can produce duplicate rows for the table on the left side, depending on the schema.
otoh, the WHERE [NOT] EXISTS ( [correlated subquery] ) guarantees exactly 1 row for each match on the left side.
in practice this can make a difference, at least in postgresql: the LEFT JOIN uses one of the various ‘join’ plans, whitle the EXISTS version use a ‘semi-join’ or ‘anti-join’ plans, which are similar but can be faster.
this is the SQL equivalent of breaking out of a ‘for’ loop using ‘break’ once you found a match.
The
WHERE NOT EXISTS
anti-join has always been my go-to approach to do this kind of thing (although in some cases I’ve had to useWHERE id NOT IN (SELECT ...)
- not ideal but sometimes the only way if your ORM doesn’t cooperate). TheLEFT JOIN .. WHERE NULL
I’ve used a few times here and there, but it always felt a little bit kludgy. Good to know that it’s also considered as the right approach.Enticing title… oh shit you’re doing spatial queries? cool… I’ve never heard of “Anti-Join” before, that’s a neat term… oh. Oh it’s just a Left Join. Oh, well, at least you provide a definitive answer with benchmarks but that was unexpectedly anticlimactic!
Anyone who’s done more than a little querying will know an anti-join by its results but fewer know the name, and it’s easy to reach instinctively for
not in
if you aren’t aware that the planner can do much better withnot exists
or the left join. Not exactly the Copernican revolution, but I’m pretty sure I’ve fallen into that trap myself within the past few months, so the reminder’s useful.Ramsey’s part of the PostGIS core team, hence the spatial angle. It’d have been good to touch on spatial indexing in this imo but maybe that’s next on the agenda.
the
LEFT OUTER JOIN … WHERE … IS [NOT] NULL
(same asLEFT JOIN … WHERE … IS [NOT] NULL
) can produce duplicate rows for the table on the left side, depending on the schema.otoh, the
WHERE [NOT] EXISTS ( [correlated subquery] )
guarantees exactly 1 row for each match on the left side.in practice this can make a difference, at least in postgresql: the
LEFT JOIN
uses one of the various ‘join’ plans, whitle theEXISTS
version use a ‘semi-join’ or ‘anti-join’ plans, which are similar but can be faster.this is the SQL equivalent of breaking out of a ‘for’ loop using ‘break’ once you found a match.