I remember using Postgresql years ago and that I had trouble finding a canonical way of escaping user terms. E.g. Include a quoted user provided term in a more complex full text search query. Maybe there exist an easy answer but I did not find it.
Probably shouldn’t have commented. It was around 9 years ago.
I wanted to insert a user provided term into a premade query string. Fully quoted so that it means “exact match”. As far as I remember, I did not find documentation on how to quote safely in the query language and no function to do so.
Edit : I mean quoting inside a query string and not on the SQL level.
There’s a builtin function for ‘simple’ search (which is almost what you’ve asked for), but there’s still no ‘just find documents that match this string even if it has query directives in it’.
It assumes that if we want to provide more complex queries to the user, they have to use one of the provided syntaxes on the database level. If you want to translate from a different query language, you really need quoting.
Probably easy to provide inside of Postgresql as a function.
It’s not weird. Full Text Search works because it tokenizes & stemmes complex documents. Then performs a search on that.
This way words:jump jumping jumped all get indexed as ‘jump’. Then any search for either form will match. Searching verbatim how it stands in text, including all the stop words that get evicted during parsing/stemming would counter all the performance you gain from full text search and at that point you just perform regular string matching.
That’s not what either me or pkolloch are talking about.
to_tsquery parses its inputs, which are interpreted in the postgres tsquery language. There is no builtin input-escaping function for that language. As far as I know, the characters :|*'&() have special meaning, but there may be others.
You can use plainto_tsquery if you only have user input, but if you want to combine user input with non-user input for powerful searches, you are going to run into trouble quite rapidly.
which can safely interpolate and quote user provided inputs. Slightly less safe (easier to misuse) are all the available string manipulation functions from the quote_ family and format (https://www.postgresql.org/docs/11/functions-string.html).
Additionally most good database drivers handle proper quoting themselves (like psycopg2 for python) - there are a lot of gotchas but in all cases that has been a solved problem.
Some key insights when doing full text search with GIN and RUM indexes. Really nice write up. Thanks for sharing.
Also like the explain analyze everything in the article.
Same. I find having an example using a tool I’m familiar with really helps drive the point home.
Fuzziness warning:
I remember using Postgresql years ago and that I had trouble finding a canonical way of escaping user terms. E.g. Include a quoted user provided term in a more complex full text search query. Maybe there exist an easy answer but I did not find it.
Not sure I understand your requirements but were you looking for something like websearch_to_tsquery which translates a Google like query ie.
into
which means find documents with signal present and segmentation followed by fault not present.
https://www.postgresql.org/docs/11/textsearch-controls.html
It is quite new (appeared in PostgreSQL 11) so it might have not existed when you last checked.
The <-> (followed by) operator appeared in PostgreSQL 9.6
Probably shouldn’t have commented. It was around 9 years ago.
I wanted to insert a user provided term into a premade query string. Fully quoted so that it means “exact match”. As far as I remember, I did not find documentation on how to quote safely in the query language and no function to do so.
Edit : I mean quoting inside a query string and not on the SQL level.
I have run into this fairly recently.
There’s a builtin function for ‘simple’ search (which is almost what you’ve asked for), but there’s still no ‘just find documents that match this string even if it has query directives in it’.
A bit weird, isn’t it?
It assumes that if we want to provide more complex queries to the user, they have to use one of the provided syntaxes on the database level. If you want to translate from a different query language, you really need quoting.
Probably easy to provide inside of Postgresql as a function.
It’s not weird. Full Text Search works because it tokenizes & stemmes complex documents. Then performs a search on that.
This way words:jump jumping jumped all get indexed as ‘jump’. Then any search for either form will match. Searching verbatim how it stands in text, including all the stop words that get evicted during parsing/stemming would counter all the performance you gain from full text search and at that point you just perform regular string matching.
That’s not what either me or pkolloch are talking about.
to_tsquery
parses its inputs, which are interpreted in the postgres tsquery language. There is no builtin input-escaping function for that language. As far as I know, the characters:|*'&()
have special meaning, but there may be others.You can use
plainto_tsquery
if you only have user input, but if you want to combine user input with non-user input for powerful searches, you are going to run into trouble quite rapidly.You want prepared statements
https://www.postgresql.org/docs/11/sql-prepare.html
which can safely interpolate and quote user provided inputs. Slightly less safe (easier to misuse) are all the available string manipulation functions from the quote_ family and format (https://www.postgresql.org/docs/11/functions-string.html).
Additionally most good database drivers handle proper quoting themselves (like psycopg2 for python) - there are a lot of gotchas but in all cases that has been a solved problem.
Yes, I used prepared statements or some quoting framework support.
Not talking about quoting on the SQL level, that’s clear but inside of a query string.