I’ve always thought that the reason you just don’t do search in your database is that the performance hit is terrible. Not only because the search itself is slow, but because it impacts the rest of the system. In that context, platitudes like “The full-text search feature included in Postgres is awesome and quite fast (enough)” are not very useful. It’s no surprise to me that Postgres has some sweet search features (and probably has had them for a while), but the performance cost is generally a show-stopper for anything in production.
Does anyone know of some benchmarking or quantitative performance data with a real production system using postgres for search?
No intentional benchmarks, just personal experience: using tsvector and trigrams I was able to reduce query times from ~18 seconds to ~0.5 seconds for fairly complex queries over multi-million-row tables.
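For anyone curious what the tsvector-plus-trigrams setup mentioned above looks like, here is a minimal sketch; the `articles` table and column names are hypothetical, not from the original comment:

```sql
-- Hypothetical articles(id, body) table; both index types mentioned above.

-- Full-text search: a GIN index over a tsvector expression.
CREATE INDEX articles_fts_idx ON articles
    USING gin (to_tsvector('english', body));

-- Fuzzy / substring matching: a trigram index via the pg_trgm extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX articles_trgm_idx ON articles
    USING gin (body gin_trgm_ops);
```

The first index serves `@@ tsquery` matches; the second can serve `LIKE '%...%'` and `similarity()` queries that would otherwise force a sequential scan.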
Another nifty feature is the highlighting / headline stuff!
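The headline feature referred to is Postgres’s `ts_headline`, which wraps matched lexemes in markers (by default `<b>`…`</b>`). A small illustration:

```sql
-- Highlight the matched word in a snippet of the original text.
SELECT ts_headline('english',
                   'Postgres full-text search is quite fast',
                   to_tsquery('english', 'search'));
-- returns the text with "search" wrapped in <b>…</b> tags
```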
Thanks! This is a useful data point. I think I’ll try to move all my search needs into Postgres next time I’m working on that.
My understanding is that the indexing Postgres does for FTS is relatively quick and not a major performance hit in production. The problem I’ve had with using it over, e.g., Elasticsearch has more to do with features – in particular things like faceting, custom ranking constructions, etc.
It may also be that, since FTS gets used more heavily than other components, it could be detrimental to have it running on the same hardware as the rest of your system, though that strikes me as solvable-by-wallet rather than necessarily solvable-by-developer. That distinction might be appealing to folks who don’t like having a preponderance of tools. Personally, I don’t mind it so much.
Hmm. I’m still not clear on how the index is made. I’ve been using SQLite full-text search for a long time with great results. It’s also very easy to weld into a project without tons of custom SQL; a query looks just like a LIKE query, but using the MATCH keyword.
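For comparison, the SQLite side looks roughly like this (FTS5 syntax; the `docs` table is a made-up example):

```sql
-- SQLite FTS5: the index is a virtual table you insert into directly.
CREATE VIRTUAL TABLE docs USING fts5(title, body);

INSERT INTO docs (title, body)
VALUES ('Hello', 'full text search in sqlite');

-- Querying feels like LIKE, but with MATCH:
SELECT title FROM docs WHERE docs MATCH 'search';
```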
This is kind of disappointing as I’d always planned on converting my sqlite projects to also support postgres, at least for practice if not for production, but the changes look more extensive than I’d like to dig into.
In all of PostgreSQL’s examples, they coalesce multiple columns into a single value to be indexed. Say you want to do general searches on books: you would coalesce title, description, author, etc., then create the index on that coalesced column:
CREATE INDEX all_book_info_idx ON books USING gin(to_tsvector('english', combined_fields_column));
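One way to get that combined column without maintaining it by hand (this is an assumption about the setup; the column names `title`, `description`, and `author` are hypothetical) is to index the coalesced expression directly:

```sql
-- Expression index: no separate combined column needed, as long as
-- queries use the exact same expression.
CREATE INDEX all_book_info_idx ON books
    USING gin (to_tsvector('english',
        coalesce(title, '') || ' ' ||
        coalesce(description, '') || ' ' ||
        coalesce(author, '')));
```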
Then querying would become something like:
SELECT isbn FROM books WHERE to_tsvector('english', combined_fields_column) @@ to_tsquery('John Doe');
The syntax is a bit different from MATCH for sure.. so that’s a bummer :D
My bad, the to_tsquery bit is incorrect; it should be: @@ plainto_tsquery('John Doe'); (plainto_tsquery parses plain text, so multi-word input doesn’t need explicit operators).
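Putting the corrected pieces together, a ranked version of the query might look like this (the `'english'` configuration and column names follow the earlier example; note the config passed to `to_tsvector` must match the one used in the index expression for the index to be usable):

```sql
SELECT isbn,
       ts_rank(to_tsvector('english', combined_fields_column),
               plainto_tsquery('english', 'John Doe')) AS rank
FROM books
WHERE to_tsvector('english', combined_fields_column)
      @@ plainto_tsquery('english', 'John Doe')
ORDER BY rank DESC;
```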
While it might be OK for many cases, there are some caveats to keep in mind. One is that it doesn’t have great support for compound words, which might be an issue if you have to support searching in languages that use them. For example, out of the box, you won’t find the “Wurst” in “Bratwurst”.
There is support for that using ispell dictionaries, but the ispell method is a bit clunky, and by this point most German dictionaries are maintained in hunspell format, which also has a much better way of dealing with compounds – one that is not supported by Postgres at this time.
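Wiring up an ispell dictionary looks roughly like this; the dictionary/affix file names are assumptions, and the `.dict`/`.affix` files must already be installed under the server’s `tsearch_data` directory:

```sql
-- Create a dictionary from ispell files (german.dict / german.affix).
CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE  = ispell,
    DictFile  = german,
    AffFile   = german,
    StopWords = german
);

-- Use it in a custom configuration, falling back to the stemmer.
CREATE TEXT SEARCH CONFIGURATION german_fts (COPY = pg_catalog.german);
ALTER TEXT SEARCH CONFIGURATION german_fts
    ALTER MAPPING FOR asciiword, word
    WITH german_ispell, german_stem;
```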
Finally, if you need a dictionary for your tsearch configuration (which you absolutely do if you want to deal with compounds), keep in mind that the dictionary is loaded per connection, the first time you use the tsearch facility. Loading a ~20 MB dictionary takes about 0.5 seconds on my huge-ass box.
As such, you might have to change your application to use persistent connections which can have other side effects you need to consider.
Postgres might still very well be good enough for your purposes (it is for me), but keep these caveats in mind and plan accordingly.