1. 38
  1. 7

    Maybe, if your needs are simple. If postgres had a ranking function that could operate with corpus statistics (e.g. word document frequency for something like tfidf) it would be not only good enough but take a big bite out of elasticsearch and other special purpose databases.

    1. 0

      It does have a ranking function which does just that: https://www.postgresql.org/docs/11/textsearch-controls.html#TEXTSEARCH-RANKING

      1. 1

        No, the ranking functions only look at the current document, not the whole corpus. For instance, you can’t lower a term’s relevancy if it occurs in lots of other documents as well. Relevant line from the docs:

        It is important to note that the ranking functions do not use any global information

    2. 10

      It really is. Mastodon uses ElasticSearch for search, while Pleroma uses postgres FTS. The elastic search server itself takes more resources than Pleroma + Postgres.

      1. 3

        I love this post and referred to it heavily while building a full-text search feature with Ecto (Elixir) and PostgreSQL, which I wrote up in a blog post. It was really nice to be able to implement full-text search without adding a separate data store and having to keep it synced with the primary one.

        1. 3

          I loathe extra dependencies, and attempted to use this approach twice. One time on a medium sized database (low double-digit TBs) and one on a small one (under 1TB). On the small one it was relatively successful, eventually replaced but not due to any horrible issue, just ergonomics and feature support. On the large one, it was a fairly tough to puzzle out performance problem, and we ended up going with Sphinx (which I love!) as a replacement with shockingly improved performance.

          So personally 0/2 on this approach, or possibly 0.5/2? It is a fine place to start, but if you feel you are going to outgrow it fairly quickly might not be worth the stop in the middle.

          1. 7

            It’s, as always, a choice. I do Elasticsearch and SOLR consulting for a living and frequently run into the issue of people using them too soon, too badly. For example, I just had to save an under-performing ES with 1GB of data. You have to work for breaking that. The ES docs being quite poor at the small details and the book being unmaintained since 2.0 does not help.

            So, what I like about Postgres is that the feature is there, but it’s quite okay with having a ceiling. It’s not meant to be the big competitor in the search space. It’s built for being an okay implementation where everything that’s there is reliable and fulfilling its promises. The documentation is good and it’s rather easy to get going without much operational overhead.

            In contrast to, for example, MySQL and MongoDB, that ship shoddy implementations that lead to frequent and subtle bugs.

            In my book, it’s not even the additional dependency, but people miss that every item called “data sync” in an infrastructure should be circled with a red marker, underlined three times with a bigger marker and used as a target for a paintball gun with red bullets.

            1. 1

              There is no “too soon” in “we need faceted search”.

              1. 3

                Faceting is overrated IMHO. Not that it’s not important, but I definitely don’t agree that everything needs faceted search (as a database feature).

                There’s tons of interfaces based on a limited set of filtered tables (CMSes, apps essentially caring about 5-6 data types, etc). Many of them don’t need facets or a unified result view. In these cases, “faceted search” is easily replaced by “fire off N low-cost queries” if needed. Many don’t even need high-performance search. (they need search as an interface, though!)

                Faceted search is important with an unbouded number of categories to facet by. Essentially: Shops.

                Interestingly, Elasticsearchs way of dealing with queries as essentially a program also leads to the interesting effect that many of our clients miss the point where they should stop bashing everything in one query and start using multiple.

          2. 3

            I’m plan to build a search engine on top of PostgreSQL. It will store only the tsvector of each page instead of storing the whole text in the database. I plan to index a few sites at first then move on from there. More details on Subcafe. Text extraction as seen on Newscafe is working pretty good.

            1. 2

              Thank you so much for sharing this. Actually about to start a new gig that uses Postgres and I’ve never used Postgres. I’m excited to implement this at home in an effort to learn about some of the DB’s great features.

              1. 2

                I wondered why the text linked to PG documentation at version 9.3 and it’s because the article was written in 2015.

                Stories with similar links:

                1. Postgres full-text search is Good Enough! (2015) via BenoitTigeot 4 years ago | 7 points | no comments