1. 14

  2. 3

    This is a nice blog post and reinforces the feeling I have that a properly configured Postgres can handle most of the search and lookup problems people have, especially if the important indexes can fit in memory.

    I’ve been looking into the options for full text search lately, especially for handling both alphabetic and Asian text. So far I’ve found:

    • Elasticsearch (easy)
    • Solr (kinda easy)
    • Vanilla Lucene
    • CLucene (probably dead)
    • Sphinx (stagnant, Asian text is difficult)
    • Manticore Search (Sphinx fork, active, Asian text is still difficult)
    • Trinity (C++, multiple codecs are supported, pretty young)
    • Tantivy (Rust, inspired by Lucene, not as full featured yet)
    • Bleve (Go based text search indexing, young)
    • Riot (Go based full text search engine, Asian languages supported, young)
    • Postgres (easiest if Postgres is already being used; the zhparser extension is an option)
    • Groonga (usable in Postgres with PGroonga or in MySQL with Mroonga)
    • MySQL (supports FTS of Asian languages with the MeCab tokenizer)

    Of these, Elasticsearch (Lucene, really) is the most powerful out of the box, and with text analysis plugins it supports Asian languages. Postgres is the easiest to run on a single box alongside everything else a site needs, and with PGroonga it can handle Asian languages deftly, especially Postgres 10 with its ability to search jsonb. For what it’s worth, I’ve been using Postgres because my search requirements will stay relatively small, and I have trouble imagining a scaling issue that replication couldn’t solve.
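
    To illustrate why Asian text needs its own analysis step: Chinese and Japanese are written without spaces between words, so splitting on whitespace gives an engine nothing to index. A common fallback is overlapping character bigrams; dictionary-based tokenizers like MeCab do real segmentation instead. This is only a minimal sketch of the bigram idea, with a hypothetical `bigrams` helper, not how any of the engines above actually implement it.

```python
from typing import List


def bigrams(text: str) -> List[str]:
    """Split text into overlapping two-character tokens, ignoring whitespace.

    Hypothetical helper for illustration only; real analyzers also handle
    punctuation, mixed scripts, and normalization.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # A single character (or nothing) cannot form a bigram.
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


# "東京都" (Tokyo Metropolis) yields the token "京都" (Kyoto) as a side effect,
# which is the classic false-match problem with n-gram indexing.
tokens = bigrams("東京都")
```

    The false match on "京都" is the kind of relevance cost that a dictionary-based tokenizer avoids, at the price of needing a dictionary.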

    What I’m wondering is whether anyone has used any of these in production, besides Elasticsearch, and come up with benchmarks for various queries. Relevancy is difficult to benchmark, but comparisons can be drawn. I find this especially important for minimizing the cost of running a full featured service without being bankrolled.
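
    For the sake of discussion, here is roughly the kind of comparison I have in mind: run the same queries against each backend, time them, and score the top results against a small set of hand-made relevance judgments. Everything here is an assumption for illustration: the `SearchFn` signature, the `benchmark` helper, and the toy in-memory backend stand in for real clients, and precision at k is just one simple relevance measure.

```python
import time
from typing import Callable, Dict, List, Set

# Assumed interface: a backend takes a query and returns ranked document ids.
SearchFn = Callable[[str], List[str]]


def precision_at_k(results: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled with documents judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in results[:k] if doc_id in relevant) / k


def benchmark(search: SearchFn, judgments: Dict[str, Set[str]], k: int = 5) -> Dict[str, float]:
    """Run every judged query once; report mean latency and mean precision@k."""
    if not judgments:
        raise ValueError("need at least one judged query")
    latencies, precisions = [], []
    for query, relevant in judgments.items():
        start = time.perf_counter()
        results = search(query)
        latencies.append(time.perf_counter() - start)
        precisions.append(precision_at_k(results, relevant, k))
    n = len(judgments)
    return {
        "mean_latency_s": sum(latencies) / n,
        "mean_precision_at_k": sum(precisions) / n,
    }


# Toy stand-in for a real engine: case-insensitive substring match, unranked.
CORPUS = {"d1": "Postgres full text search", "d2": "Lucene analyzers"}


def toy_search(query: str) -> List[str]:
    return [doc_id for doc_id, text in CORPUS.items() if query.lower() in text.lower()]


report = benchmark(toy_search, {"postgres": {"d1"}}, k=1)
```

    Swapping `toy_search` for a thin wrapper around each real engine would give numbers that are at least comparable across engines, even if the judgments themselves are subjective.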

    1. 2

      I’ve also had good results from Picky for search in Ruby apps: fast, with good relevance when doing the obvious thing (versus Bleve, where the obvious thing gave me bad relevance, slowly). I’m sure I was doing it wrong, but performance on naive usage is one metric to evaluate.