1. 21
  1. 4

    How does Stork compare to tinysearch, which was posted here recently? https://endler.dev/2019/tinysearch/

    1. 3

      Tinysearch is great! I was chatting with Matthias briefly after I realized we had been working on something similar. He put a lot more work into the data structure, doing some really cool work implementing a bloom filter to get the index size down. I focused on ease of integration, making it a fully hosted library. Eventually I want to borrow some of Matthias’ ideas for Stork!

    2. 2

      I would be interested to see a comparison to the performance of other in-browser search index schemes, like lunr.js, flexsearch, fuse.js

      (@jil you might want to check your homepage design in WebKit browsers - on my iPhone 10 a lot of text was off the edge of the screen with no way to fix it!)

      1. 1

        Coming in kinda late, but just fixed the text on the site. Thanks again for alerting me.

        1. 1

          Same on Firefox Preview (Android)

          1. 1

            Oh no! Thanks for letting me know. I had fixed a layout bug everywhere else… and must have accidentally borked things on mobile.

            CSS is frustrating sometimes.

          2. 1

            Stork seems large. It’s like 180k, and from my reading of the code it doesn’t do any sort of stemming; it just looks up words in a hashmap and finds the associated documents.

            I’m a bit of a skeptic about the viability of rust+wasm on the web.
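
              To illustrate the scheme described above, here’s a minimal sketch of a plain inverted index (word → document IDs) with no stemming or fuzzy matching. The names are illustrative, not Stork’s actual API:

              ```typescript
              // Hypothetical sketch of "look up words in a hashmap, find the
              // associated documents". Not Stork's real data structure.
              type InvertedIndex = Map<string, string[]>;

              function buildIndex(docs: Record<string, string>): InvertedIndex {
                const index: InvertedIndex = new Map();
                for (const [docId, text] of Object.entries(docs)) {
                  for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
                    const postings = index.get(word) ?? [];
                    if (!postings.includes(docId)) postings.push(docId);
                    index.set(word, postings);
                  }
                }
                return index;
              }

              // Query is a direct hashmap lookup -- without stemming,
              // "running" will never match a document containing only "run".
              function search(index: InvertedIndex, word: string): string[] {
                return index.get(word.toLowerCase()) ?? [];
              }
              ```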

            1. 3

              The index of the federalist papers is another 1.8 MB. There’s 70k of ancillary script, too. Add them all together and you’ve got a transfer size comparable with the header image on a Medium blog.

              That’s going to add about 2 seconds to your first-load time on a world-median 6 Mbps connection (highly cacheable resources, so no network transfer on page 2). In real terms that could be a very worthwhile tradeoff for some workloads.
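
              Back-of-envelope check of that estimate (my own arithmetic, not a benchmark):

              ```typescript
              // Rough total payload from the figures above, over a 6 Mbps link.
              const payloadKB = 180 + 1800 + 70;    // wasm + index + ancillary JS
              const linkKBps = (6 * 1000) / 8;      // 6 Mbps ≈ 750 kB/s
              const seconds = payloadKB / linkKBps; // ≈ 2.7 s uncompressed;
                                                    // ~2 s is plausible once gzip
                                                    // shrinks the index
              ```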

              1. 1

                That said it’s kind of amazing that this works at all!

              2. 1

                I’ve been lately thinking about adding search to some documentation servers we have at work, which are essentially serving static content produced from a bunch of asciidocs.

                Both stork and tinysearch could be useful for that task, as we could hack together some control over which plaintext goes into which page. However, I have a feeling it would introduce a big overhead, unnecessarily (for our use case) pushing all the indexes to clients, when we actually control the web server and could run a search process there. Are there any tools for the use case where one has a static site, but also controls the webserver? I would like to avoid elasticsearch, but I’m not aware of any other.

                1. 1

                  This is probably the best case for sending indexes to the client! Your business probably provides decent machines to developers/doc users, as well as a good uplink to wherever the docs are hosted!

                  For an example of a client-side search system that works well, check out MkDocs, a static site generator written in Python that includes lunr.js-based client-side search; see the docs here: https://www.mkdocs.org/user-guide/configuration/#search

                  I built the engineering documentation system at Airbnb on MkDocs with a hybrid search model. Here’s how it works: each project generates a standard MkDocs static site. The build outputs a bunch of HTML files as well as an index.json which contains flattened, stopword-processed plain text that’s very easy to index. When a project’s docs are deployed, this build output gets pushed up to S3 for static serving. Inside a single project we used MkDocs’ built-in search feature to search that index on the client.
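
                  To make the “flattened, stopword-processed” part concrete, here’s a guess at what producing one index.json entry could look like. Field names and the stopword list are illustrative, not MkDocs’ exact schema:

                  ```typescript
                  // Hypothetical shape of one index.json entry.
                  interface IndexEntry {
                    location: string; // page URL relative to the site root
                    title: string;
                    text: string;     // plain text, markup and stopwords removed
                  }

                  // Tiny illustrative stopword list, not the real one.
                  const STOPWORDS = new Set(["the", "a", "an", "and", "of", "to", "is"]);

                  function flatten(location: string, title: string, html: string): IndexEntry {
                    const text = html
                      .replace(/<[^>]+>/g, " ")  // crude tag stripping
                      .toLowerCase()
                      .split(/\s+/)
                      .filter((w) => w && !STOPWORDS.has(w))
                      .join(" ");
                    return { location, title, text };
                  }
                  ```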

                  There’s also a small node.js server that enumerates all the projects, pulls the index.json file for each project into memory, and combines them into a single lunr.js index. The server handles top-level queries when the user isn’t sure what project they need to look at. Lunr.js and S3 are both fast enough that the server process can re-fetch and re-index any changed index.json content as part of servicing a search request - at least, for the ~100 projects that changed docs about 10 times a day. Search quality was probably lower than you’d get with Elasticsearch, but the simplicity of the system was well worth it - I made this whole system happen in hours instead of the days or weeks it would have taken to magick up a cluster, figure out all the fiddly elasticsearch bits, etc.
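
                  A simplified sketch of that server’s merge step. The real setup feeds the combined entries into a single lunr.js index; here a plain substring scan stands in for lunr so the idea is runnable without dependencies, and the field names are my own:

                  ```typescript
                  // One flattened entry from a project's index.json.
                  interface Entry { project: string; location: string; text: string }

                  // In the real system each array comes from a project's
                  // index.json fetched from S3.
                  function mergeIndexes(
                    perProject: Record<string, Omit<Entry, "project">[]>
                  ): Entry[] {
                    return Object.entries(perProject).flatMap(([project, entries]) =>
                      entries.map((e) => ({ ...e, project }))
                    );
                  }

                  // Stand-in for querying the combined lunr.js index.
                  function searchAll(entries: Entry[], term: string): Entry[] {
                    const t = term.toLowerCase();
                    return entries.filter((e) => e.text.toLowerCase().includes(t));
                  }
                  ```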