1. 28
  1.  

  2. 3

    I’m working on something very similar to this. The main problem I want to solve is splitting the index into chunks, and doing so dynamically, if it’s needed when something gets added to the index.

    1. 4

      One thing I toyed around with was creating different index files based on part of the hash of the key. This way it’s deterministic (e.g. you might not need to update all indexes if you’re adding one word). You can also alter the “split factor” if you need smaller indexes.

      e.g. in Python:

      import hashlib
      indexes = {}
      
      key = 'cat'.encode('utf-8')
      key_hash = hashlib.sha256(key).hexdigest()
      
      split_factor = 3
      # use this to choose the index
      
      print(key_hash[:split_factor]) # "77a"
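
To make the routing step concrete, here is a minimal sketch of how that prefix could pick the index a word lands in. The `shard_for` helper and the in-memory `indexes` layout are hypothetical names for illustration, not how any particular search library stores its data.

```python
import hashlib
from collections import defaultdict


def shard_for(word: str, split_factor: int = 3) -> str:
    """Return the hex prefix that names the index shard for `word` (hypothetical helper)."""
    digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
    return digest[:split_factor]


# shard name -> set of words in that shard (assumed layout; a real index
# would store postings and probably write each shard to its own file)
indexes = defaultdict(set)

for word in ["cat", "dog", "bird", "cat"]:
    indexes[shard_for(word)].add(word)

# Adding a word only touches the one shard its hash prefix points to.
print(shard_for("cat"))  # "77a"
```

Each extra hex digit in `split_factor` multiplies the number of possible shards by 16, so the shards get smaller but there are more files to manage.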
      
      1. 2

        I went with splitting the alphabet space instead. For example, the 00000-zzzzz space can be split into two buckets, something like 00000-ddddz and ddddz-zzzzz (or whatever the boundaries are). I have zero proof for this, but I thought it might help when the user is making related queries.

        Edit: I’m planning to allow chunks to get chunked again, which should solve the imbalance problem that the hash-based approach might not have in the first place.
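
A rough sketch of that range-based idea, where a chunk is split again once it grows too large. The `RangeIndex` class, its `max_size` threshold, and the split-at-the-median rule are all assumptions made for illustration; the comment above does not say how the boundaries are actually chosen.

```python
import bisect


class RangeIndex:
    """Keys live in sorted chunks; a chunk splits in two when it exceeds max_size."""

    def __init__(self, max_size: int = 4):
        self.max_size = max_size
        # lower_bounds[i] is the smallest key that chunk i may hold.
        # The empty string sorts before every other string.
        self.lower_bounds = [""]
        self.chunks = [[]]  # each chunk is a sorted list of keys

    def chunk_for(self, key: str) -> int:
        """Return the position of the chunk whose range contains `key`."""
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def add(self, key: str) -> None:
        i = self.chunk_for(key)
        chunk = self.chunks[i]
        pos = bisect.bisect_left(chunk, key)
        if pos < len(chunk) and chunk[pos] == key:
            return  # already indexed
        chunk.insert(pos, key)
        if len(chunk) > self.max_size:
            # Split at the median so both halves are roughly the same size.
            mid = len(chunk) // 2
            self.chunks[i:i + 1] = [chunk[:mid], chunk[mid:]]
            self.lower_bounds.insert(i + 1, chunk[mid])


index = RangeIndex(max_size=4)
for word in ["dog", "ant", "eel", "cat", "bee"]:
    index.add(word)
```

Only the chunk that overflows is rewritten, and neighbouring keys stay together, which is the property that might help with related queries.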

    2. 2

      I’ve used Stork for my blog, but ended up not keeping it enabled. I have a pretty decent phone, and an okay Internet connection, but there was a fairly long (2-3 second?) interval where my phone locked up while Stork was loading its database. I may try again with a dedicated search page (I wanted search to appear on the main page of my site, so the delay was unacceptable), but for now, I’m going without search.

      1. 1

        I’d really love some expansion on this paragraph, please?

        Wasm runs faster, loads quicker, and ships smaller binaries (in theory, in practice it takes some work to get there).

        The impression I got was that you tended to get fast-running but pretty big bundles when targeting wasm. Possibly this impression is just a holdover from Emscripten. If you can get small code out of wasm, then I’d love to hear more about how. <3