1. 26
  1. 17

    I still use the not-so-well-known look(1) utility to check the HaveIBeenPwned files. It works like grep but only matches prefixes, and because it assumes the input is sorted (it does a binary search), it’s a LOT faster than grep.

    NAME
        look – display lines beginning with a given string
    
    SYNOPSIS
        look [-df] [-t termchar] string [file ...]
    
    DESCRIPTION
        The look utility displays any lines in file which contain string as a prefix.  As look performs a binary
        search, the lines in file must be sorted.
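
    For anyone curious how the binary search works on a flat file, here is a rough Python sketch of the idea. It is not how look(1) is actually implemented; the function name `look` and the assumption that the file is sorted in plain byte order (as with `LC_ALL=C sort`) are mine.

    ```python
    import os

    def look(path, prefix):
        """Return the lines of a byte-sorted file that start with `prefix`."""
        key = prefix.encode()
        matches = []
        with open(path, "rb") as f:
            lo, hi = 0, os.path.getsize(path)
            # Bisect over byte offsets, not line numbers: jump to an offset,
            # throw away the partial line we landed in, and compare the next one.
            while lo < hi:
                mid = (lo + hi) // 2
                f.seek(mid)
                f.readline()                  # discard the (partial) line at mid
                line = f.readline()           # first full line after mid
                if line and line[:len(key)] < key:
                    lo = mid + 1              # matches, if any, are further right
                else:
                    hi = mid
            f.seek(lo)
            if lo > 0:
                f.readline()                  # same skip as in the probe above
            for line in f:
                head = line[:len(key)]
                if head < key:
                    continue                  # at most the very first line of the file
                if head != key:
                    break                     # sorted input: nothing further can match
                matches.append(line.decode().rstrip("\n"))
        return matches
    ```

    Each probe costs one seek and two short reads, so even a file with hundreds of millions of lines needs only a few dozen probes.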
    
    1. 3

      What about shoving it into SQLite and adding an index?

      1. 3

        That’s what I did, as I went for the simplest possible solution I had in my toolbox. It totally worked, but it did increase the size on disk quite a bit.

        The look(1) utility is a better answer though.
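
        For reference, the SQLite route looks roughly like this with Python's built-in `sqlite3` module. The table name `pwned`, the index name, and the column layout are made up for illustration; the separate index stores the hashes a second time, which is one likely reason for the extra size on disk.

        ```python
        import sqlite3

        def build(conn, rows):
            """Load (hash, count) pairs and index them by hash."""
            conn.execute("CREATE TABLE pwned (hash TEXT NOT NULL, count INTEGER NOT NULL)")
            conn.executemany("INSERT INTO pwned VALUES (?, ?)", rows)
            conn.execute("CREATE INDEX pwned_hash ON pwned (hash)")
            conn.commit()

        def lookup(conn, prefix):
            """Return all rows whose hash starts with `prefix` (non-empty ASCII)."""
            # Express the prefix match as a half-open range, e.g. "00A" -> ["00A", "00B"),
            # which is a form an ordinary B-tree index can serve.
            upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
            cur = conn.execute(
                "SELECT hash, count FROM pwned WHERE hash >= ? AND hash < ? ORDER BY hash",
                (prefix, upper),
            )
            return cur.fetchall()

        conn = sqlite3.connect(":memory:")   # use a file path for a real data set
        build(conn, [("0001", 3), ("00A1", 1), ("00A2", 7), ("00B0", 2), ("FFFF", 1)])
        print(lookup(conn, "00A"))
        ```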

        1. 1

          That would cost quite a lot more GBs at the very least.

          1. 1

            I’d be curious for someone to try it and report timings, but I think that solution has a lot less locality (pretty much anything does, relative to a flat file), so I’d guess it’s significantly slower.

            SQLite is relatively efficient, but it can do so much more than this type of query. It would be surprising if it were anywhere close to as fast.

            I also wonder how a column store (DuckDB?) would perform on this. I think they are more highly optimized for this type of simple query.

          2. 3

            How about replacing the binary search with interpolation search? https://en.wikipedia.org/wiki/Interpolation_search It’s the generalized form of your position heuristic.
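
            A minimal in-memory sketch of the idea, assuming the hashes have already been parsed into a sorted list of integers (the function name is my own). Because hash values are spread roughly uniformly, the interpolated guess should usually land close to the target.

            ```python
            def interpolation_search(keys, target):
                """Return an index of `target` in the sorted list `keys`, or -1."""
                lo, hi = 0, len(keys) - 1
                while lo <= hi and keys[lo] <= target <= keys[hi]:
                    if keys[hi] == keys[lo]:
                        # All remaining keys are equal, and target lies between them.
                        return lo
                    # Guess the position by assuming keys grow linearly between lo and hi.
                    pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
                    if keys[pos] == target:
                        return pos
                    if keys[pos] < target:
                        lo = pos + 1
                    else:
                        hi = pos - 1
                return -1
            ```

            On a file rather than a list, the same estimate would pick a byte offset instead of an index, followed by the usual skip to the next line start.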

            1. 1

              A disk-block-aware approach (request 4096 bytes at a time and scan the entire block) would probably help a bit as well.
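
              Something like the following, as an untested-on-real-data sketch: bisect over whole blocks instead of byte offsets, then scan linearly once the right block is found. The names `block_look` and `block_size` are mine, and it assumes a byte-sorted file with no empty lines and lines much shorter than a block.

              ```python
              import os

              def first_line(f, block_no, block_size):
                  """Return the first line that starts inside the given block (b"" if none)."""
                  f.seek(block_no * block_size)
                  chunk = f.read(block_size)       # ask for one aligned block per probe
                  if block_no == 0:
                      return chunk.split(b"\n", 1)[0]
                  parts = chunk.split(b"\n", 2)    # parts[0] is the tail of an earlier line
                  return parts[1] if len(parts) > 1 else b""

              def block_look(path, prefix, block_size=4096):
                  """Return the lines of a byte-sorted file that start with `prefix`."""
                  key = prefix.encode()
                  matches = []
                  with open(path, "rb") as f:
                      size = os.path.getsize(path)
                      lo, hi = 0, (size + block_size - 1) // block_size
                      # Find the first block whose first line is not below the key.
                      while lo < hi:
                          mid = (lo + hi) // 2
                          line = first_line(f, mid, block_size)
                          if line and line[:len(key)] < key:
                              lo = mid + 1
                          else:
                              hi = mid
                      # Matches can begin in the block just before that one.
                      start = max(lo - 1, 0)
                      f.seek(start * block_size)
                      if start > 0:
                          f.readline()             # drop the partial line at the block start
                      for line in f:               # linear scan from here on
                          head = line[:len(key)]
                          if head < key:
                              continue
                          if head != key:
                              break
                          matches.append(line.decode().rstrip("\n"))
                  return matches
              ```

              This trades the last few bisection steps, which would all hit the same block anyway, for a short sequential scan.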