Lots of good stuff in there, but my favorite was learning about how ripgrep ensures it uses the fast-path memchr() when searching.
Instead of searching for last character in the string, which may be quite common character (in fact, statistically is more likely to be a common character!) and therefore keep switching out of the fast path, it has a background distribution of characters. So like, “the letter E is very likely to be appear in source code, the letter Q much less so”. And then it searches with memchr() for the least likely character.
So if you search for “quite”, it searches for the “q”, not the “e”, it can stay inside memchr() for much longer, which means it’s much faster.
And this optimized memchr is a crate. So is the parallelized dir walking. You can easily reuse lots of ripgrep’s best components.
To clarify, the performance wasn’t from using an optimized memchr(). Glibc already has an optimized memchr() (though MUSL doesn’t). The performance was from choice of character to search for.
Obviously having a fast memchr() is also helpful.