Lots of good stuff in there, but my favorite was learning about how ripgrep ensures it uses the fast-path memchr() when searching.
Instead of searching for the last character of the string, which may be a quite common character (in fact, it is statistically more likely to be a common one!) and therefore keep dropping out of the fast path, it uses a background distribution of characters. So like, “the letter E is very likely to appear in source code, the letter Q much less so”. And then it searches with memchr() for the least likely character.
So if you search for “quite”, it searches for the “q”, not the “e”. It can stay inside memchr() for much longer, which makes it much faster.
To clarify, the performance wasn’t from using an optimized memchr(). Glibc already has an optimized memchr() (though MUSL doesn’t). The performance was from choice of character to search for.
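The rare-byte choice described above can be sketched in a few lines. This is a minimal illustration, not ripgrep’s actual implementation: the frequency ranks are made-up assumptions, and a plain byte scan stands in for the vectorized memchr() prefilter.

```rust
// Hypothetical background-frequency ranks: lower = rarer in typical text.
// (Assumed values for illustration; ripgrep uses a real frequency table.)
fn rank(b: u8) -> u32 {
    match b {
        b'e' | b't' | b'a' => 100, // very common letters (assumed)
        b'q' | b'z' | b'x' => 1,   // rare letters (assumed)
        _ => 50,                   // everything else
    }
}

// Find the needle by scanning only for its rarest byte, then verifying
// the full needle at each candidate position.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // Pick the needle byte with the lowest (rarest) rank.
    let (rare_idx, &rare_byte) = needle
        .iter()
        .enumerate()
        .min_by_key(|&(_, &b)| rank(b))?;
    // Scan for that rare byte; in ripgrep this scan is memchr(), which
    // stays in its vectorized fast path far longer for rare bytes.
    let mut start = 0;
    while let Some(pos) = haystack[start..].iter().position(|&b| b == rare_byte) {
        // Shift back by the rare byte's offset within the needle.
        if let Some(c) = (start + pos).checked_sub(rare_idx) {
            if haystack[c..].starts_with(needle) {
                return Some(c);
            }
        }
        start += pos + 1;
    }
    None
}

fn main() {
    let hay = b"he said it was quite fast";
    // Searching for "quite" scans for 'q', not 'e'.
    assert_eq!(find(hay, b"quite"), Some(15));
    println!("found at {:?}", find(hay, b"quite"));
}
```

Searching “quite” in the text above hits only one ‘q’, so the verification step runs once; scanning for the ‘e’ would have produced several false candidates.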
And this optimized memchr is a crate. So is the parallelized dir walking. You can easily reuse lots of ripgrep’s best components. Obviously, having a fast memchr() is also helpful.