I’ve used Aho-Corasick a lot. Not just on its own, but also as a tool to optimize special cases in a regex engine. Here are a few tidbits:
For example, a regex that is just an alternation of literals, like `Inspector Lestrade|Sherlock Holmes|Irene Adler`, can be handled by Aho-Corasick instead of the general regex engine.
I found it a little hard to understand the aims and tradeoffs of this algorithm from Wikipedia. The original paper is here: https://cr.yp.to/bib/1975/aho.pdf
This algorithm lets you search for all occurrences of many short patterns in a long text.
One application might be syntax highlighting keywords in a text editor.
It works by building a finite state machine out of a list of patterns. This is similar to the common lexing technique of building a vector of regular expressions and running them all “in parallel”.
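To make that concrete, here is a minimal Python sketch of the classic construction from the paper (a trie over the patterns, failure links computed by BFS, then a single pass over the text). It's an illustration of the idea, not any particular library's implementation, and all names in it are my own:

```python
from collections import deque

def build_automaton(patterns):
    # Trie: goto[s] maps a character to the next state; state 0 is the root.
    goto = [{}]
    fail = [0]   # failure link: longest proper suffix that is also a trie prefix
    out = [[]]   # patterns that end at each state
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    # BFS from the root's children; a state's failure link always points to a
    # shallower state, so it is already computed when we need it.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state (e.g. "he" inside "she").
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out

def search(text, goto, fail, out):
    # One pass over the text: follow goto edges, falling back on failure links.
    matches = []
    s = 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

Searching `"ushers"` for the textbook pattern set `["he", "she", "his", "hers"]` reports all three overlapping occurrences in a single scan, which is the whole point: the text is read once no matter how many patterns there are.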
The historic articles on string matching and dynamic programming are a real treasure. They covered ground-breaking algorithms in just a few pages, didn’t hold anything back, and the work could be replicated without needing to refer to earlier articles. The pseudo-code is so clean and efficient that when re-implementing these algorithms from the original paper, it is possible (IMO) to get back into the exact thought patterns of the authors. Almost like they preserved a part of themselves in pseudo-code.
(Flagged) Posting wikipedia articles doesn’t seem appropriate for lobsters.
My gut instinct as someone who has lurked here for a while but only recently started commenting is that one of the worst parts of proggit is people posting plain links to Wikipedia articles or software libraries with no context.
But I’m glad this was posted because burntsushi’s comment is highly valuable.