“Despite this vast body of theoretical knowledge, few of the parsers that are in production systems today are textbook cases of the theory. Many opt for hand-written parsers that are not based on any formalism at all.”
Ah, to say that a parser is hand-written is not to say that it is not based on any formalism. In fact, a recursive descent parser can be derived by taking a context-free grammar in Chomsky Normal Form, transforming it to Greibach Normal Form, and writing one function per nonterminal, where the leading terminal of each production determines which further functions to call. I don’t think anyone can produce a recursive descent parser of any length without a fairly intimate understanding of grammar transformations - not a bad thing imo.
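To make that concrete, here is a minimal sketch in Python. The toy grammar for nested lists of the atom `a` is my own invention for illustration, as are the names `parse`, `parse_s` and `parse_l`. Every production starts with a terminal, as in Greibach Normal Form, so one token of lookahead picks the production and tells each function what to call next:

```python
# Hypothetical toy grammar (an assumption for illustration), in Greibach-style form:
#   S -> 'a' | '(' L
#   L -> ')' | 'a' L | '(' L L
# L parses the rest of a list, up to and including its closing ')'.

def parse(tokens):
    """Parse a sequence of tokens into nested Python lists, or raise SyntaxError."""
    pos = 0

    def peek():
        # Next token, or None at end of input.
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def fail(tok):
        raise SyntaxError(f"unexpected {tok!r} at position {pos}")

    def parse_s():
        tok = peek()
        if tok == "a":            # S -> 'a'
            advance()
            return "a"
        if tok == "(":            # S -> '(' L
            advance()
            return parse_l()
        fail(tok)

    def parse_l():
        tok = peek()
        if tok == ")":            # L -> ')'
            advance()
            return []
        if tok == "a":            # L -> 'a' L
            advance()
            return ["a"] + parse_l()
        if tok == "(":            # L -> '(' L L
            advance()
            inner = parse_l()     # the nested list, through its ')'
            return [inner] + parse_l()
        fail(tok)

    tree = parse_s()
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at position {pos}")
    return tree
```

With this sketch, `parse(list("(a(a)a)"))` returns `['a', ['a'], 'a']`, and unbalanced input such as `"(a"` raises `SyntaxError`. Each function body is just a case split on the leading terminal of its productions, which is why the code reads almost like the grammar itself.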
I think writing a parser by hand is useful since it gives one a pretty exact idea of what’s happening. Yacc code always seemed clunky and less clear to me. I’ll admit I’ve only tried to read Yacc code rather than write it, but I still think that’s part of the point - recursive descent parsers are fairly easy to create and easy to understand if documented reasonably well. And it is quite possible to add hooks for features that make the language not context-free but which happen to be desirable, as well as to add intermediate functions that make the code easier to understand. But all that, again, is a good thing imo.
And it’s ironic for someone to cite Ruby as an example of doing good by using Bison - the original Ruby Bison code was a completely incomprehensible cluster-fuck that took years to sort out (if it has been sorted out; I stopped paying attention). That the language was specified in Bison did not change the fact that Ruby had no standard, and that “MRI” (Matz’s Ruby Interpreter) was the only real gold standard of Ruby behavior for a long time (I’m hoping it’s not anymore).
Yeah, I didn’t find the “Theory vs. Practice” section the strongest part of the article. But the rest of the article (starting with the section after that one) is quite a good overview of the ambiguity problem in CFGs, imo, and has a good discussion of why various approaches proposed to let you sidestep ambiguity (PEGs, GLR parsers, etc.) don’t really solve the problem.
The post linked here, LL and LR Parsing Demystified, is one of the better treatments of the subject I’ve seen.