There’s a very nice youtube video about Parser Contaminators from D. Hutton that is a professor on the subject: https://www.youtube.com/watch?v=dDtZLm7HIJs
It explains very well the concepts with a working implementation in Haskell.
A parser for things
Is a function from strings
To lists of pairs
Of things and strings
I really wish that people would stop recommending PEGs; in my experience, the cases where they do what you expect is exactly the set of cases when the underlying grammar is LL(1); if you have anything more complex, then the ordered choice construct will silently hide what the grammar author would want to be reported as an ambiguity and you’ll therefore end up parsing the wrong thing. Fixing this problem generally requires all sorts of hacks such as completeness or negative lookahead combinators (the latter of which are especially insidious as they need to be deep in the parser to avoid accepting too much).
This is not to say that the idea of parser combinators is bad; in fact, they have a wonderful API that has done far more to get people to use real parsers for small tasks. The problem is that there is a simple, intuitive implementation of parser combinators (i.e., PEGs) that will tend to behave in spectacularly unintuitive ways. It is possible (and in fact, relatively easy) to provide a parser combinator API for a LL(1), LALR, ALL*, or really any other parsing algorithm; see, for example hammer for how it might be done.
(Full disclosure: I wrote a significant portion of Hammer)
Isn’t the parser combinator framework that this article talks about something distinct from a PEG? Am I misunderstanding what PEG refers to?
The class of grammar that this framework can parse is exactly PEG; this is easy to tell from the fact that the orElse combinator implements ordered choice. If the first match succeeds, the choice will never be re-examined even if it causes the grammar as a whole to fail.
So, for example (with apologies for the formatting, as I don’t speak F#),
(many (pchar 'a'))
(many (pchar 'b')))
will fail on “bc”, because the first branch of the orElse succeeds having parsed nothing and then the parser for ‘c’ fails.
Somebody who doesn’t know much about parsing would tend to imagine that that would be equivalent to the following:
(sequence (many (pchar 'a')) (pchar 'c'))
(sequence (many (pchar 'b')) (pchar 'c'))
(i.e., distributing the sequence into the orElse)
However, this latter parser will accept “bc”
Any reason your changes aren’t upstreamed?
UpstandingHackers is actually the primary repo; the reason it’s marked as a fork on Github is that my ex-wife didn’t want to deal with making it look right :-/
I tried writing a very minimal introduction to parser combinators for Python folks here. It is skeletal at the moment (mostly code, and the text needs to be filled in), but could be more approachable for people from the Python land.
Note: The parser described in my link is context-free. Not PEG.