1. 12

  2. 2

    The problem I found with parser combinator libraries is that left recursion seems to be essentially impossible to tackle, thus rendering them useless for simple things as arithmetic expressions. I’ve also heard that Packrat fails to intelligently parse A ::= a A a | \epsilon: presumably, only strings of length a power of two are accepted. All in all, after hearing about PEG being the one true way to solve real life parsing, I’ve been pretty disappointed… Open to refutation though!

    1. 4

      I think that you can do good engineering work with PEG parsers, even if they don’t handle certain difficult cases. Your job as a programmer is to understand the limitations of the tools available and e.g. rewrite a grammar with a A a in it to something the parser system can handle. I don’t think the problem of taking a completely arbitrary CFG and generating an efficient parser for it has been solved. https://tratt.net/laurie/blog/entries/parsing_the_solved_problem_that_isnt.html

      One of the positives about PEG parsers for parsing things like programming languages is the left biasing is useful to express common PL syntax idioms, where a more theoretically pure description could have ambiguity in it. (relevant terms: ordered choice/soft cut)

      If I was using rust I would want to take advantage of nom for the zero copy properties.

      1. 2

        Interesting link, thanks! The author refers to “boolean grammars”; I wonder if these got any traction, either in research or applications.

      2. 4

        If you are not too much worried about the performance impact, you can limit the recursion to num(nonterminals) * (1 + |input_length|), which lets you escape the infinite recursion. (e.g)

        See “Richard A. Frost, Rahmatullah Hafiz, and Paul C. Callaghan. Modular and efficient top-down parsing for ambiguous left recursive grammars. IWPT 2007”

        1. 3

          There’s a clever way that PEG / packrat parsers can use the memoization cache to support left recursion quite simply: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1

          I still think that expressing infix operators with the precedence climbing / Pratt parser algorithm is both more natural to write and more performant than via left recursion, though.

          (I maintain another PEG library in Rust that implements both of these.)

          1. 1

            I looked at your library before deciding on Pest for my attempt at Make A Lisp. It looks quite robust but I had trouble with the pseudo-Rust syntax in the parser macro. What made you decide to go that route instead of a more traditional PEG syntax?

            (Not to say Pest is “traditional”, it is fairly different than the PEG grammar described in the original paper, but it’s at least close enough to make translation easy.)

            1. 1

              Using a procedural macro makes it integrate better into the Rust ecosystem. Many of the tokens are passed through to the generated Rust code with span information intact, so error messages point to the code you wrote rather than a generated file. Even works live in an editor with rust-analyzer! It does mean the syntax has to be compatible with the Rust lexer, though.

              The biggest departure in syntax is the use of Rust’s patterns (['a'..='z']) for matching a character vs the regex-like syntax in most other PEG libraries ([a-z]). It’s more verbose, but this is to enable operation over data types other than strings – for example you can use an external lexer and match tokens like [Identifier(n)].

              Other than that, it’s heavily inspired by PEG.js and some of the operators like $ come from there.

          2. 1

            As far as I can tell, most every left recursive expression can be reformed into a non-left recursive form: term <- term "+" num / num can be rewritten as term <- num ("+" num)*, in the simplest example. This isn’t an ideal solution as the resulting parse tree is not as “natural” as the left recursive parse tree would be, but it is possible.

            For your “only even forms” objection, Pest has added specific operators for “range” matching: A = @{ "a" ~ ("a"{2})* } matches 1 “a” plus any number of pairs of “a”. (This could be accomplished more easily with @{ "a" ~ "aa"* } but that’s less fun. ;) )

            There are methods for handling left recursion directly (Python’s new PEG parser implements one such method, and the Pegged library for D implements another), but they’re more complicated than the base case.

            1. 1

              Re left recursion: see here - https://www.youtube.com/watch?v=b2AUW6psVcE Solves left recursion for BNF/EBNF/… https://github.com/Engelberg/instaparse