1. 13
  1.  

  2. 5

    While parser combinators tend not to have an explicit tokenizer step, I find it useful to maintain the distinction between tokenization and AST building when using them.

    It’s much easier to write them without getting lost if you separate the two kinds of parsing they each represent. It also forces you to focus on the primitives of your grammar separately from the way they combine to build the ASTs.

    I also find that it makes adding error handling and reporting slightly easier when using parser combinators, an area that has historically given a lot of people trouble.
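
    To make the separation concrete, here is a minimal hand-rolled sketch (not any particular library): the tokenizer deals with characters via a regex, and the combinators consume the resulting token list. The token names and the tiny `tok`/`seq` combinators are illustrative inventions, not a real API.

    ```python
    # Sketch: lex first, then build the AST with combinators that
    # consume tokens rather than characters.
    import re

    def tokenize(src):
        # All character-level concerns (numbers, operators, whitespace)
        # live here, away from the grammar.
        return [(m.lastgroup, m.group()) for m in
                re.finditer(r"(?P<NUM>\d+)|(?P<OP>[+*])|(?P<WS>\s+)", src)
                if m.lastgroup != "WS"]

    # Combinators take (tokens, position) and return (value, new_pos) or None.
    def tok(kind):
        def parse(ts, i):
            if i < len(ts) and ts[i][0] == kind:
                return ts[i][1], i + 1
            return None
        return parse

    def seq(*parsers):
        def parse(ts, i):
            out = []
            for p in parsers:
                r = p(ts, i)
                if r is None:
                    return None
                value, i = r
                out.append(value)
            return out, i
        return parse

    # Grammar rule over tokens: NUM OP NUM
    addition = seq(tok("NUM"), tok("OP"), tok("NUM"))

    tokens = tokenize("2 + 40")
    result, _ = addition(tokens, 0)
    print(result)  # ['2', '+', '40']
    ```

    The grammar rule never mentions whitespace or digit patterns; those were settled in the tokenizer, which is the point of keeping the two phases distinct.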

    1. 1

      I agree with you, and you can easily write a tokenizer and an AST builder using parser combinators: http://parsy.readthedocs.io/en/latest/howto/lexing.html

      In fact, this is possible because Parsy works on any iterable: strings, lists, sets, etc. Parsy handles token lists exactly like strings.

      1. 2

        Parsy looks cool. I haven’t used it but it looks like something I could see myself using. You are right that good parser combinator libraries will work with any iterable so separating the two is usually not that much effort and the gains are well worth it.

        1. 1

          Yes. I really like Parsy too.

      2. 1

        It should also be faster to do separate tokenization if you have any backtracking.

        I was just discussing this on a subreddit:

        https://www.reddit.com/r/oilshell/comments/7fjl5t/any_idea_on_the_completeness_of_shellchecks_parser/dqfxz8b/

        My argument was that you should do the easy thing with the fast algorithm (lexing in linear time with regexes), and the hard thing with the powerful but slow algorithm (backtracking, whatever flavor of CFG, etc.)
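        The linear-time half of that split is often a single compiled regex alternation driven in one left-to-right pass. A rough sketch of that idea (the token names and `lex` helper are made up for illustration):

        ```python
        # One linear pass with a compiled alternation: each match advances
        # the position, so the lexer never backtracks. Any backtracking the
        # parser does later is over the much shorter token list.
        import re

        TOKEN_RE = re.compile(r"""
            (?P<NUM>\d+)
          | (?P<ID>[A-Za-z_]\w*)
          | (?P<OP>[-+*/=])
          | (?P<WS>\s+)
        """, re.VERBOSE)

        def lex(src):
            tokens, pos = [], 0
            while pos < len(src):
                m = TOKEN_RE.match(src, pos)
                if m is None:
                    raise SyntaxError(f"bad character at {pos}: {src[pos]!r}")
                if m.lastgroup != "WS":   # skip whitespace tokens
                    tokens.append((m.lastgroup, m.group()))
                pos = m.end()
            return tokens

        print(lex("x = 10 + y2"))
        # [('ID', 'x'), ('OP', '='), ('NUM', '10'), ('OP', '+'), ('ID', 'y2')]
        ```

        The lexer also gives you a natural place to report "bad character" errors with a position, before the slower grammar machinery ever runs.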

        I did the same thing with PEGs. PEGs are usually described as “scannerless”, but there’s no reason you can’t lex first and operate on tokens rather than characters.

      3. 2

        Related, “Understanding Computation” 1 has a great chapter on doing lexical analysis and building parsers purely from the perspective of finite state automata.

        1. 1

          That’s funny and cool that all the parser-combinator tutorials build a calculator.

          Here’s mine, for a js-based GLL parser-combinator: https://github.com/robey/packrattle/blob/master/docs/tutorial.md. It focuses a bit more on the practical use of them, including handling recursion, reduce, error handling, and debugging.