1. 18

Sequel to https://lobste.rs/s/vs1jkv/simple_powerful_pratt_parsing

  1.  

  2. 3

    And there’s also precedence climbing, which is also a nice algorithm and (I think) equivalent to both Pratt parsing and shunting yard. Personally, I’ve found the shunting yard algorithm conceptually easier to extend to prefix and postfix operators. See https://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing

    Nice blog, btw :)

    1. 3

      Yes see Pratt Parsing and Precedence Climbing Are the Same Algorithm.

      The CS professor who maintains the pages on precedence climbing agreed with that:

      https://www.engr.mun.ca/~theo/Misc/pratt_parsing.htm

      I think it’s possible that Shunting Yard is the same as the other two, but like I said I don’t think this post actually forms an argument that this is the case. It refactors code and shows that they print the same thing (sort of), but that doesn’t show the algorithms the same. e.g. They could have a different running time or memory usage and still print the same thing (they don’t in this case).

      But like I said it’s mostly an academic argument, since I can’t really think of any situation where you would want to use one algorithm but not the other. The only reason would be you don’t want to blow the stack with recursion, which doesn’t require any special handling with Shunting Yard.

    2. 1

      I would have liked to see a summary of the argument up front. The Rust code didn’t illuminate much for me (because I don’t know Rust :) )

      I see this at the end:

      Placing print statements at all points where we construct an S node prints expression in a reverse polish notation, proving that the recursive algorithm does the same steps and in the same order as the shunting yard.

      So are you saying that both algorithms will print out a RPN version of the expression so they’re equivalent?

      I’m not sure that follows. This comment says what I think, mainly because the author started with the conception that Pratt Parsing and Shunting Yard are equivalent, but after implementing Shunting Yard he came away with the impression that they were different:

      https://www.reddit.com/r/rust/comments/g0eusf/blog_post_simple_but_powerful_pratt_parsing/fnaz4g6/

      That resulted in this blog post: http://www.oilshell.org/blog/2017/04/22.html

      My comment from last year that was in the /r/rust thread:

      https://news.ycombinator.com/item?id=19201396

      Also, did you explain the shunting yard algorithm anywhere?


      What I will say that it basically doesn’t matter which one you use. If they are not identical, they are extremely similar, so it’s somewhat of an academic point. (The only thing I would have like to see is how to handle ternary and a[i] in Shunting Yard, which wasn’t in the post.)

      I would say to use whichever style is easier for you to understand and compose with your non-expression parsers. They are very close in any case.

      As I pointed out in the end of that post, the original AT&T C compilers composed recursive descent and shunting yard (they actually mention Dijikstra in the source code). I found it easier at first to compose recursive descent and pratt parsing (since it’s a recursive algorithm), but these days I think it’s a toss up.

      1. 1

        I would have liked to see a summary of the argument up front.

        Good call, added one. Really, I would have loved to make an interesting post out of this one, but it 100% consists of boring Fowler-sized refactoring, so I decided to just throw walls of code at the readers.

        So are you saying that both algorithms will print out a RPN version of the expression so they’re equivalent?

        Not exactly, the newly added plan of attack at the start of the post does it better. I literally just refactor the code of TDOP to the code of Shunting Yard, taking one small control-flow preserving step at a time. The first steps in particular change the verbose, but obvious implementation from the previous post into the more canonical (but also, harder to understand for me) one, where there’s only one place with “dispatch” based on token priority.

        The bit about reverse polish notation is to double-check that we haven’t changed the control flow accidentally during these refactorings, and that indeed even the original recursive formulation does exactly the same steps as the iterative one.

        1. 1

          OK that makes sense. The update helps a lot.

          I might have 1% doubt based on the ? : issue that Bourguet brought up, and that apparently doesn’t appear here. Maybe the next time I write an expression parser I will try shunting yard and see :)

          Although if you are sure that they’re identical you can e-mail Norvell as well. On this page he still lists Shunting Yard as a separate algorithm, though it was updated in 2016 to say pratt parsing and precdence climbing are the same:

          https://www.engr.mun.ca/%7Etheo/Misc/exp_parsing.htm#shunting_yard

          And it says:

          Usually the shunting yard algorithm is presented without the use of recursion. This may be more efficient and might aid in generating better error messages, but I find the code a bit harder to understand.

          As far as I can tell he is saying that the shunting yard algorithm can be implemented with recursion and (implicitly) this is different from precedence climbing…

          But I don’t think it matters too much in practice like I said. They are very similar and you can use them almost interchangeably it seems, except maybe for more complicated expressions.


          FWIW I decided not to use Pratt parsing for the rest of Oil. Oil has Python-like expressions like:

          a[1: -1]
          [1,2,3]
          {k: v, [k2]: v2}
          3 if f() else 2
          

          and faced with implementing that, it felt like Python’s grammar generator pgen2 was more ergonomic than Pratt Parsing, which I had already implemented. (It has some downsides too though – parsing is full of tradeoffs.) I also looked at the tinypy parser, which was done entirely with pratt parsing, and I didn’t quite like it.