Nice work! But I’d encourage the author and anyone who writes parsing tutorials to also include the helpers necessary for helpful debug messages. That is, track line and column info and have a debugToken(t: Token, msg: string) helper that lets the programmer print out a nice error message on parsing or evaluation failure:
Error in source.whatever at line 10, column 5:
for x in 1 {
^ Expected identifier or array literal
Yes, it’s not hard, but I’d like to raise the bar so that every parser out there includes nice info like this. For example, production SQL parsers (Postgres, SQLite, etc.) are notoriously unhelpful with these messages.
Here’s an example of this in a Rust parser I wrote for a toy Lua implementation.
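For readers who want something smaller to start from, here is a minimal Python sketch of the kind of helper described above. It is not taken from the article or from the linked parser: the `Token` fields, the `line_and_column` helper, and the extra `source` and `filename` parameters on `debug_token` are all assumptions made for illustration, and the caret alignment assumes the source line contains no tabs.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical token shape; a real lexer may carry more fields.
    kind: str    # e.g. "ident", "number", "punct"
    text: str    # the exact source text of the token
    line: int    # 1-based line where the token starts
    column: int  # 1-based column where the token starts


def line_and_column(source: str, offset: int) -> tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column)."""
    before = source[:offset]
    line = before.count("\n") + 1
    # rfind returns -1 when there is no earlier newline, so the line
    # start falls back to offset 0.
    line_start = before.rfind("\n") + 1
    return line, offset - line_start + 1


def debug_token(source: str, filename: str, t: Token, msg: str) -> str:
    """Format an error message that points at token `t` in `source`."""
    source_line = source.splitlines()[t.line - 1]
    # Pad with spaces so the caret sits under the token's first character
    # (assumes no tab characters in the line).
    caret = " " * (t.column - 1) + "^ " + msg
    return (
        f"Error in {filename} at line {t.line}, column {t.column}:\n"
        f"{source_line}\n"
        f"{caret}"
    )


source = "let a = 1\nfor x in 1 {\n}\n"
line, column = line_and_column(source, source.index("1 {"))
bad = Token("number", "1", line, column)
print(debug_token(source, "source.whatever", bad,
                  "Expected identifier or array literal"))
```

The lexer only has to record an offset (or line and column) per token; everything else is formatting that the parser and evaluator can share.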
Thank you so much for the feedback! I’ve read a lot of your compiler posts! And yes, an error message with proper token/position information is what I left out of this article. I’m now working on the next article, which covers more concepts about recursive descent parsers, and will definitely make the error handling better!
I’m excited to read your next one! :)
Hey, author here. I just finished it last night and was very surprised to see it already posted here on Lobsters. Thank you so much, adamshaylor!!!
Love your blog! Your articles are very consistently clearly written. I didn’t want to steal your thunder, though. Lemme know if you’d prefer to post your articles yourself.
Thank you so much! And no worries, I’m actually very happy to see my articles being shared around like this!!!
Recursive descent is a great technique to learn, but this example recognizes a regular language, which is non-recursive (e.g. $3 or £42).
I tend to handle all the non-recursive structure with regular languages, which is much simpler and faster [1]. As a regex it’s basically:
[$£€][0-9]+
I consider this lexing, which I don’t think is very controversial. And then the technique for parsing recursive structure is complementary to lexing, and depends on the language you’re recognizing. You can hand-write it with recursive descent, use a yacc-like LALR(1) parser generator, PEG, etc.
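As a rough illustration of that lexing step, here is a small Python sketch built on the regex above. The function name `lex_amounts` and the `(symbol, value)` token shape are assumptions for the example, not something from the article. Note that `finditer` silently skips text that does not match; a real lexer would more likely report it as an error.

```python
import re

# The regex from the comment: one currency symbol followed by one or more digits.
AMOUNT = re.compile(r"[$£€][0-9]+")


def lex_amounts(text: str) -> list[tuple[str, int]]:
    """Return a (symbol, value) pair for every currency amount in `text`."""
    return [(m.group()[0], int(m.group()[1:])) for m in AMOUNT.finditer(text)]


amounts = lex_amounts("$3 or £42")
# amounts == [('$', 3), ('£', 42)]
```

No recursion is needed here; a recursive descent parser would only come into play once these tokens are nested inside something like parenthesized expressions.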
More on this argument: Why Lexing and Parsing Should Be Separate
[1] https://www.oilshell.org/blog/2020/07/eggex-theory.html – some more arguments here but this post is a bit dense, and not sure most people got it