1. 43
  1. 2

    At first glance, I was ready to completely dismiss the article because it uses the example of syntax errors not being caught before running or compiling the code. There are automatic linters that can detect this and there are plugins for various editors to use these linters to display errors inline with the code. I haven’t had a single syntax error in my code since configuring these.

    But keep reading: there’s far more interesting stuff after that. :)

    1. 2

      > syntax errors not being caught before running or compiling the code. There are automatic linters that can detect this

      I’m not sure what you mean. Syntax is checked during parsing, and a linter would still need a parser to detect syntax (and other kinds of) errors.
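
      To make that concrete, here is a minimal sketch of a syntax check that runs the parser but never the program, which is roughly what a linter does on every keystroke. It uses Python's standard `ast.parse`; the helper name `syntax_errors` is made up for illustration.

```python
import ast


def syntax_errors(source: str) -> list[str]:
    """Parse the source without executing it and describe any syntax error.

    CPython's parser stops at the first error, so the list has at most one entry.
    """
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: {exc.msg}"]
    return []


if __name__ == "__main__":
    print(syntax_errors("x = (1\n"))        # one error, nothing was executed
    print(syntax_errors("raise SystemExit"))  # parses fine, so no errors
```

      Note that it reports a single error per file, which is exactly the limitation that error recovery is meant to lift.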

      1. 1

        I mean, you shouldn’t need to explicitly compile or run your code for your editor to warn you there is a syntax error. The parser doing that check shouldn’t try to recover from an error; that’s counterproductive. It’s supposed to fail.

        1. 3

          You’re right that you shouldn’t need to manually invoke the compiler or VM in order to see syntax errors, but if your IDE is annotating syntax errors then it too is using a parser to find them. And if you want the IDE to be able to report more than one error at a time, then that parser will need some kind of error recovery mechanism.
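
          As a rough illustration of why recovery is needed to report more than one error, here is a sketch using classic panic-mode recovery (not the approach from the article): on an error, skip to the next `;` and carry on. The toy language (`name = number ;` statements), the whitespace-split tokens, and the function names are all assumptions for the example.

```python
def matches(kind: str, tok: str) -> bool:
    """Check a token against one slot of the toy statement shape."""
    if kind == "name":
        return tok.isidentifier()
    if kind == "number":
        return tok.isdigit()
    return tok == kind


def check_statements(tokens: list[str]) -> list[str]:
    """Check a sequence of `name = number ;` statements, reporting every error."""
    errors = []
    i = 0
    while i < len(tokens):
        ok = True
        for kind in ("name", "=", "number", ";"):
            tok = tokens[i] if i < len(tokens) else "<eof>"
            if not matches(kind, tok):
                errors.append(f"token {i}: expected {kind}, got {tok!r}")
                ok = False
                break
            i += 1
        if not ok:
            # Panic mode: discard tokens up to and including the next ';',
            # then resume parsing at the following statement.
            while i < len(tokens) and tokens[i] != ";":
                i += 1
            i += 1
    return errors


if __name__ == "__main__":
    for message in check_statements("a = 1 ; b 2 ; c = ; d = 4 ;".split()):
        print(message)
```

          Without the skip-to-`;` step the checker would have to stop at `b 2` and never see the second error.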

          1. 1

            To share my anecdotal experience (I make IDEs, but it’s still anecdotal): it seems that for parser errors specifically, immediacy (red wavy underline appearing just after typing) is relatively more important than the quality of the error message.

            Error resilience is indeed crucial, but it is not equivalent to error recovery.

    2. 2

      TL;DR - When it hits a syntax error, it uses a pathfinding search to try every possible path forward, stopping at either EOF or three consecutively accepted tokens from the lexer.

      Overall it makes a lot of sense, to the point of being a bit obvious. The real question is: do programmers like the result? The article can’t answer that, and neither can I. But my guess is that it can compete with many modern tools.
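
      Here is a small sketch of the search from the TL;DR, under heavy assumptions: a toy bracket language (`x`, `()`, `[]`) stands in for a real LR parser, every insert or delete costs 1, and a repair counts as successful at EOF or after three consecutively accepted tokens. All names are invented for the example, and this is not the article's implementation.

```python
import heapq
from itertools import count

OPENER = {")": "(", "]": "["}
ALPHABET = ["x", "(", ")", "[", "]"]


def step(stack, tok):
    """Toy pushdown recogniser: return the new stack, or None on a syntax error."""
    if tok == "x":
        return stack
    if tok in ("(", "["):
        return stack + (tok,)
    if stack and stack[-1] == OPENER.get(tok):
        return stack[:-1]
    return None


def find_error(tokens):
    """Return (stack, position) of the first error, or None if the input is valid."""
    stack = ()
    for pos, tok in enumerate(tokens):
        nxt = step(stack, tok)
        if nxt is None:
            return stack, pos
        stack = nxt
    return (stack, len(tokens)) if stack else None


def repair(tokens, stack, pos, needed=3, max_cost=4):
    """Uniform-cost search over insert/delete edits, starting at the error point."""
    tie = count()  # tie-breaker so the heap never compares stacks or edits
    frontier = [(0, next(tie), stack, pos, 0, ())]
    seen = set()

    def push(cost, stack, pos, run, edits):
        heapq.heappush(frontier, (cost, next(tie), stack, pos, run, edits))

    while frontier:
        cost, _, stack, pos, run, edits = heapq.heappop(frontier)
        at_eof = pos == len(tokens)
        # Success: enough tokens accepted in a row, or a clean end of input.
        if run >= needed or (at_eof and not stack):
            return list(edits)
        if cost > max_cost or (stack, pos, run) in seen:
            continue
        seen.add((stack, pos, run))
        if not at_eof:
            nxt = step(stack, tokens[pos])
            if nxt is not None:  # accept the next input token for free
                push(cost, nxt, pos + 1, run + 1, edits)
            push(cost + 1, stack, pos + 1, 0, edits + (("delete", pos, tokens[pos]),))
        for tok in ALPHABET:  # try inserting each token the recogniser accepts
            nxt = step(stack, tok)
            if nxt is not None:
                push(cost + 1, nxt, pos, 0, edits + (("insert", pos, tok),))
    return None


if __name__ == "__main__":
    tokens = list("((x)")
    stack, pos = find_error(tokens)
    print(repair(tokens, stack, pos))
```

      For `((x)` the only single-edit repair is inserting `)` at the end. For something like `(x]xxx` there are several equally cheap repairs, and this sketch just returns whichever it reaches first.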

      1. 1

        It would be super exciting if this approach could give generated LR parsers better error-message UX than many modest-effort hand-written parsers. I think those repair suggestions would already make good error messages if the result of the repair were simply shown in the output? That would save the user the effort of interpreting the “shift” nodes. Colour could help: put the deletion in red and the insertion in green, mimicking a diff?
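
        A quick sketch of what I mean by the diff-style output, assuming a repair arrives as a list of `("insert" | "delete", position, token)` tuples. That edit format and the function name are my own invention, not something the article defines.

```python
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"


def render_repair(tokens, edits, color=True):
    """Show the repaired token stream with deletions in red and insertions in green."""
    inserts = {}
    deleted = set()
    for kind, pos, tok in edits:
        if kind == "insert":
            inserts.setdefault(pos, []).append(tok)
        else:
            deleted.add(pos)

    out = []
    # Walk one position past the end so insertions at EOF are rendered too.
    for pos in range(len(tokens) + 1):
        for tok in inserts.get(pos, []):
            out.append(f"{GREEN}+{tok}{RESET}" if color else f"+{tok}")
        if pos < len(tokens):
            tok = tokens[pos]
            if pos in deleted:
                out.append(f"{RED}-{tok}{RESET}" if color else f"-{tok}")
            else:
                out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    edits = [("delete", 2, "]"), ("insert", 3, ")")]
    print(render_repair(list("(x]"), edits))
    print(render_repair(list("(x]"), edits, color=False))  # ( x -] +)
```

        The `+`/`-` prefixes are kept even with colour on, so the output still reads correctly in a terminal without ANSI support.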

        Another thing that I’ve wondered about as a heuristic before: when I introduce an error in the middle of a moderately sized existing file, the syntax highlighting often changes violently and suddenly. Maybe an IDE could pick the repair that returns the largest possible number of nodes to the state they were in for the last few minutes?

        1. 1

          I have this kind of parsing interface: state → symbol → ('step state + 'error (list (symbol * state)) + 'accept cst), and likewise you can get state → list (symbol * state). (Maybe the actual impl is some variation that turns out to be simpler, but you get the idea.)

          I guess CPCT+ is easy to implement in this setting? You get an error, take an input window of 250 tokens, and then calculate language edit distance 3 tokens ahead until you have a few candidates, then pick the best one that parses through to the end of the token window.
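
          A rough Python rendering of that interface, to show how the `'error` alternatives feed the candidate search. Everything here is a simplification: the toy bracket grammar, the `"$"` end marker, the placeholder CST, and the fact that only single-token insertions are tried (a real version would search over longer edit sequences within the window).

```python
from dataclasses import dataclass

EOF = "$"
CLOSER = {"(": ")", "[": "]"}


@dataclass
class Step:
    state: tuple  # parser state: here, the stack of open brackets


@dataclass
class Error:
    alternatives: list  # (symbol, next_state) pairs that would have been accepted


@dataclass
class Accept:
    cst: str  # placeholder for a real concrete syntax tree


def expected(state):
    """state -> list (symbol * state): every non-EOF symbol accepted in this state."""
    options = [("x", state), ("(", state + ("(",)), ("[", state + ("[",))]
    if state:
        options.append((CLOSER[state[-1]], state[:-1]))
    return options


def step(state, symbol):
    """state -> symbol -> Step | Error | Accept."""
    if symbol == EOF and not state:
        return Accept(cst="ok")
    for sym, nxt in expected(state):
        if sym == symbol:
            return Step(nxt)
    return Error(expected(state))


def survives(state, window, lookahead=3):
    """True if the next `lookahead` tokens parse from `state` (or the input is accepted)."""
    for symbol in window[:lookahead]:
        result = step(state, symbol)
        if not isinstance(result, Step):
            return isinstance(result, Accept)
        state = result.state
    return True


def single_insert_candidates(state, tokens, error_pos, window_size=250, lookahead=3):
    """Symbols whose insertion at the error lets the parser get `lookahead` tokens further."""
    window = (tokens + [EOF])[error_pos:error_pos + window_size]
    result = step(state, window[0])
    if not isinstance(result, Error):
        return []
    return [sym for sym, nxt in result.alternatives if survives(nxt, window, lookahead)]


if __name__ == "__main__":
    # The parser is in state ("(",) and chokes on "]" at position 2.
    print(single_insert_candidates(("(",), list("(x]xx"), 2))
```

          The nice part of the interface is that the error case already hands you the insertion candidates, so the repair code never has to look inside the parse tables.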