
TL;DR: See page 9 for a table comparing the resulting parsing error messages with those of clang and gcc.

Author: François Pottier, 2016.

Given an LR(1) automaton, what are the states in which an error can be detected? For each such “error state”, what is a minimal input sentence that causes an error in this state? We propose an algorithm that answers these questions. This allows building a collection of pairs of an erroneous input sentence and a (handwritten) diagnostic message, ensuring that this collection covers every error state, and maintaining this property as the grammar evolves. We report on an application of this technique to the CompCert ISO C99 parser, and discuss its strengths and limitations.
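The two questions in the abstract (which states can detect an error, and what is a shortest sentence that triggers each one) can be illustrated with a deliberately naive sketch. This is NOT Pottier's algorithm. It is a brute-force breadth-first search over parser stacks for a hypothetical toy grammar E → ( E ) | x, using hand-built SLR(1) tables. All names here (ACTION, GOTO, step, minimal_error_sentences, error_state, MESSAGES) are made up for illustration. The last part shows the coverage idea: check that a collection of handwritten messages reaches every error state.

```python
from collections import deque

# Hypothetical SLR(1) tables for the toy grammar  E -> ( E ) | x
# ACTION[state][terminal] is ("shift", s), ("reduce", lhs, rhs_length) or
# ("accept",). A missing entry means a syntax error is detected in that state.
ACTION = {
    0: {"(": ("shift", 2), "x": ("shift", 3)},
    1: {"$": ("accept",)},
    2: {"(": ("shift", 2), "x": ("shift", 3)},
    3: {")": ("reduce", "E", 1), "$": ("reduce", "E", 1)},
    4: {")": ("shift", 5)},
    5: {")": ("reduce", "E", 3), "$": ("reduce", "E", 3)},
}
GOTO = {(0, "E"): 1, (2, "E"): 4}
TERMINALS = ["(", ")", "x", "$"]  # "$" marks the end of input


def step(stack, tok):
    """Feed one token: perform all reductions, then shift, accept or fail.

    Returns ("shift", new_stack), ("accept", None) or ("error", state),
    where state is the state in which the error is detected.
    """
    stack = list(stack)
    while True:
        act = ACTION[stack[-1]].get(tok)
        if act is None:
            return "error", stack[-1]
        if act[0] == "shift":
            return "shift", tuple(stack + [act[1]])
        if act[0] == "accept":
            return "accept", None
        _, lhs, n = act  # reduce: pop n states, then follow the goto edge
        del stack[len(stack) - n:]
        stack.append(GOTO[(stack[-1], lhs)])


def minimal_error_sentences(max_len=10):
    """Map each error state to one shortest sentence that fails there.

    Breadth-first search over parser stacks: two prefixes that leave the
    same stack behave identically afterwards, so each stack is kept once.
    """
    found, seen = {}, {(0,)}
    queue = deque([((0,), [])])
    while queue:
        stack, prefix = queue.popleft()
        if len(prefix) >= max_len:
            continue  # bound needed: nested "(" makes the stack unbounded
        for tok in TERMINALS:
            kind, result = step(stack, tok)
            if kind == "error" and result not in found:
                found[result] = prefix + [tok]
            elif kind == "shift" and result not in seen:
                seen.add(result)
                queue.append((result, prefix + [tok]))
    return found


def error_state(tokens):
    """Return the state in which the parser fails on tokens, or None."""
    stack = (0,)
    for tok in tokens:
        kind, result = step(stack, tok)
        if kind != "shift":
            return result  # the error state, or None on accept
        stack = result
    return None


# Hypothetical handwritten diagnostics, each keyed by an erroneous sentence.
MESSAGES = {
    (")",): "Expected '(' or 'x'.",
    ("x", ")"): "Unexpected token after a complete expression.",
    ("(", ")"): "Expected an expression after '('.",
    ("x", "("): "Unexpected token after 'x'.",
    ("(", "x", "$"): "Expected ')'.",
}

if __name__ == "__main__":
    found = minimal_error_sentences()
    for state in sorted(found):
        print(state, " ".join(found[state]))
    covered = {error_state(s) for s in MESSAGES}
    print("states without a message:", sorted(set(found) - covered))
```

For this toy table, the search finds an error sentence for every state (for instance `x )` for state 1, where ')' arrives after a complete expression), and the coverage check reports that state 5 has no message yet. Enumerating stacks only works for tiny grammars; a realistic grammar such as C99 calls for a less naive method like the one the paper proposes.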

This research paper describes an algorithm for producing good error messages in LR parsers generated by parser generators of the yacc family. I don’t think anyone needs to care about the algorithm itself, but (1) it demonstrates that parser generators can produce excellent error messages, and (2) the author applies it to a C parser and shows, in many cases, better parsing error messages than clang's. (This concerns only parsing errors, not the typing and template errors that are often problematic with C++ compilers.)