
    Wait. The grammar being spread over a few imperative functions is bad? But spreading it across a dozen files and at least as many imperative (and recursive) functions, each in its own class, then imperatively registering these classes at runtime is OK? So when you look at the “core” parsing function, you have no idea what it really does, because what it will do depends on the classes that implement the overloaded parse function. Sorry, but I think this is a horribly confusing way to implement what amounts to precedence parsing.

    You can actually implement the core of expression parsing (precedence climbing[1]) for C using a precedence/associativity table and two functions that easily fit in a recursive descent parser.

    The first function I call simple_expr(). It reads zero or more prefix operators (first loop), followed by “the actual value” (i.e. an identifier, a constant, a string literal, a compound literal, or a subexpression in parentheses), and finally zero or more postfix operators (second loop). There’s no need for recursion here except to parse a subexpression in parentheses.

    The other function (more or less the same as his parseExpression) I call expr(p). It first calls simple_expr() to parse at least one .. doh, simple expression. Then, as long as it sees a token that is a valid infix operator with precedence higher than p, it creates a tree node for it, pushes the previously parsed stuff down to the left, and recurses to parse the other operand, using the precedence of the operator it just read for p (fudged down if the operator is right-associative). This is the third and final loop.
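
    To make the two functions concrete, here is a minimal sketch, not my actual parser. It assumes single-character tokens, a deliberately tiny operator table, and subscripting as the only postfix operator; Node, node(), lookup(), show() and parse_to_string() are names made up for this example. Error handling is omitted and nodes are never freed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char op;                 /* operator character, or 0 for a leaf */
    char leaf;               /* identifier or digit when op == 0 */
    struct Node *lhs, *rhs;  /* rhs is NULL for prefix operators */
} Node;

static const char *src;      /* cursor into the input text */

static char peek(void) { while (*src == ' ') src++; return *src; }
static char next(void) { char c = peek(); if (c) src++; return c; }

static Node *node(char op, char leaf, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    assert(n);
    n->op = op; n->leaf = leaf; n->lhs = l; n->rhs = r;
    return n;
}

/* Precedence/associativity table (a small illustrative subset of C). */
static const struct { char op; int prec; int right; } infix[] = {
    {'=', 1, 1}, {'<', 2, 0}, {'>', 2, 0}, {'+', 3, 0},
    {'-', 3, 0}, {'*', 4, 0}, {'/', 4, 0}, {'%', 4, 0},
};

/* Returns the precedence of c, or 0 if c is not an infix operator. */
static int lookup(char c, int *right) {
    *right = 0;
    for (size_t i = 0; i < sizeof infix / sizeof infix[0]; i++)
        if (infix[i].op == c) { *right = infix[i].right; return infix[i].prec; }
    return 0;
}

static Node *expr(int p);

static Node *simple_expr(void) {
    char pre[16];
    int npre = 0;
    /* First loop: collect prefix operators. */
    while (peek() && strchr("-!~", peek()) && npre < 16)
        pre[npre++] = next();
    /* The actual value: a parenthesized subexpression or a one-char leaf. */
    Node *n;
    if (peek() == '(') {
        next();
        n = expr(0);
        if (peek() == ')') next();
    } else {
        n = node(0, next(), NULL, NULL);
    }
    /* Second loop: postfix operators (only subscripting here). */
    while (peek() == '[') {
        next();
        n = node('[', 0, n, expr(0));
        if (peek() == ']') next();
    }
    /* Postfix binds tighter, so wrap the prefix operators last, innermost first. */
    while (npre > 0)
        n = node(pre[--npre], 0, n, NULL);
    return n;
}

static Node *expr(int p) {
    Node *lhs = simple_expr();
    int right, prec;
    /* Third loop: infix operators with precedence higher than p. */
    while ((prec = lookup(peek(), &right)) > p) {
        char op = next();
        Node *rhs = expr(right ? prec - 1 : prec);  /* fudge down if right-assoc */
        lhs = node(op, 0, lhs, rhs);                /* previous tree goes down left */
    }
    return lhs;
}

/* Append the tree to out in prefix notation; out must be large enough. */
static void show(const Node *n, char *out) {
    char tmp[4];
    if (n->op == 0) { sprintf(tmp, "%c", n->leaf); strcat(out, tmp); return; }
    sprintf(tmp, "(%c ", n->op);
    strcat(out, tmp);
    show(n->lhs, out);
    if (n->rhs) { strcat(out, " "); show(n->rhs, out); }
    strcat(out, ")");
}

void parse_to_string(const char *text, char *out) {
    src = text;
    out[0] = '\0';
    show(expr(0), out);  /* nodes are leaked for brevity */
}
```

    Calling parse_to_string("a - b - c", buf) with a large enough buf should nest to the left, while "a = b = c" should nest to the right, which is the whole effect of the precedence fudge.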

    Oh, and doing it this way makes it easy to disambiguate between cast operators, compound literals, and parenthesized expressions without using lookahead (although you need to know typedefs). Since they are all parsed by simple_expr(), it checks if the first token after the left-paren is a type specifier; if not, the thing is a subexpression. Otherwise it parses the abstract declarator and, after the closing parenthesis, checks if there’s a left-brace (which indicates a compound literal) or treats it as a cast.
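
    The decision itself can be sketched roughly like this. It is a simplification under loud assumptions: tokens are pre-split strings, the typedef table is hard-coded and hypothetical, and instead of really parsing the abstract declarator it just skips to the matching right-paren. classify_paren(), is_type_start() and enum Kind are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum Kind { SUBEXPR, CAST, COMPOUND_LITERAL };

/* Hypothetical typedef table; a real parser fills this in as it sees typedefs. */
static const char *typedefs[] = { "size_t", "my_int", NULL };

/* Does this token start a type name? (keyword list is not exhaustive) */
static int is_type_start(const char *tok) {
    static const char *kw[] = {
        "void", "char", "short", "int", "long", "float", "double",
        "signed", "unsigned", "struct", "union", "enum",
        "const", "volatile", NULL
    };
    for (size_t i = 0; kw[i]; i++)
        if (strcmp(tok, kw[i]) == 0) return 1;
    for (size_t i = 0; typedefs[i]; i++)
        if (strcmp(tok, typedefs[i]) == 0) return 1;
    return 0;
}

/* toks is a NULL-terminated token array whose first token is "(". */
enum Kind classify_paren(const char **toks) {
    assert(toks[0] && strcmp(toks[0], "(") == 0);
    /* Not a type right after the left-paren: plain subexpression. */
    if (!toks[1] || !is_type_start(toks[1]))
        return SUBEXPR;
    /* Stand-in for parsing the abstract declarator: skip to the matching ")". */
    int depth = 1;
    size_t i = 1;
    while (toks[i] && depth > 0) {
        if (strcmp(toks[i], "(") == 0) depth++;
        else if (strcmp(toks[i], ")") == 0) depth--;
        i++;
    }
    /* toks[i] is now the first token after the closing paren (or NULL). */
    if (toks[i] && strcmp(toks[i], "{") == 0)
        return COMPOUND_LITERAL;
    return CAST;
}
```

    Note that the only thing inspected before committing to "type or expression" is the single token after the left-paren, which is why knowing the typedefs is required.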

    [1] https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method