1. 21

  2. 18

    I’d like to share my practice based perspective (I write tools that analyze incomplete Rust code for a living).

    In my experience, the only bit where you need to specifically care about incompleteness is parsing and design of syntax trees. Higher-level analysis phases usually “just work with incomplete code, unless you specifically bail on the first error. When you lower incomplete concrete syntax tree to an AST, you turn missing expressions to a literal Expr::Missing (basically a hole). When you typecheck, everything that does not resolve or is contradictory gets the type of Ty::Unknown (a hole as well).

    Specifically, various assists and local refactorings don’t have specific code paths for error, and they mostly just work for incomplete code. This seem to contradict the following statement in the paper:

    when attempting to integrate these features into an editor, it is difficult to reason about malformed or meaningless edit states. Many of these systems therefore fall back onto tokenized representations of programs

    The only place where you really need to be somewhat specific about missing/erroneous input is parsing. And here I feel like there exists a huge rift between what is studied as “parser error recovery” in academic literature, and what I’ve actually seen and used in practice in IntelliJ.

    In academia, the problem of parser error recovery is usually formulated as “find a minimal edit (insertion/deletion of tokens) which modifies input to be syntactically”. With this definition in mind, various algorithms are presented for reconstruction of corrected edits.

    The problem with this is that interactive coding environments do not care about corrective edits at all. It’s the question that is not helpful, not the answer. In particular, tools I worked with do not insert tokens, which does not match the following statement in the paper:

    use error recovery heuristics that silently insert tokens so that the editor-internal representation is well-formed

    Instead of finding minimal correcting edits, IDEs I’ve worked with do two things:

    • Embrace not well-formed syntax trees. In a concrete syntax tree, everything is optional (even if the grammar says that it is mandatory). If a cst is lowered to an ast, missing bits are replaced with explicit holes, but the parser just produces the tree as is (example). Specifically, if parser matched a prefix of a grammar rule, and then hit an unexpected token, it just produces the tree fragment for the prefix.
    • Write a parser in such a way that it resliently tries to find the fragments of syntax trees in an input. This typically means, for recursive descent parser, that, if you parse a production A, and see an unexpected token which is in FOLLOW(A), you stop parsing A and instead parse the production which contributed this token to FOLLOW. This helps in situations like
    fn main() {}

    where we don’t want an incomplete use statement to interfere with parsing of subsequentt function declaration. This mode of recovery is relatively easy to do in a hand-written parser. It’s mostly just “if token is unexpected and is in FOLLOW, return from the current function”. What would be nifty is having a parser generator for LL/LR parsers that can do such kind of recovery, but I am not aware of one. IntelliJ’s GrammarKit can do this, but it basically generates a backtracking recursive descent. TreeSitter also has some nifty error recovery, but it uses a big hammer of GLR.

    EDIT: inevitably, this turned into poorly edited half blog-post :( I would have written something shorter and more coherent if I’ve allocated more time for this.

    1. 4

      Thanks for this very thorough comment!

    2. 6

      Cyrus Omar gave a presentation about Hazel at my school, and it was really cool! There already exists a working prototype, and the really cool part is that the team has formalized and proven theorems about the Hazel editor / language (maybe even in a formal verification system, but that I’m not sure about). It’s a really cool project!

      1. 4

        He also gave a talk on Hazel at Strange Loop in 2018, which was really great, and recorded!