1. 13
  1. 3

    Nice! Some thoughts/questions:

    Why are there tokens in Ungrammar, like ‘mod’, since they don’t appear in the AST? Are they just comments? They do make it a lot more readable.

    Naively, I would have expected Ungrammar to derive nothing more than a data definition:

    struct Module {
      attrs: Vec<Attr>,
      visibility: Option<Visibility>,
      name: Name,
      item_list: Option<ItemList>,
    }

    Why the fancy traits instead? Does it enable some common/useful visitor-like patterns? Ah, actually I can see a good use case just from here: most nodes would implement AttrsOwner, and it would be useful to have functions that take a T: AttrsOwner as an argument.
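
    To make that use case concrete, here is a rough sketch of what I have in mind. Every name in it (`Attr`, `Module`, `Function`, `has_attr`, `is_test_only`) is made up for illustration and uses plain structs; it is not the actual generated code.

    ```rust
    // Hypothetical stand-ins for generated node types; not the real API.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Attr(pub String);

    /// Implemented by every node kind that can carry attributes.
    pub trait AttrsOwner {
        fn attrs(&self) -> &[Attr];

        /// Shared helper that every implementor gets for free.
        fn has_attr(&self, name: &str) -> bool {
            self.attrs().iter().any(|a| a.0 == name)
        }
    }

    pub struct Module {
        pub attrs: Vec<Attr>,
        pub name: String,
    }

    pub struct Function {
        pub attrs: Vec<Attr>,
        pub name: String,
    }

    impl AttrsOwner for Module {
        fn attrs(&self) -> &[Attr] {
            &self.attrs
        }
    }

    impl AttrsOwner for Function {
        fn attrs(&self) -> &[Attr] {
            &self.attrs
        }
    }

    /// Written once, usable with any node kind that owns attributes.
    pub fn is_test_only<T: AttrsOwner>(node: &T) -> bool {
        node.has_attr("cfg(test)")
    }
    ```

    With plain data definitions alone, `is_test_only` would have to be written once per node type, or take an enum of all attribute-carrying nodes.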

    I see you go from Attr* to “AttrsOwner”, though I haven’t found where that happens in the source at a glance. Is that by simply adding “s”, or do you use a pluralization library that can handle “es” and Knife* -> “KnivesOwner”?

    I think you should have baked the “alternation only at the top level” rule into the “spec”. Without it, Ungrammars aren’t cross-lingual. Someone will write an ungrammar with nested alternations in a language with union types, and it will be a perfectly compliant ungrammar, but you won’t be able to use it in any other language’s perfectly correct ungrammar implementation.

    The “EBNF” grammars that PL papers use can be viewed as a way to specify ASTs. Technically they specify a textual grammar, where there’s a parentheses rule to resolve ambiguity. But no one ever writes down the parentheses rule, so there’s just a kind of magic implied grouping, which makes them act like a grammar on trees, rather than a grammar on text. And they’re certainly treated as trees in derivation rules.

    1. 5

      The tokens are there because ungrammar describes a concrete syntax tree. Having all punctuation there is essential.

      The traits are there just for code reuse; they could have been inherent methods instead. The underlying syntax tree doesn’t use structs though, so specifying things as fields is impossible.
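
      A minimal sketch of the idea, assuming a heavily simplified untyped tree. `SyntaxKind`, `SyntaxNode`, `AstNode`, and `cast` here are illustrative stand-ins, not the real types: the typed nodes are thin wrappers with no fields of their own, and accessors look things up among the children.

      ```rust
      // Hypothetical, simplified untyped tree; the real one differs.
      #[derive(Debug, Clone, Copy, PartialEq, Eq)]
      pub enum SyntaxKind {
          Module,
          Attr,
          Name,
          ModKw,
          Semicolon,
      }

      #[derive(Debug, Clone)]
      pub struct SyntaxNode {
          pub kind: SyntaxKind,
          pub children: Vec<SyntaxNode>,
      }

      /// A typed view over an untyped node.
      pub trait AstNode: Sized {
          fn cast(node: &SyntaxNode) -> Option<Self>;
          fn syntax(&self) -> &SyntaxNode;
      }

      pub struct Attr(SyntaxNode);
      pub struct Module(SyntaxNode);

      impl AstNode for Attr {
          fn cast(node: &SyntaxNode) -> Option<Self> {
              if node.kind == SyntaxKind::Attr {
                  Some(Attr(node.clone()))
              } else {
                  None
              }
          }
          fn syntax(&self) -> &SyntaxNode {
              &self.0
          }
      }

      impl AstNode for Module {
          fn cast(node: &SyntaxNode) -> Option<Self> {
              if node.kind == SyntaxKind::Module {
                  Some(Module(node.clone()))
              } else {
                  None
              }
          }
          fn syntax(&self) -> &SyntaxNode {
              &self.0
          }
      }

      /// The accessor is a default method, so it is written only once.
      pub trait AttrsOwner: AstNode {
          fn attrs(&self) -> Vec<Attr> {
              self.syntax().children.iter().filter_map(Attr::cast).collect()
          }
      }

      impl AttrsOwner for Module {}
      ```

      Tokens such as `mod` and `;` stay in `children` alongside the nodes, which is why the typed layer has to be methods over the tree rather than struct fields.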

      Pluralization and case convention are done manually: this is not user-facing code, so fast compile times are more important than generality and correctness.
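
      As an illustration only, "manual" could look like the sketch below: append "s" and special-case whatever comes up. The function names and the exception entries are assumptions made up for this example, not the actual generator code.

      ```rust
      /// Hand-written pluralizer: a small exception table, then simple suffix rules.
      /// The entries are illustrative, not taken from a real code base.
      pub fn pluralize(word: &str) -> String {
          match word {
              "Knife" => "Knives".to_string(),
              _ if word.ends_with('s') || word.ends_with('x') => format!("{}es", word),
              _ => format!("{}s", word),
          }
      }

      /// Maps a repeated child such as `Attr*` to a trait name such as `AttrsOwner`.
      pub fn owner_trait_name(node: &str) -> String {
          format!("{}Owner", pluralize(node))
      }
      ```

      No dependency is needed, and a new irregular plural costs one extra match arm.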

      The ungrammar deliberately doesn’t restrict possible rules; I don’t think a “lowest common denominator” solution is right for this case.