1. 6

Abstract: “Tree-structured text is ubiquitous in software engineering and programming tasks. However, despite its prevalence, users frequently write custom, specialized routines to query and update such text. For example, a user might wish to rapidly prototype a compiler for a domain-specific language by issuing successive transformations, or they might wish to identify all the call sites of a particular function in a project (e.g. eval in JavaScript). We propose a natural and intuitive extension to regular expressions, called TreeRegex, which can specify patterns over tree-structured text. A key insight behind the design of TreeRegex is that if we annotate a string with special markers to expose information about the string’s tree structure, then a simple extension to regular expressions can be used to describe patterns over the annotated string. We develop an algorithm for matching TreeRegex expressions against annotated texts and report on five case studies where we find that using TreeRegex simplifies various tasks related to searching and modifying tree-structured texts.”


  2. 3

    There’s some code here: https://treeregexlib.github.io/

    1. 2

      I’ve only skimmed this, so forgive me, but it seems quite familiar to me: pattern matching in Lisps. It even uses a similar notation for trees, parenthesized expressions. Yet the alternatives outlined make no mention of it, unfortunately.

      Edit: it turns out that PLT Redex is mentioned, and it does have structural pattern matching, but it has a more targeted purpose. General structural pattern matching, as provided by racket/match and others, combined with a parser from C to s-expressions, seems essentially equivalent to this in power. That’s all.
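
      To make the comparison concrete, here is a rough sketch of structural pattern matching over an s-expression-like tree. It is written in Python rather than Racket, it is neither TreeRegex nor racket/match, and the names `match`, `find_all`, and the `?var` convention are made up for illustration. It assumes some parser has already turned the source into nested tuples.

```python
# Hypothetical mini-matcher. Trees are nested tuples; a pattern element that
# is a string starting with "?" is a variable that binds to any subtree.

def match(pattern, tree, bindings=None):
    """Return a dict of variable bindings if tree matches pattern, else None."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        # A variable seen twice must bind to the same subtree both times.
        if pattern in bindings and bindings[pattern] != tree:
            return None
        bindings[pattern] = tree
        return bindings
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple) or len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            bindings = match(sub_pattern, sub_tree, bindings)
            if bindings is None:
                return None
        return bindings
    # Anything else is a literal and must be equal to the tree node.
    return bindings if pattern == tree else None


def find_all(pattern, tree):
    """Yield bindings for every subtree that matches, in pre-order."""
    result = match(pattern, tree)
    if result is not None:
        yield result
    if isinstance(tree, tuple):
        for child in tree:
            yield from find_all(pattern, child)


# Example: find every call site of eval, at any nesting depth.
program = (
    "block",
    ("call", "eval", "src"),
    ("if", "cond", ("call", "eval", ("call", "f", "x"))),
)
eval_sites = list(find_all(("call", "eval", "?arg"), program))
```

      Here `eval_sites` holds one binding per call site, which is roughly the "find all call sites of eval" task from the abstract.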

      1. 2

        I would love this to work with plain text files directly (i.e., include a generic way to extract some structure from text files and then match expressions on the result). Overall, isn’t XML with XSLT/XPath a more robust option? Syntax aside (XSLT is arguably not very pleasant to write), the ability to write transformations and queries at the same time makes it quite powerful. Also, XSLT templates tend to survive minor changes to the structure of the tree.
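
        As a small illustration of the XPath side of that argument, here is a sketch using Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The XML encoding of the syntax tree (the `program`, `call`, `if`, and `arg` elements and the `name` attribute) is invented for this example and does not come from any real tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of a tiny syntax tree; element and attribute
# names are made up for illustration.
SOURCE = """
<program>
  <call name="eval"><arg>src</arg></call>
  <if>
    <call name="print"><arg>x</arg></call>
    <call name="eval"><arg>y</arg></call>
  </if>
</program>
"""

root = ET.fromstring(SOURCE)

# ".//call[@name='eval']" selects every <call> descendant whose name
# attribute is "eval", at any depth below the root.
eval_calls = root.findall(".//call[@name='eval']")

# Collect the text of the first <arg> child of each matching call.
eval_args = [call.findtext("arg") for call in eval_calls]
```

        Because the query uses the descendant axis, wrapping a call in an extra parent element would not change the result, which is one sense in which such queries can tolerate small changes to the tree.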