Isn’t this a feature of metaprogramming in imperative languages rather than a problem? The whole point of metaprogramming is to change the programming language’s features at compile time, typically with a Turing-complete language. That means such an imperative language with metaprogramming will have at least two steps to produce a parse tree: create an initial parse tree from text that contains the macro or template commands, then perform macro/template expansion. The result is the final parse tree.
I’m interested in any usable languages people know of that support full metaprogramming with everything decidable at compile-time during first parse. I’m not saying it doesn’t exist but I haven’t heard of one.
The difference is that in Lisp, everything around the splice point is the same regardless of your macro expansion. In C++, the kind of the template expansion affects the parse of the code around it. It’s sort of like if a Lisp macro could insert a floating open-parenthesis.
In most metaprogramming systems that aren’t just text macros, an expansion like “1 +” that would screw with operator precedence isn’t valid. That’s basically what the templates example here is though.
The problem here is that the parsing itself is undecidable, not just the evaluation of the parsed code.
This is only a “problem” because it interacts poorly with other “problems”. In the case of C++, the other, bigger problem is that a confluence of (mis-)features makes it difficult to perform separate compilation. Meanwhile, whether a Lisp program will parse (more specifically: macro-expand) is trivially undecidable, but Lisp compilers fare just fine.
This problem is specific to C++, due to its syntax. In fact, ironically, the same syntax is decidable in C, where it comes from; add templates on top and it’s no longer decidable.
So, LISP with macros is decidable at the parser level but C++ isn’t? Or what? I’m trying to understand why it was OK to not know what a program does ahead of time with LISP macros but isn’t with C++. Maybe there’s something I’m missing.
The following factors hurt the usability of C++ templates relative to Lisp macros:
In Lisp, ordinary macro expansion happens well after read-time. In C++, template expansion is part of a single combined syntactic-semantic analysis process that is impossible to fully decouple.
Lisp macro expansion produces Lisp syntax trees (that is, S-expressions). C++ template expansion produces who knows what intermediate representation. You can’t manipulate it yourself.
Lisp doesn’t require implementations to perform much static analysis. Finding errors in your Lisp program is something you do by running it. On the other hand, in C++ most syntax trees actually aren’t valid programs, and C++ programmers expect their compilers to help them find semantic errors in otherwise syntactically valid programs. Template expansion complicates the process.
Thanks for the detailed response. (To the rest of you, too, but I’m replying to this one.) From what I’m reading, the real gripe isn’t that the overall process of making a parse tree is undecidable. That happens with other meta-languages. The gripe, which starts with the initial pass being undecidable, is about the consequences for compiler developers: (a) it’s awful to deal with, and (b) it delays all kinds of analysis/transforms they could be doing until they’ve built the whole tree. I figure (a) is always going to be inherently slow, especially with cache pressure, with (b) negatively impacting parallelism at some point. On top of that, there’s a mess of code involved if one wants good error messages.
Am I correctly understanding what they’re griping about?
I can’t speak for others, but what bothers me the most about C++ templates is that template expansion is intertwined with everything else. That either macro expansion or type checking is Turing-complete isn’t a big deal by itself. For example, Rust has both a Turing-complete macro system and a Turing-complete type system, but fixing a type error somewhere can’t possibly introduce a syntax error somewhere else.
In my understanding, yes. The issue is this:
a * b;
Is this a pointer declaration, or multiplying two values? You can’t know without knowing what a and b are. Depending on what a and b are, this can require expanding templates, which is equivalent to running a Turing machine.
This only happens because of this specific syntax. It’s not just “template expansion is equivalent to a Turing machine”.
Lisp’s macros are easy to parse: that’s (one of) the point(s) of lisp. You’re only gonna end up with one parse tree.
Again, in my understanding.
Lisp also allows replacing the reader. It can run arbitrary code just to parse the rest of the stream.
Then yeah, maybe Lisps have this issue as well. But not every language does, even ones with advanced metaprogramming.
(I thought these macros ran after parse time, but I am very willing to admit that’s incorrect.)
I would accept that truly arbitrary metaprogramming (even in non-imperative languages) necessarily means undecidable parsing. I think asking for “full metaprogramming” is putting the cart before the horse though; to my mind a well-designed language is one that supports the use cases that lead people to reach for macros in other languages, without needing macros.
It’s been about a decade since I’ve done any C++, so pardon my ignorance.
In fact, simply producing a parse tree for a C++ program is undecidable, because producing a parse tree can require arbitrary template instantiation.
Is the template processor’s input not a parse tree? I get that later compiler stages need to have all the template stuff processed/realized/whatever, but it seems like the given example can still be parsed.
I think the author here is, reasonably, grouping template expansion in with parse tree generation, since you need to expand templates in order to do anything useful to the tree (further compilation, typechecking, type aware refactoring, autocomplete dictionary population, etc.).
The whole point of the article is it can’t. I’m not sure what to say beyond repeating the article: it’s literally impossible to construct the parse tree for main() without resolving the (turing-complete) template instantiation, because [...]name * x; might be a multiplication or a pointer declaration.
[...]name * x;
it’s unstated in the article, and I feel this is where non-C++ people might get tripped up, that templates can add whatever the code, new types, new identifiers, etc.
there is also the casual speech usage of “parse” of parse as “process the text of”, which also adds confusion.
So really, it’s that in C++ jargon, “parsing” includes steps that aren’t decidable?
I wouldn’t really say that was specifically “C++ jargon”, because they’re using these words in a way I think any compiler writer would understand. It’s just the facts of compilation – in order to arrive at a parse tree you need to have resolved template expansion. The parse tree is the end product that moves on to other compilation stages. Until that’s done, you don’t have a full parse tree, you have lexed code and a work-in-progress tree(s) which may or may not be particularly tree-like until they reach the end.
In fact, AFAIK for many years template expansion in MSVC didn’t even consume an intermediate parse – it worked as a complicated set of replay transformations of the lexed input stream (this was IIRC the source of their long non-conforming two-phase name lookup issues). So you certainly have no guarantee that template expansion consumes all or mostly parsed code, because too many questions are undecidable without knowledge of the template’s definition – including whether the lexed token “*” represents multiplication or a pointer declaration in a given context.
Consider the article’s example, a line like
Undecidable<...> * x;
We know that running that template is undecidable, so in order to parse the line we need to build a parse tree without instantiating that template. The article’s point is, we can’t do that, because that’s not necessarily a variable definition. Undecidable<> could evaluate to float, and
float * x;
is obviously a definition. But it could also evaluate to 1, and
1 * x;
is a multiplication expression. So, we can’t know the parse of a given line of C++ without actually instantiating its templates, which is undecidable, therefore parsing C++ is undecidable.
Umm, actually in standard C++, name in
S<TuringMachine<...>::output>::name * x;
will be parsed as a value before the specialization of S is decided. If TuringMachine<...> turns out to be SomeType you’ll get a compiler error saying that name was expected to be a value, but it turned out to be a type. If you know that it’s going to be a type then you should write
typename S<TuringMachine<...>::output>::name * x;
There may or may not be cases where the author’s claim holds, but this example is clearly invalid in standard C++, I wish the author tried to compile his code before he wrote an article about it.
I linked to this article at the end of my post on parsing bash being undecidable. My reading of it is that parsing is sandwiched between the two Turing-complete execution mechanisms of C++ – compile time and runtime – and it depends on arbitrary execution state of the first one, so it’s undecidable.
I also linked to the article about Perl, and a similar issue with parsing GNU Make.
I came at this from a fairly practical angle: I want to produce as many parse errors for shell as possible.
But this is the usual confusion about what the Turing result means. A compiler that can’t decide whether a definition that exceeds memory allocation limits is correct is a broken compiler. In computer programming, as opposed to Turing machine programming, we operate in a limited space, and there is no interesting difference between a program that can’t run because of resource constraints and one that might complete given unbounded time and memory.
Which is exactly what the article states in the end:
In practice, compilers limit template instantiation depth, so this is more of a theoretical problem than a practical one. But it is still a deep and significant result if you are ever planning on writing a C++ parser.
The whole point of the article is that this has to be considered at parse time and not any later.
But what exactly makes it a “deep and significant result” for someone planning on writing a C++ parser given that template instantiation depth limits solve the problem without any reference to Turing decidability?