I’ve heard that argument, and in some really reductive, literal, sense it’s true. But, it’s not useful. When someone says “compiler”, they generally mean something that generates the lowest-level language instructions for a given execution platform (x86 assembly, JVM bytecode, V8 bytecode, etc). On the other hand, “transpiler” usually means something whose output target is probably not the lowest-level language for a platform.
Of course we can just call them both compilers (or transpilers) and just define that as something that takes one kind of code and spits out another, but you lose specificity.
It reminds me of the novel 1984, where they “simplified” the English language (for nefarious reasons, but that’s not why I’m citing it) by cutting out “redundancy”, so instead of having the words: great; excellent; awesome; amazing, you only have “good”. And if something is very good, or somehow better than just “good”, you’d say that thing is “good good”.
Trying to argue that “transpiler” is not a useful term feels bad bad.
Author here: your answer actually gets exactly to the problem here:
x86 assembly, JVM bytecode, V8 bytecode
These languages are all very different! JVM bytecode is a virtual abstraction over ISAs like x86 and is compiled to them (maybe “transpiler” lovers would argue it’s transpiled).
On the other hand, “transpiler” usually means something whose output target is probably not the lowest-level language for a platform.
Again, this is not a very meaningful distinction. What you consider lowest-level is not the same as what I might consider lowest-level. Why engage in vagaries when we can be a bit more precise?
I guess I don’t see why the output languages being different should preclude the ability to distinguish between higher level output languages and lower level output languages, which is where I’m arguing the distinction lies.
Like I said, I do understand the argument that it’s all just a transformation from one programming language to another programming language. And I will certainly admit that the line is fuzzy between what we casually refer to as “compiling” vs. “transpiling”. Likewise, there’s ambiguity between “interpreted” languages and “compiled” languages, because some compiled languages also have REPLs or JIT optimizing runtimes and some interpreted languages have tools to do AOT compilation. But, as with the current topic, I still think there’s a practical, real-life relevant, distinction between interpreted languages and compiled languages. There are things that are usually true of interpreted languages and things that are usually true of compiled languages that make the distinction meaningful in real life even if it’s hard (for me) to come up with a technically precise definition.
So, when I, personally, make a distinction between “compile” and “transpile” or “translate”, the difference is that I usually think of the output from “compiling” as the lowest level language that we’re able to inspect before execution. It doesn’t matter if it’s JVM bytecode or x86 ASM or anything else; the point is that this is the “bottom” language representation of my program that I can actually inspect and manipulate.
As for why that distinction might matter, I’ll use a thought experiment. If I’m writing C++ code, one could imagine having two kinds of “build” operations for getting this program to run on my desktop: we can compile with GCC or Clang to x86 ASM, or in some alternate universe our compiler/transpiler can take my C++ code and spit out C code, which I’ll then need to transform again before it’s in its final state. If I need to really dig down and optimize the heck out of something in my code, which of those two scenarios is going to be best: inspecting the ASM or inspecting generated C code? I think the answer is obviously that inspecting the ASM is going to be more fruitful, because the C code is going to be processed and transformed again in possibly hard-to-predict ways.
Now, you can argue against all of this. You can argue that even inspecting the ASM leaves ambiguity because the OS and the CPU can do surprising things like combining operations, etc. And you’d be correct, but only in a patrician, philosophical, sense. In practice, any engineer will agree that you have more control the closer you are to the “bottom” of the ladder of abstraction.
So, that’s my view on why distinguishing between “compiling” and “transpiling” can be useful even when the terms are a little fuzzy. Not every single word in our vocabulary has to be extremely precise; the context matters, too. In most contexts, language is allowed to be a bit fuzzy.
A material problem with this definition is that it is completely context dependent. The last level language will change depending on what platform you’re on.
If you’re on the browser 10 years ago, that’s JS (does that make it low level?). If you’re on the server, JS can be compiled to native, so is JS high-level now?
I don’t mind the context sensitivity; I mind the confusing connotations it creates for people: the same language is being transpiled and compiled for different platforms?
If you’re on the browser 10 years ago, that’s JS (does that make it low level?). If you’re on the server, JS can be compiled to native, so is JS high-level now?
It’s an orthogonal question to ask if the output language “is” low-level or high-level. All that matters is whether it’s the last representation of your code before being fed to the black-box that runs it.
If you’re spitting out JS to run in a browser, then that would be “compiling” and if you’re spitting out JS for the backend to run on Node, then it’s “transpiling” because Node (V8) has its own bytecode language that the JS will be transformed to. (Aside: Though, I’m not sure if you can feed V8 bytecode directly to Node, so maybe that’s really “compiling”, too.)
the same language is being transpiled and compiled for different platforms?
Why is that a problem? You can either write JavaScript or you can transform some other language to JavaScript, but that doesn’t leave me perplexed about JavaScript. So why is it any more vexing that a language might be an intermediate representation in some contexts and a final compilation output in others?
Okay, but again, “running” something is a very blurry line to draw. Browsers “run” JS code by (in part) compiling it to native code which should preclude JS from being the lowest level language in that stack…
In general, the whole point I’m making is that the only defining characteristic of a language is its semantics. If you’re working within those semantics, then it’s the same language. Otherwise, it isn’t.
Okay, but again, “running” something is a very blurry line to draw. Browsers “run” JS code by (in part) compiling it to native code which should preclude JS from being the lowest level language in that stack…
No, because it’s the lowest level language that I can author a program for that platform in. As far as I know, V8/WebKit/whatever doesn’t allow me to feed it bytecode or native code or whatever. So, JS is the lowest level language for the browser platform.
We can play reductionist mental games until we conclude that there’s no such thing as compiling or transpiling because programmers are really just electricians who are controlling transistor currents, but it’s not a useful mental model.
In general, the whole point I’m making is that the only defining characteristic of a language is its semantics. If you’re working within those semantics, then it’s the same language. Otherwise, it isn’t.
Now, wait a minute. I thought we were talking about compilers and transpilers. The title of the article is “Transpiler, a meaningless word.” Transpilers and compilers are not programming languages. And the act of transpiling or compiling is not the same as writing a program in a programming language. So, of course I agree that writing JavaScript code is writing JavaScript code. But, that has nothing to do with whether it’s meaningful or useful to distinguish between transpiling and compiling.
The author points out that the generated language, if the transpiler is actually implemented well, looks very different from how a person would think to write that language at that level of abstraction. Think of asm.js (precursor to webassembly) which was a subset of JavaScript. You’re translating to the semantics of the target language, which is similar to targeting VM bytecode.
So, just to be clear, the difference between a compiler and transpiler is whether the code is for human consumption or not? That means coffeescript, babel, typescript etc. are not transpilers and are compilers. Also, I am the author :->
I 100% agree with your take, and notice that even CoffeeScript, BabelJS, TS, Emscripten don’t even claim to be transpilers (anymore?) but their websites all say “compiler”
The reason I dislike the word is that “translator” (even without qualifying with “source-to-source”) fits the same role perfectly and it’s an actual word that even exists in literature with that meaning. Hybridizing “translator” with “compiler” to get “transpiler” has honestly always puzzled me (even more so since technically they are synonyms)
No, I basically think you’re right that transpiler isn’t a very useful concept, or at least the distinction quickly falls apart when you get into the actual weeds of the thing.
I already dislike analogies but I will engage with this one before generalizing more broadly. First off, there is no functional difference between a transpiler and a compiler - they do the same thing, they solve the same problems, etc. There is no analog to “but busses are publicly funded and carry lots of people, cars are privately owned and carry a small number, trucks are big and designed to carry blah blah blah” - because analogies suck.
Further, people need to talk about cars, trucks, and busses, differently. They’re radically different things that are used for radically different purposes. If someone said “I’m waiting for the delivery bus” or “My mom drove me on the bus to school today” it would be confusing. If someone said “I compiled Typescript to Javascript” you would not get any more or less information than if they had said “I transpiled my Typescript to Javascript”. I would challenge you to find a single situation where the word “transpiler” was an important word in a sentence and added more / different information than the word “compiler”.
But more importantly, there is no need for a distinction because there is no distinction, and pretending there is one is confusing. For example, if I compile my code to C and then I compile that C to llvm and then that llvm to x86, did I “transpile” it or compile it? There is no correct answer to this because the word is nonsensical, meaning that you can attribute whatever definition you like to it. It is perhaps fine in a colloquial sense where two people already know they’re on the same page with regards to a definition but it should be absent otherwise.
“Transpiler” takes existing imprecisions and exacerbates them. We already have incorrect colloquialisms like “compiled languages” or “interpreted languages”; transpilers create a whole new set of fake concepts that only serve to confuse people.
I would challenge you to find a single situation where the word “transpiler” was an important word in a sentence and added more / different information than the word “compiler”.
“A transpiler is the exact same thing as a compiler” makes a lot more sense than “A compiler is the exact same thing as a compiler” :P
First off, there is no functional difference between a transpiler and a compiler - they do the same thing, they solve the same problems, etc
Is there none? If something is a “compiler” I expect the output to be binary or pretty unreadable, if it’s a “transpiler” I expect it to be somewhat human-readable text.
Can you really read and understand the output of the transpiled code for JS in the blog post? Even if you can read that code, I promise that when it’s run on larger programs, it’s completely unreadable.
What is the important distinction that using transpiler as a term preserves?
It’s clear that there’s a difference between cars (move small numbers of people), buses (large numbers of people - and the distinction between these two classes is fuzzy), and trucks which move more cargo than people.
There’s a very big distinction though. One is meant to be the “entire thing” that produces some sort of bytecode / machine code targeting a real or virtualised hardware platform. The other is designed to target an existing, more common compiler for the purposes of bootstrapping, language design experimentation, or just changing syntax.
So, is the Rust compiler a transpiler because it generates LLVM IR, which itself is compiled down to some target assembly? The point is exactly that there has always been a tower of compilers, all of which take you from one abstraction level to another. Where does a “transpiler” really fit in this world? Is it the first-level compiler? What happens when someone builds a “transpiler” to that level of abstraction? Does the old “transpiler” graduate to being a compiler?
I agree completely. I think it’s also telling that one of the original books on the subject, which documents the programming language XPL and its tooling, uses terms like “translator” and whatnot for describing what it does (they had some more interesting terms as well, but I’d have to find my copy of the book). Other historical examples like RATFOR and cfront and so on all mix “compiler” and “translator” pretty freely, so I don’t see an issue with munging the two into “transpiler.”
The one nice thing about “transpiler” is that it combines the two existing words, “compiler” and “translator”, which were previously the accepted names for these sorts of transformations. If all compilers-and-translators courses were replaced with transpilers courses, then maybe it wouldn’t be so controversial. The problem is the insistence that transpilers are somehow neither compilers nor translators.
In my experience most of the people who use the word “transpiler” are under the impression that “it’s not a compiler unless it emits machine code” which is frustrating.
Unfortunately, that is probably because the word compiler originally meant compiling from human-readable text to machine code (at the time). If you don’t believe me, here are a few definitions copied from a few well-known search engines:
a program that converts instructions into a machine-code or lower-level form so that they can be read and executed by a computer.
(computer science) a program that decodes instructions written in a higher order language and produces an assembly language program
book definition:
a computer program that translates an entire set of instructions written in a higher-level symbolic language (such as C) into machine language before the instructions can be executed
I do agree, though, that the definition has been expanded upon; Wikipedia has a fairly open definition of it:
In computing, a compiler is a computer program that translates computer code written in one programming language (the source language) into another language (the target language).
I went down a fairly deep rabbit hole with some talented theoreticians after making a comment like that. Defining what ‘lower level’ means is incredibly hard, and it became clear that any definition we came up with often didn’t give even a partial ordering over the kinds of things that we care about in practice.
IIUC (not a hardware person) these days processors actually translate back from machine code in order to recover the original program graph, in order to better divvy up work on the hardware itself. So machine code is really an abstraction over what is actually going on under the hood.
I’m not an expert, but I don’t think that’s right. Processors do reorder instructions and split them up into “micro-ops”, but I don’t think it’s possible or desirable to reconstruct the original graph in the CPU.
But what would you consider even lower-level than machine code?
I would have guessed they left it open on purpose in the event there was something lower in the future. (They just left it open in a different way.)
At any rate I was only pointing out the easily accessible definitions do specifically mention machine-code.
Asking as a person with no experience in compilers or “transpilers”: Is there a body of knowledge/techniques that are important to writing compilers that’s not important for transpilers, or vice versa?
That’s an interesting question and AFAICT, the answer is no. At a very high-level, a compiler needs a frontend (which takes the input language and transforms it into some more optimizable representation), a middle-end (that implements optimizations as source-to-source translations), and a backend (which generates code for the particular target).
The post outlines people’s general arguments about how a transpiler doesn’t need one of these but the point I was making is that really “transpilers” end up needing all of them.
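To make the frontend / middle-end / backend split concrete, here is a minimal sketch in Python. Everything in it is a toy assumption made up for illustration: it borrows Python's own `ast` parser as the frontend, does constant folding as the only optimization, and has two hypothetical backends, one emitting instructions for an imaginary stack machine and one emitting text in a C-like expression language. It is not how any real compiler is organized, but both targets clearly reuse the same first two stages.

```python
import ast
import operator

# Toy operator table: (source-level symbol, stack-machine opcode, folding function).
OPS = {
    ast.Add: ("+", "ADD", operator.add),
    ast.Mult: ("*", "MUL", operator.mul),
}

def frontend(source):
    """Frontend: turn source text into a tree (borrowing Python's parser)."""
    return ast.parse(source, mode="eval").body

def fold(node):
    """Middle-end: constant folding, a tree-to-tree rewrite."""
    if isinstance(node, ast.BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
            return ast.Constant(OPS[type(node.op)][2](left.value, right.value))
        return ast.BinOp(left, node.op, right)
    return node

def emit_stack(node):
    """Backend A: instructions for an imaginary stack machine."""
    if isinstance(node, ast.Constant):
        return [f"PUSH {node.value}"]
    if isinstance(node, ast.Name):
        return [f"LOAD {node.id}"]
    return emit_stack(node.left) + emit_stack(node.right) + [OPS[type(node.op)][1]]

def emit_source(node):
    """Backend B: expression text in a C-like language."""
    if isinstance(node, ast.Constant):
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    symbol = OPS[type(node.op)][0]
    return f"({emit_source(node.left)} {symbol} {emit_source(node.right)})"

tree = fold(frontend("x * (2 + 3)"))
stack_code = emit_stack(tree)
source_code = emit_source(tree)
print(stack_code)   # ['LOAD x', 'PUSH 5', 'MUL']
print(source_code)  # (x * 5)
```

Only the last function differs between the "compiler" and the "transpiler" here, which is roughly the point being made above.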
In my idiolect, a transpiler is a kind of compiler that targets a high level language.
There are a few things that a transpiler can leave out of its back end that you need when targeting machine language but not for a high level language. The most obvious is register allocation. Another is memory layout of data structures. You might also be able to skip things like transformation to static single assignment form, and all the optimizations SSA is designed to unlock, because presumably they will be handled by the target language’s compiler.
But if you are writing a compiler targeting LLVM IR then you also get to skip a lot of these back-end-ish algorithms, but I would argue you aren’t writing a transpiler because LLVM IR is not a high level language.
Fun things happen if you want to target a higher-level language from a low-level compiler back-end, because you usually want to recover higher-level control structures (block-structured if and loops) from a soup of basic blocks - and you must if you are targeting a language without goto, such as wasm. I would argue that if your compiler has lowered the input program so far that it has to un-lower it for the target language, then it isn’t a transpiler.
So another way of stating my definition is that a transpiler has a much simpler back-end that omits lots of the things a compiler needs when targeting machine language. I can’t think of anything interesting that a transpiler needs to do that a compiler does not.
People claim this distinction often but I don’t really buy it. The backend of a “transpiler” can be pretty complex too, especially because it has to target a high level language. What is the best way to encode some construct? Do I use for loops, while loops, generators, etc.? These choices get pretty complicated pretty quickly.
It’s really that the conception of a transpiler is different from the reality. Real tools need to address the complexity of the semantics of languages.
If there is a large semantic gap between the source language and the target language to the extent that your compiler has to perform nontrivial transformations then it would be a stretch to call it a transpiler. That is why I put relooper-style algorithms on the compiler side, not the transpiler side.
Maybe another way of saying it is, a transpiler goes across not down; it doesn’t need much in the way of lowering transformations to simpler IRs. And it is those lowering transformations, especially the IRs and transformations that are closer to machine language, that are not part of the body of knowledge you need to write a transpiler.
I’ll steal from Lex Fridman’s interview of Chris Lattner (if memory serves) and say that a compiler is anything that represents things and transforms these representations.
Maybe it would have been better if we called them translators instead of compilers. Heck, C-family compilers even call their units of work Translation Units.
The techniques from the world of compilers can be applied in things like database query planners, hardware design tools, JITs, and maybe even some things that don’t necessarily represent “programs”.
I’m old enough that in my mind these things are called “cross-compilers” instead of transpilers. I honestly still don’t know why the new term came about for a thing that already had a name.
the real galaxy brain take IMO (after reading through verified compilation stuff) is that there’s no compilation, only translation from one language and its semantics into another language and its semantics. This has the benefit of also covering certain same-level optimizations (which are transforming from the source language…. to the source language).
It’s not clear that “transpiler” is used consistently to mean things that are a subset of “compiler”. The use of the term feels quite broad and describes everything from simple script hacks to complex toolchains (like BabelJS).
My hobby project of executing elisp on top of javascript does not fit any other definition than a “just in time transpiler”: it has to expand macros dynamically (late binding of macros is a thing), it has to watch out for expansion of macros being changed by other state. While it does compile pieces of elisp to javascript, I cannot call this a compiler.
I don’t think it’s a meaningless word, but I don’t think it’s very useful either. The main distinction seems to be that transpilers are from one high level language to another, but that’s pretty ambiguous and almost all readers will know that that is what is happening if you use the word compiler instead because the source and destination languages will be known from context.
Personally, I prefer the older term “translator”, because the business of a compiler is more about translation than aggregation (compilation). OTOH, the phrase “optimising compiler” sounds cooler than “optimising translator”, so who can really say which is better :)
Author here: can you describe what a “high-level language” is? My point in the post is that terms like “high-level language” aren’t particularly well-defined. I think people end up focusing on the syntax of the languages instead of the semantics which isn’t a particularly great way to build these tools.
There is also Felleisen’s idea of “expressive power” which identifies higher-level language features as those which require a global transformation of the program to lower it to a language without that feature.
Felleisen’s expressive power is a bit too powerful for the kind of compiler transformations that we are discussing, but another thing that distinguishes higher and lower level languages is the question of what must be explicitly named or numbered.
unstructured control flow requires named labels
assembly requires names for intermediate values
register allocation requires spilled variables to be given numbered slots on the stack
memory layout turns named structure members into numbered offsets
Of course there are languages that are higher level in some respects and lower in others, but I bet you could conduct a survey like John Mashey’s analysis of RISC vs CISC and find an objective dividing line.
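To make the last item in that list concrete (named structure members becoming numbered offsets), here is a small sketch using Python's standard `ctypes` module. The `Point` record and its field names are hypothetical, chosen only for illustration; `ctypes` lays the structure out with the platform's C alignment rules, so the names the programmer wrote are resolved to plain byte offsets.

```python
import ctypes

# Hypothetical record type, used only for illustration.
class Point(ctypes.Structure):
    _fields_ = [
        ("x", ctypes.c_int32),
        ("y", ctypes.c_int32),
        ("tag", ctypes.c_uint8),
    ]

# Each named member is now just a byte offset from the start of the struct.
offsets = {name: getattr(Point, name).offset for name, _ctype in Point._fields_}
print(offsets)  # {'x': 0, 'y': 4, 'tag': 8}
```

A target language that lets you keep writing `point.tag` leaves this numbering to someone else; a target that only understands "load the byte at base + 8" forces the compiler to do it.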
It would be quite funny to use a survey, which is a sum of subjective opinions, to get “an objective dividing line”.
On a more serious note: are concatenative languages high-level by that definition? Is APL or even LISP? They certainly have some of the features but definitely not all.
Also, what happens when I build compilers to hardware description languages? Those are definitely lower level than C so is C now a high-level language?
You should take a look at John Mashey’s survey https://www.yarchive.net/comp/risc_definition.html in which he counts instruction set features like registers, addressing modes, instruction sizes, and finds that ISAs fall into two clear camps. [Amusingly, the ones that survived best are the least CISCy CISCs (x86 and 370) and the least RISCy RISC (arm).]
You can do something similar with programming languages by identifying characteristic features (as I started doing above) and counting how many features each language has. These are observations, not subjective opinions.
Obviously a language does not have to have all characteristically X features to be an X language. It’s a matter of being more X than Y or Z.
Stack languages are an interesting case because stack manipulation is a low-level feature: the program has to worry about details that are not relevant to the problem at hand. But that feature alone does not say much about the language as a whole; for example FORTH is a relatively low-level language whereas PostScript is relatively high-level.
Of course C is a high-level language. So are hardware description languages.
If something targets asm.js is it a transpiler (because it’s just a subset of JavaScript, a high level language) or a compiler? Good transpilers will output code that is analogous to asm.js in the target language.
in my headcanon, transpilers have the “same” source and target language. idk how correct that is, but if it has a meaning at all, it seems to line up with a lot of the things people do
things like babel, closure, code formatters, linters that fix things for you, etc.
they parse, do some AST fiddling, then print the AST as code
it’s still a compiler, but you keep the same node types (maybe add or remove a few). and the parse/print functions are at least attempting to be inverses
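A minimal sketch of that parse / fiddle / print shape, using Python's standard `ast` module (this assumes Python 3.9 or later, where `ast.unparse` exists). The rewrite rule itself, turning `name ** 2` into `name * name`, is an arbitrary example picked for illustration; the thing to notice is that the source and target are the same language and the node types never change.

```python
import ast

class SquareToMultiply(ast.NodeTransformer):
    """AST fiddling: rewrite `name ** 2` into `name * name`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if (isinstance(node.op, ast.Pow)
                and isinstance(node.left, ast.Name)
                and isinstance(node.right, ast.Constant)
                and type(node.right.value) is int
                and node.right.value == 2):
            copy = ast.Name(id=node.left.id, ctx=ast.Load())
            return ast.copy_location(ast.BinOp(node.left, ast.Mult(), copy), node)
        return node

def rewrite(source):
    """Parse, transform, then print the tree back out as source code."""
    tree = SquareToMultiply().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(rewrite("area = r ** 2 + 1"))  # area = r * r + 1
print(rewrite("total = a + b"))      # total = a + b
```

The second call shows the parse and print steps acting as (approximate) inverses when no rule fires; note that formatting and comments in the input are not preserved by `ast.unparse`.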
i wouldn’t call Python->C a transpiler. just because there’s no point in having a separate word if you’re going to do that. but, “transpiler” is corrupted enough that maybe there should be a 3rd word that means what i want transpiler to mean
I guess I remain unconvinced?
Obviously, the word carries meaning. Nobody “lied” to me, either. Do people really not like the term? I don’t get it.
I think the example of Python -> C is poorly chosen (perhaps go the other way?).
I think the point is, transpilers are just compilers. There’s no real need for a distinction.
I’ve heard that argument, and in some really reductive, literal, sense it’s true. But, it’s not useful. When someone says “compiler”, they generally mean something that generates the lowest-level language instructions for a given execution platform (x86 assembly, JVM bytecode, V8 bytecode, etc). On the other hand, “transpiler” usually means something whose output target is probably not the lowest-level language for a platform.
Of course we can just call them both compilers (or transpilers) and just define that as something that takes one kind of code and spits out another, but you lose specificity.
It reminds me of the novel 1984, where they “simplified” the English language (for nefarious reasons, but that’s not why I’m citing it) by cutting out “redundancy”, so instead of having the words: great; excellent; awesome; amazing, you only have “good”. And if something is very good, or somehow better than just “good”, you’d say that thing is “good good”.
Trying to argue that “transpiler” is not a useful term feels bad bad.
Author here: your answer actually gets exactly to the problem here:
These languages are all very different! JVM bytecode is a virtual abstraction over ISAs like x86 and is compiled to them (maybe “transpiler” lovers would argue its transpiled).
Again, this is not a very meaningful distinction. What you consider lowest-level is not the same as what I might consider lowest level. Why engage in vagaries when we can be a bit more precise.
I guess I don’t see why the output languages being different should preclude the ability to distinguish between higher level output languages and lower level output languages, which is where I’m arguing the distinction lies.
Like I said, I do understand the argument that it’s all just a transformation from one programming language to another programming language. And I will certainly admit that the line is fuzzy between what we casually refer to as “compiling” vs. “transpiling”. Likewise, there’s ambiguity between “interpreted” languages and “compiled” languages, because some compiled languages also have REPLs or JIT optimizing runtimes and some interpreted languages have tools to do AOT compilation. But, as with the current topic, I still think there’s a practical, real-life relevant, distinction between interpreted languages and compiled languages. There are things that are usually true of interpreted languages and things that are usually true of compiled languages that make the distinction meaningful in real life even if it’s hard (for me) to come up with a technically precise definition.
So, when I, personally, make a distinction between “compile” and “transpile” or “translate”, the difference is that I usually think of the output from “compiling” as the lowest level language that we’re able to inspect before execution. It doesn’t matter if it’s JVM bytecode or x86 ASM or anything else; the point is that this is the “bottom” language representation of my program that I can actually inspect and manipulate.
As for why that distinction might matter, I’ll use a thought experiment. If I’m writing C++ code, one could imagine having two kinds of “build” operations for getting this program to run on my desktop: we can compile with GCC or Clang to x86 ASM, or in some alternate universe our compiler/transpiler can take my C++ code and spit out C code, which I’ll then need to transform again before it’s in its final state. If I need to really dig down and optimize the heck out of something in my code, which of those two scenarios is going to be best: inspecting the ASM or inspecting generated C code? I think the answer is obviously that inspecting the ASM is going to be more fruitful, because the C code is going to be processed and transformed again in possibly hard-to-predict ways.
Now, you can argue against all of this. You can argue that even inspecting the ASM leaves ambiguity because the OS and the CPU can do surprising things like combining operations, etc. And you’d be correct, but only in a patrician, philosophical, sense. In practice, any engineer will agree that you have more control the closer you are to the “bottom” of the latter of abstraction.
So, that’s my view on why distinguishing between “compiling” and “transpiling” can be useful even when the terms are a little fuzzy. Not every single word in our vocabulary has to be extremely precise- the context matters, too. In most contexts, language is allowed to be a bit fuzzy.
A material problem with this definition is that it is completely context dependent. The last level language will change depending on what platform you’re on.
If you’re on the browser 10 years ago, that JS (does that make it low level?). If you’re on the server, JS can be compiled to native so is JS high-level now?
I don’t mind the context sensitivity; I mind the confusing connotations it creates for people: the same language is being transpiled and compiled for different platforms?
It’s an orthogonal question to ask if the output language “is” low-level or high-level. All that matters is whether it’s the last representation of your code before being fed to the black-box that runs it.
If you’re spitting out JS to run in a browser, then that would be “compiling” and if you’re spitting out JS for the backend to run on Node, then it’s “transpiling” because Node (V8) has its own bytecode language that the JS will be transformed to. (Aside: Though, I’m not sure if you can feed V8 bytecode directly to Node, so maybe that’s really “compiling”, too.)
Why is that a problem? You can either write JavaScript or you can transform some other language to JavaScript, but that doesn’t leave me perplexed about JavaScript. So why is it any more vexing that a language might be an intermediate representation in some contexts and a final compilation output in others?
Okay, but again, “running” something is a very blurry line to draw. Browsers “run” JS code by (in part) compiling it to native code which should preclude JS from being the lowest level language in that stack…
In general, the whole point I’m making is that the only defining characteristic of a language is its semantics. If you’re working within those semantics, then it’s the same language. Otherwise, it isn’t.
No, because it’s the lowest level language that I can author a program for that platform in. As far as I know, V8/WebKit/whatever doesn’t allow me to feed it bytecode or native code or whatever. So, JS is the lowest level language for the browser platform.
We can play reductionist mental games until we conclude that there’s no such thing as compiling or transpiling because programmers are really just electricians who are controlling transistor currents, but it’s not a useful mental model.
Now, wait a minute. I thought we were talking about compilers and transpilers. The title of the article is “Transpiler, a meaningless word.” Transpilers and compilers are not programming languages. And the act of transpiling or compiling is not the same as writing a program in a programming language. So, of course I agree that writing JavaScript code is writing JavaScript code. But, that has nothing to do with whether it’s meaningful or useful to distinguish between transpiling or compiling.
This is incorrect. Lots of people use the term “compiler” for any compiler regardless of target.
The author points out that the generated language, if the transpiler is actually implemented well, looks very different from how a person would think to write that language at that level of abstraction. Think of asm.js (precursor to webassembly) which was a subset of JavaScript. You’re translating to the semantics of the target language, which is similar to targeting VM bytecode.
So, just to be clear, the difference between a compiler and transpiler is whether the code is for human consumption or not? That means coffeescript, babel, typescript etc. are not transpilers and are compilers. Also, I am the author :->
I 100% agree with your take, and notice that even CoffeeScript, BabelJS, TS, Emscripten don’t even claim to be transpilers (anymore?) but their websites all say “compiler”
The reason I dislike the word is that “translator” (even without qualifying with “source-to-source”) fits the same role perfectly and it’s an actual word that even exists in literature with that meaning. Hybridizing “translator” with “compiler” to get “transpiler” has honestly always puzzled me (even more so since technically they are synonyms)
No, I basically think you’re right that transpiler isn’t a very useful concept, or at least the distinction quickly falls apart when you get into the actual weeds of the thing.
Ah, just like cars and trucks both just move people along the road so they are buses. No real need for a distinction.
I already dislike analogies but I will engage with this one before generalizing more broadly. First off, there is no functional difference between a transpiler and a compiler - they do the same thing, they solve the same problems, etc. There is no analog to “but busses are publicly funded and carry lots of people, cars are privately owned and carry a small number, trucks are big and designed to carry blah blah blah” - because analogies suck.
Further, people need to talk about cars, trucks, and busses, differently. They’re radically different things that are used for radically different purposes. If someone said “I’m waiting for the delivery bus” or “My mom drove me on the bus to school today” it would be confusing. If someone said “I compiled Typescript to Javascript” you would not get any more or less information than if they had said “I transpiled my Typescript to Javascript”. I would challenge you to find a single situation where the word “transpiler” was an important word in a sentence and added more / different information than the word “compiler”.
But more importantly, there is no need for a distinction because there is no distinction, and pretending there is one is confusing. For example, if I compile my code to C and then I compile that C to llvm and then that llvm to x86, did I “transpile” it or compile it? There is no correct answer to this because the word is nonsensical, meaning that you can attribute whatever definition you like to it. It is perhaps fine in a colloquial sense where two people already know they’re on the same page with regards to a definition but it should be absent otherwise.
“Transpiler” takes existing imprecisions and exacerbates them. We already have incorrect colloquialisms like “compiled languages” or “interpreted languages”; transpilers create a whole new set of fake concepts that only serve to confuse people.
“A transpiler is the exact same thing as a compiler” makes a lot more sense than “A compiler is the exact same thing as a compiler” :P
Is there none? If something is a “compiler” I expect the output to be binary or pretty unreadable, if it’s a “transpiler” I expect it to be somewhat human-readable text.
Can you really read and understand the output of the transpiled code for JS in the blog post? Even if you can read that code, I promise that when it’s run on larger programs, it’s completely unreadable.
That’s such a vague concept I think it’s more confusing to have a word for it than not. You’d be better served saying “compiles to human readable X”.
What is the important distinction that using transpiler as a term preserves?
It’s clear that there’s a difference between cars (move small numbers of people), buses (large numbers of people - and the distinction between these two classes is fuzzy), and trucks which move more cargo than people.
There’s a very big distinction though. One is meant to be the “entire thing” that produces some sort of bytecode / machine code targeting a real or virtualised hardware platform. The other is designed to target an existing, more common compiler for the purposes of bootstrapping, language design experimentation, or just changing syntax.
So, is the Rust compiler a transpiler because it generates LLVM, which itself is compiled down to some target assembly? The point is exactly that there has always been a tower of compilers, all of which take you from one abstraction level to another. Where does a “transpiler” really fit in this world? Is it the first level compiler? What happens when someone builds a “transpiler” to that level of abstraction? Does the old “transpiler” graduate to being a compiler?
This pointless abstruse pseudo-nitpicking does not change the fact that everybody knows what is meant by “transpiler” vs “compiler”
Author here: can you define what a transpiler is?
I agree completely. I think it’s also telling that one of the original books on the subject, which documents the programming language XPL and its tooling, uses terms like “translator” and what not for describing what it does (they had some more interesting terms as well, but I’d have to find my copy of the book). Other historical examples like RATFOR and cfront and so on all mix “compiler” and “translator” pretty freely, so I don’t see an issue with munging the two into “transpiler.”
The one nice thing about “transpiler” is that it combines the two existing words, “compiler” and “translator”, which were previously the accepted names for these sorts of transformations. If all compilers-and-translators courses were replaced with transpilers courses, then maybe it wouldn’t be so controversial. The problem is the insistence that transpilers are somehow neither compilers nor translators.
In my experience most of the people who use the word “transpiler” are under the impression that “it’s not a compiler unless it emits machine code” which is frustrating.
Unfortunately, that is probably because the word compiler originally meant compiling human-readable text to machine code (at the time). If you don’t believe me, here are a few definitions copied off of a few well-known search engines:
book definition:
I do agree though that the definition has been expanded upon, Wikipedia has a fairly open definition of it:
Yeah, I understand the word has multiple definitions; I was expressing frustration that people don’t seem to be aware of the definition that I use.
A definition that specifically excludes other meanings which are in active use is … just not a very good definition?
Yes and no, this article is literally someone arguing that in reverse with the word transpiler so … you tell me?
I went down a fairly deep rabbit hole with some talented theoreticians after making a comment like that. Defining what ‘lower level’ means is incredibly hard, and it became clear that any definition we came up with often didn’t give even a partial ordering over the kinds of things that we care about in practice.
I read that as meaning machine-code or lower-than-machine-code form.
Natural languages are beautifully ambiguous. But what would you consider even lower-level than machine code?
IIUC (not a hardware person) these days processors actually translate back from machine code in order to recover the original program graph, in order to better divvy up work on the hardware itself. So machine code is really an abstraction over what is actually going on under the hood.
I’m not an expert, but I don’t think that’s right. Processors do reorder instructions and split them up into “micro-ops”, but I don’t think it’s possible or desirable to reconstruct the original graph in the CPU.
I would have guessed they left it open on purpose in the event there was something lower in the future. (They just left it open in a different way.) At any rate I was only pointing out the easily accessible definitions do specifically mention machine-code.
My general guideline is:
All three are a lossy process.
Asking as a person with no experience in compilers or “transpilers”: Is there a body of knowledge/techniques that are important to writing compilers that’s not important for transpilers, or vice versa?
That’s an interesting question and AFAICT, the answer is no. At a very high-level, a compiler needs a frontend (which takes the input language and transforms it into some more optimizable representation), a middle-end (that implements optimizations as source-to-source translations), and a backend (which generates code for the particular target).
The post outlines people’s general arguments about how a transpiler doesn’t need one of these but the point I was making is that really “transpilers” end up needing all of them.
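To make the three stages concrete, here is a minimal sketch in Python. It is not taken from the post, and every name in it is made up for illustration. The frontend borrows Python’s own parser via the `ast` module, the middle-end does constant folding as a tree-to-tree rewrite, and the backend prints a fully parenthesized infix expression of the kind you could paste into JavaScript. It only handles `+` and `*` on names and number literals.

```python
import ast

def frontend(src: str) -> ast.expr:
    """Frontend: turn source text into a tree (reusing Python's parser)."""
    return ast.parse(src, mode="eval").body

class Fold(ast.NodeTransformer):
    """Middle-end: constant folding, a rewrite from tree to tree."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the children first
        l, r = node.left, node.right
        if isinstance(l, ast.Constant) and isinstance(r, ast.Constant):
            if isinstance(node.op, ast.Add):
                return ast.Constant(value=l.value + r.value)
            if isinstance(node.op, ast.Mult):
                return ast.Constant(value=l.value * r.value)
        return node

def middle_end(tree: ast.expr) -> ast.expr:
    return Fold().visit(tree)

def backend(node: ast.expr) -> str:
    """Backend: emit target text (here, a parenthesized infix expression)."""
    if isinstance(node, ast.Constant):
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp):
        op = {ast.Add: "+", ast.Mult: "*"}[type(node.op)]
        return f"({backend(node.left)} {op} {backend(node.right)})"
    raise NotImplementedError(type(node).__name__)

def compile_expr(src: str) -> str:
    return backend(middle_end(frontend(src)))

print(compile_expr("x * (2 + 3)"))  # (x * 5)
```

Even this toy, which most people would call a “transpiler”, has all three pieces; swapping the backend for one that emits bytecode would not change the first two.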
In my idiolect, a transpiler is a kind of compiler that targets a high level language.
There are a few things that a transpiler can leave out of its back end that you need when targeting machine language but not for a high level language. The most obvious is register allocation. Another is memory layout of data structures. You might also be able to skip things like transformation to static single assignment form, and all the optimizations SSA is designed to unlock, because presumably they will be handled by the target language’s compiler.
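A rough sketch of that difference, in Python, with two backends over the same expression tree. The register machine and its mnemonics (`load`, `const`, `add`, `mul`) are invented for illustration, and the low-level backend only hands out unbounded virtual registers; mapping those onto a finite register file is the actual register allocation problem, which this sketch does not attempt.

```python
import ast
import itertools

SYMBOL = {ast.Add: "+", ast.Mult: "*"}
MNEMONIC = {ast.Add: "add", ast.Mult: "mul"}

def emit_high_level(node: ast.expr) -> str:
    """High-level target: nested expressions, temporaries stay implicit."""
    if isinstance(node, ast.Constant):
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp):
        op = SYMBOL[type(node.op)]
        return f"({emit_high_level(node.left)} {op} {emit_high_level(node.right)})"
    raise NotImplementedError(type(node).__name__)

def emit_three_address(node: ast.expr) -> list[str]:
    """Hypothetical register machine: every intermediate value is numbered."""
    code: list[str] = []
    counter = itertools.count()

    def go(n: ast.expr) -> str:
        if isinstance(n, ast.Constant):
            reg = f"r{next(counter)}"
            code.append(f"{reg} = const {n.value}")
            return reg
        if isinstance(n, ast.Name):
            reg = f"r{next(counter)}"
            code.append(f"{reg} = load {n.id}")
            return reg
        if isinstance(n, ast.BinOp):
            a, b = go(n.left), go(n.right)
            reg = f"r{next(counter)}"
            code.append(f"{reg} = {MNEMONIC[type(n.op)]} {a}, {b}")
            return reg
        raise NotImplementedError(type(n).__name__)

    go(node)
    return code

tree = ast.parse("a + b * 2", mode="eval").body
print(emit_high_level(tree))   # (a + (b * 2))
for line in emit_three_address(tree):
    print(line)
```

The first backend leaves the naming of intermediate values to whatever compiles its output; the second has to decide where each one lives.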
But if you are writing a compiler targeting LLVM IR then you also get to skip a lot of these back-end-ish algorithms, but I would argue you aren’t writing a transpiler because LLVM IR is not a high level language.
Fun things happen if you want to target a higher-level language from a low-level compiler back-end, because you usually want to recover higher-level control structures (block-structured if and loops) from a soup of basic blocks - and you must if you are targeting a language without goto, such as wasm. I would argue that if your compiler has lowered the input program so far that it has to un-lower it for the target language, then it isn’t a transpiler.
So another way of stating my definition is that a transpiler has a much simpler back-end that omits lots of the things a compiler needs when targeting machine language. I can’t think of anything interesting that a transpiler needs to do that a compiler does not.
People claim this distinction often but I don’t really buy it. The backend of a “transpiler” can be pretty complex too, especially because it has to target a high level language. What is the best way to encode some construct? Do I use for loops, while loops, generators, etc.? These choices get pretty complicated pretty quickly.
It’s really that the conception of a transpiler is different from the reality. Real tools need to address the complexity of the semantics of languages.
If there is a large semantic gap between the source language and the target language to the extent that your compiler has to perform nontrivial transformations then it would be a stretch to call it a transpiler. That is why I put relooper-style algorithms on the compiler side, not the transpiler side.
Maybe another way of saying it is, a transpiler goes across not down; it doesn’t need much in the way of lowering transformations to simpler IRs. And it is those lowering transformations, especially the IRs and transformations that are closer to machine language, that are not part of the body of knowledge you need to write a transpiler.
I’ll steal from Lex Fridman’s interview of Chris Lattner (if memory serves) and say that a compiler is anything that represents things and transforms these representations.
Maybe it would have been better if we called them translators instead of compilers. Heck, C-family compilers even call their units of work Translation Units.
The techniques from the world of compilers can be applied in things like database query planners, hardware design tools, JITs, and maybe even some things that don’t necessarily represent “programs”.
As far as I can tell, the term ‘translator’ was more common until the ‘80s. I am not sure why compiler won out, it’s a less informative term.
I’m old enough that in my mind these things are called “cross-compilers” instead of transpilers. I honestly still don’t know why the new term came about for a thing that already had a name.
I think of a cross-compiler as a compiler targeting a non-native ISA, whereas a transpiler typically outputs a language with a context-free grammar
In the past it used to be used in both contexts.
Noting that both Cython and Nuitka use “compiler” and not “transpiler”.
I would like to see the sources for the lies that are being refuted.
the real galaxy brain take IMO (after reading through verified compilation stuff) is that there’s no compilation, only translation from one language and its semantics into another language and its semantics. This has the benefit of also covering certain same-level optimizations (which are transforming from the source language…. to the source language).
That is the definition of a compiler
wait… so why is there “transpiler” as a word at all then?
Because some people don’t know what a compiler is
rectangle : square
compiler : transpiler
sometimes there are words for subsets of things.
It’s not clear that “transpiler” is used consistently to describe things that are a subset of “compiler”. The use of the term feels quite broad and covers everything from simple script hacks to complex toolchains (like BabelJS).
it should be clear that transpiler is a type of compiler from my definition.
other people use other definitions, but I think those are bad definitions.
But what’s the daylight between a compiler and a transpiler in the definition you’re mentioning? What thing is a compiler but not a transpiler?
it’s not clear cut. these terms are based on the concepts of high and low level languages, which are fuzzy terms.
It reminds me of this comic: https://p.hagelb.org/devolution.png
My hobby project of executing elisp on top of javascript does not fit any other definition than a “just in time transpiler”: it has to expand macros dynamically (late binding of macros is a thing), it has to watch out for expansion of macros being changed by other state. While it does compile pieces of elisp to javascript, I cannot call this a compiler.
You kinda just did?
I guess you meant to say it ‘translates’ elisp to javascript.
I think compilation is a perfectly appropriate word for what you described. translation is also a fine word for it.
I don’t think it’s a meaningless word, but I don’t think it’s very useful either. The main distinction seems to be that transpilers are from one high level language to another, but that’s pretty ambiguous and almost all readers will know that that is what is happening if you use the word compiler instead because the source and destination languages will be known from context.
Personally, I prefer the older term “translator”, because the business of a compiler is more about translation than aggregation (compilation). OTOH, the phrase “optimising compiler” sounds cooler than “optimising translator”, so who can really say which is better :)
It is meaningful.
A compiler is a semantics preserving transformation from one language to another.
A transpiler is a compiler from a high level language to another.
Author here: can you describe what a “high-level language” is? My point in the post is that terms like “high-level language” aren’t particularly well-defined. I think people end up focusing on the syntax of the languages instead of the semantics which isn’t a particularly great way to build these tools.
High level languages have features like:
There is also Felleisen’s idea of “expressive power” which identifies higher-level language features as those which require a global transformation of the program to lower it to a language without that feature.
Felleisen’s expressive power is a bit too powerful for the kind of compiler transformations that we are discussing, but another thing that distinguishes higher and lower level languages is the question of what must be explicitly named or numbered.
Of course there are languages that are higher level in some respects and lower in others, but I bet you could conduct a survey like John Mashey’s analysis of RISC vs CISC and find an objective dividing line.
It would be quite funny to use a survey, which are sum of subjective opinions, to get “an objective dividing line”.
On a more serious note: are concatenative languages high-level by that definition? Is APL or even LISP? They certainly have some of the features but definitely not all.
Also, what happens when I build compilers to hardware description languages? Those are definitely lower level than C so is C now a high-level language?
You should take a look at John Mashey’s survey https://www.yarchive.net/comp/risc_definition.html in which he counts instruction set features like registers, addressing modes, instruction sizes, and finds that ISAs fall into two clear camps. [Amusingly, the ones that survived best are the least CISCy CISCs (x86 and 370) and the least RISCy RISC (arm).]
You can do something similar with programming languages by identifying characteristic features (as I started doing above) and counting how many features each language has. These are observations, not subjective opinions.
Obviously a language does not have to have all characteristically X features to be an X language. It’s a matter of being more X than Y or Z.
Stack languages are an interesting case because stack manipulation is a low-level feature: the program has to worry about details that are not relevant to the problem at hand. But that feature alone does not say much about the language as a whole; for example FORTH is relatively low-level language whereas Postscript is relatively high-level.
Of course C is a high-level language. So are hardware description languages.
If something targets asm.js is it a transpiler (because it’s just a subset of JavaScript, a high level language) or a compiler? Good transpilers will output code that is analogous to asm.js in the target language.
I’d argue that it’s a compiler. Would you argue that it’s a transpiler? If so, why?
in my headcanon, transpilers have the “same” source and target language. idk how correct that is, but if it has a meaning at all, it seems to line up with a lot of the things people do
things like babel, closure, code formatters, linters that fix things for you, etc.
they parse, do some AST fiddling, then print the AST as code. it’s still a compiler, but you keep the same node types (maybe add or remove a few). and the parse/print functions are at least attempting to be inverses
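a minimal sketch of that parse / fiddle / print shape, using Python’s standard `ast` module (`ast.unparse` needs Python 3.9+). the rewrite itself (`x ** 2` becomes `x * x`) and the function names are just something i made up for illustration; real tools like babel do far more than this.

```python
import ast
import copy

class SquareToMul(ast.NodeTransformer):
    """AST fiddling: rewrite `name ** 2` into `name * name`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if (isinstance(node.op, ast.Pow)
                and isinstance(node.left, ast.Name)
                and isinstance(node.right, ast.Constant)
                and type(node.right.value) is int
                and node.right.value == 2):
            return ast.BinOp(left=node.left, op=ast.Mult(),
                             right=copy.deepcopy(node.left))
        return node

def rewrite(src: str) -> str:
    tree = ast.parse(src)               # parse
    tree = SquareToMul().visit(tree)    # fiddle
    return ast.unparse(tree)            # print

print(rewrite("y = x ** 2 + 1"))  # y = x * x + 1
```

source and target are the same language, the node types are unchanged, and for code the transform doesn’t touch, print is (up to formatting) the inverse of parse.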
i wouldn’t call Python->C a transpiler. just because there’s no point in having a separate word if you’re going to do that. but, “transpiler” is corrupted enough that maybe there should be a 3rd word that means what i want transpiler to mean
Or we could use the objectively more descriptive “source to source compiler”