1. 79

  2. 15

    For those who don’t know, these slides are by Graydon Hoare, the person who did the initial legwork for Rust.

    1. 14

      “1971: ‘A Catalogue of Optimizing Transformations’… The ~8 passes to write if you’re going to bother… Many compilers just do those, get ~80% best-case perf.”

      Looks like she found the 80/20 rule for compilers before C was invented. Graydon says it still applies. @hwayne, maybe add her to your list of women in computing. ;)

        1. 8

          That paper is so much more accessible than a lot of modern literature on the topic.

          1. 1

            not a single reference to typed lambda calculus

        2. 3

          The term optimization is a misnomer in that it is not generally clear that a particular, so called, optimizing transformation even results in an improvement to the program.

          Still true in 2019, as one can see from the endless argument about the safety of C compiler “optimization”…

            1. 1

              I was super happy to hear about this, because it’s something I was thinking about recently—which small set of optimizations would give the best bang for the buck—but didn’t know if anyone had done the work. Glad to hear it was done nearly 50 years ago :D

              1. 2

                Check out QBE. The article also implies there might be an explanation of Go’s optimizations somewhere that will give you even more to check out. Still way less than duplicating LLVM.

                1. 2

                  Great project. I have been playing with it. I am planning on interviewing the creator for https://notamonadtutorial.com/. I hope to post the interview in a few weeks.

                  1. 2

                    I look forward to that one.

                  2. 2

                    Very interesting project. Thanks for the link.

              2. 6

                This is a real gold mine of information. Some acronym expansion (I didn’t know what IR was):

                • IR: intermediate representation
                • AST: abstract syntax tree

                Can anyone comment on the implementation and efficiency of Variation #5 (Only compile some functions, interpret the rest)? At first I thought this meant writing the fast versions by hand (like calling C routines from Python), but the last bullet suggests that it should actually be automated.

                Then how does the compiler select which functions to compile (or is that decision made from user input)? I guess compiled functions can’t be stepped into?

                Variation #6 (Partial Evaluation Tricks) is also pretty interesting, but other than satisfying the formal definition of a compiler, wouldn’t a compiled program obtained through this Futamura projection keep pretty much the same speed as the original interpreter? (Same for compiled programs obtained from a compiler obtained from a Futamura projection.) Mainly because the interpreter wouldn’t do any compile-time optimization (since it’s not expecting to be run as a compiler!). So it seems like a step back, as you also lose the advantages of the interpreter.

                Hard to provide diagnostics, type-checking, optimization, really anything other than straight translations.

                That was kind of disheartening to read: going the META route, there’s not much we can (or at least know how to) do.

                1. 4

                  wouldn’t a compiled program obtained through this Futamura projection keep pretty much the same speed as the original interpreter?

                  It can be faster: it can remove a layer of interpreter overhead.

                  Suppose you have a good self-applicable partial evaluator for language A, written in language A.

                  You write an interpreter for language B in language A. It executes programs in language B at the speed of an interpreter running on top of the language A platform.

                  Now you use the partial evaluator to produce a B->A compiler, compile your B program with it, and you can run the resulting output at the full speed of a program written natively for the language A platform.

                  the interpreter wouldn’t do any compile-time optimization

                  That’s the magic part: you don’t need to implement any optimizations*, but after all the interpreter overhead is dissolved away by the partial evaluation procedure, the optimizations of the host language implementation are applied.

                  (* You do have to be kind of careful to write the interpreter in a way that will admit good specialization though)
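
                  To make that concrete, here’s a hand-worked sketch in Python (standing in for language A). The toy interpreter and tuple-encoded program are made up, and the “residual” function is written out by hand to show what a partial evaluator could plausibly leave behind:

                  ```python
                  # A toy interpreter for a tiny expression language ("language B"),
                  # written in Python ("language A"). Programs are nested tuples.
                  def interp(expr, env):
                      op = expr[0]  # dispatch overhead, paid on every node of every run
                      if op == "lit":
                          return expr[1]
                      if op == "var":
                          return env[expr[1]]
                      if op == "add":
                          return interp(expr[1], env) + interp(expr[2], env)
                      if op == "mul":
                          return interp(expr[1], env) * interp(expr[2], env)
                      raise ValueError(op)

                  # A fixed B program: x * x + 1
                  prog = ("add", ("mul", ("var", "x"), ("var", "x")), ("lit", 1))

                  # Specializing interp to prog (the first Futamura projection) decides
                  # all of the tag dispatch at specialization time, so only the
                  # arithmetic survives as residual code (written out by hand here):
                  def prog_residual(env):
                      return env["x"] * env["x"] + 1

                  assert interp(prog, {"x": 7}) == prog_residual({"x": 7}) == 50
                  ```

                  The residual function is ordinary language-A code, so whatever optimizations the language A implementation has apply to it like to any other program.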

                  1. 2

                    Suppose you have a good self-applicable partial evaluator for language A, written in language A.

                    Ah, while reading the slide I was thinking of the trivial (lazy) partial evaluator that does nothing and just evaluates when the compiled program executes.

                    I see now. With a non-trivial partial evaluator, you can use it and gain in all three steps. Though now I’ve no idea how to write a useful partial evaluator.

                    1. 4

                      Though now I’ve no idea how to write a useful partial evaluator.

                      You might want to look at supercompilation, since this is the same idea but hard-coded in an otherwise-normal compiler. To get some idea of how partial-evaluation/supercompilation works, we can start with a naive compiler with no optimisations: it’s essentially doing find/replace of constructs in the source language with equivalents in the target language, with no evaluation.

                      “Constant folding” is probably the simplest form of partial evaluation, so we can add that to our compiler. This is a source-to-source optimisation pass, which looks for constant expressions like 1 + 2, performs the calculations and replaces those expressions with their result (e.g. 3). ‘Performing the calculations’ just means we’re evaluating those expressions. Since we’re only evaluating some expressions, not the whole program, this is partial evaluation. This doesn’t need to be restricted to arithmetic: we can also do booleans, like replacing true || false with true. In fact, if our language has short-circuiting, we can replace true || x with true even if x isn’t constant! We can do the same with branches, replacing if (true) { x } else { y } with x and likewise for false/y.
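
                      A minimal sketch of such a folding pass in Python, over a made-up tuple-based AST (not any particular compiler’s IR):

                      ```python
                      # Constant folding over a toy AST: ("lit", v) is a constant,
                      # ("var", name) is unknown until runtime.
                      def fold(e):
                          if e[0] in ("lit", "var"):
                              return e
                          if e[0] == "add":
                              a, b = fold(e[1]), fold(e[2])
                              if a[0] == "lit" and b[0] == "lit":
                                  return ("lit", a[1] + b[1])   # 1 + 2 ==> 3
                              return ("add", a, b)
                          if e[0] == "or":                      # short-circuiting ||
                              a = fold(e[1])
                              if a == ("lit", True):            # true || x ==> true
                                  return ("lit", True)
                              return ("or", a, fold(e[2]))
                          if e[0] == "if":
                              c = fold(e[1])
                              if c[0] == "lit":                 # if (true) {x} else {y} ==> x
                                  return fold(e[2]) if c[1] else fold(e[3])
                              return ("if", c, fold(e[2]), fold(e[3]))
                          raise ValueError(e[0])

                      assert fold(("add", ("lit", 1), ("lit", 2))) == ("lit", 3)
                      assert fold(("or", ("lit", True), ("var", "x"))) == ("lit", True)
                      ```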

                      Another common form of partial evaluation is “function inlining”, where we replace a call to a known function with the body of that function. Again, this is a source-to-source transformation which performs evaluation (specifically: beta-reduction), but only on a sub-set of expressions. Combined with constant folding, this can remove a lot of indirection, specialising the code to our use-case before it gets compiled.
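
                      A matching sketch of inlining over the same made-up AST. The function table is invented, and the depth budget is a crude stand-in for the stopping heuristics discussed a couple of paragraphs down:

                      ```python
                      FUNCS = {  # name -> (parameter, body); a hypothetical function table
                          "double": ("x", ("add", ("var", "x"), ("var", "x"))),
                      }

                      def subst(e, name, val):
                          """Replace ("var", name) with val throughout expression e."""
                          if not isinstance(e, tuple):
                              return e                   # operator tags, numbers, etc.
                          if e == ("var", name):
                              return val
                          return tuple(subst(x, name, val) for x in e)

                      def inline(e, depth=3):
                          """Replace calls to known functions with their bodies
                          (beta-reduction); the depth budget stops runaway inlining
                          of (mutually) recursive functions."""
                          if not isinstance(e, tuple):
                              return e
                          if e[0] == "call" and depth > 0:
                              param, body = FUNCS[e[1]]
                              return inline(subst(body, param, inline(e[2], depth)), depth - 1)
                          return tuple(inline(x, depth) for x in e)

                      # double(3) ==> 3 + 3, which constant folding can then reduce to 6.
                      assert inline(("call", "double", ("lit", 3))) == ("add", ("lit", 3), ("lit", 3))
                      ```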

                      We can, in principle, make such source-to-source transformations for every language construct (“loop unrolling” is another example). This is actually a reasonable way to make an interpreter, known as a term rewriting system: at runtime there are no unknowns, so we can treat everything as constants to be folded.

                      If we want a compiler rather than an interpreter, we need to make some decisions about when to stop the partial evaluation. This is because some constructs can get bigger and bigger, e.g. unrolling a loop with an unknown number of iterations, or inlining (mutually) recursive functions without knowing when we’ll hit the base case. This isn’t an insurmountable problem, since any choice will still result in a correct translation, but we might want some heuristics to strike a good balance between specialising-away indirection, and bloating the program with duplicate code.

                      1. 1

                        Thanks for your detailed explanation.

                        it’s essentially doing find/replace of constructs in the source language with equivalents in the target language, with no evaluation.

                        I don’t understand what you mean by this. I think of, say, source=Python and target=C (or asm). There aren’t really obvious equivalent constructs (C has no lists or dicts, and even the similar things have different semantics, so they wouldn’t be equivalent).

                        I thought you’d start by modifying the first input (I) to use the second input (P) as a hard-coded value. E.g., a Python interpreter with a fixed hard-coded program to interpret, which can thus only interpret that particular program.

                        If the source and target are equal, I can see how some of the optimizations here would apply, but otherwise what you’ve described seems to only optimize the version of the emitted program (say the start program described above) where I and P are already combined, instead of treating them separately. Or maybe you meant this as a partial evaluator for a fixed I?

                        1. 1

                          I don’t understand what you mean by this. I think of, say, source=Python and target=C (or asm). There aren’t really obvious equivalent constructs (C has no lists or dicts, and even the similar things have different semantics, so they wouldn’t be equivalent).

                          Heh, I was intending my example to be really, really dumb/naive, as a base-line to work from.

                          It is possible to convert Python expressions into equivalent chunks of C or ASM in a find/replace way; it’s just that those chunks will be huge in comparison. Given some Python like 1 + 2, we wouldn’t replace it with C like 1 + 2; we would replace it with C like setUpABunchOfContext(); setUpExceptionHandlingStack(); maybeAllocateABunchOfMemory(); messWithThreads(); ... and so on, then do the actual thing we care about, then do a bunch of clean up like maybeFreeABunchOfMemory(); tearDownExceptionHandlingStack(); tearDownABunchOfContext(); collectGarbage(); ..., etc.. In fact, we’d probably do all of that just to construct the value 1, then do it again for the 2, then again to look up the __add__ method (which is what + desugars to), then again to call that method. Of course, this would also require us to write all of this helper code (this is known as a “runtime system” or RTS; although most real RTS are much smaller and saner than this example!)

                          I thought you’d start by modifying the first input (I) to use the second input (P) as a hard-coded value

                          Yes, that’s called function inlining ;)

                          If the source and target are equal, I can see how some of the optimizations here would apply, but otherwise what you’ve described seems to only optimize the version of the emitted program

                          Constant folding, function inlining, loop unrolling, etc. are indeed source-to-source transformations rather than translations between languages, but there’s nothing stopping us from performing them as well as translating to a different language. Indeed, that’s what most optimising compilers do, like GCC/LLVM/GHC/etc. (usually by translating/compiling from the source to some intermediate representation; then doing a bunch of source-to-source transformations in that IR language; then translating/compiling the IR into the target language).
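
                          As a sketch of that pipeline shape (the micro-language and every name here are invented for illustration):

                          ```python
                          # front end: parse "<n> + <n>" into a tiny tuple IR
                          def parse_to_ir(src):
                              a, _, b = src.split()
                              return ("add", ("lit", int(a)), ("lit", int(b)))

                          # one IR -> IR optimisation pass: fold constant additions
                          def fold_pass(ir):
                              if ir[0] == "add" and ir[1][0] == ir[2][0] == "lit":
                                  return ("lit", ir[1][1] + ir[2][1])
                              return ir

                          # back end: translate IR into a pretend stack-machine target
                          def emit(ir):
                              return f"PUSH {ir[1]}" if ir[0] == "lit" else "..."

                          def compile_program(src, passes=(fold_pass,)):
                              ir = parse_to_ir(src)
                              for p in passes:   # source-to-source transformations on the IR;
                                  ir = p(ir)     # real compilers chain dozens of these
                              return emit(ir)

                          assert compile_program("1 + 2") == "PUSH 3"
                          ```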

                  2. 2

                    Then how does the compiler select which functions to compile

                    One of the replies mentioned JIT compilers. That’s not the only scenario.

                    In my language, a function is compiled (into GLSL) if it needs to be executed on the GPU; otherwise it is interpreted. Since my system is designed for live coding and rapid prototyping, you want to minimize the latency from making a code change to running the code, so I don’t do any expensive compilation (LLVM would be too slow). Instead, I perform a fast compilation into IR, then interpret the IR. But once code migrates to the GPU, the requirements change, so I compile a subset of your code to GLSL, which the GPU driver then compiles to GPU machine code.

                    1. 2

                      Then how does the compiler select which functions to compile (or is that decision made from user input)?

                      It has heuristics, similar to how compilers decide when to inline functions. Generally it’s a function of how often a function is called – it’s decided at runtime. Basically, you have a ‘function’ construct in your VM that contains a bunch of bytecode. Each time you call that function, the bytecode is interpreted. But it keeps track of how many times the function is called. If you call it a bunch of times, then that bytecode gets compiled to native code, and then that is what’s executed.
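
                      Schematically, something like this (the bytecode format and the threshold are invented, and the “compile” step just wraps the interpreter to keep the sketch short; a real JIT would emit native code there):

                      ```python
                      HOT_THRESHOLD = 100  # made-up number; real VMs tune this carefully

                      class VMFunction:
                          """A VM 'function' object: bytecode plus a call counter."""
                          def __init__(self, bytecode):
                              self.bytecode = bytecode
                              self.calls = 0
                              self.native = None             # set once the function is hot

                          def __call__(self, *args):
                              if self.native is not None:
                                  return self.native(*args)  # fast path: already compiled
                              self.calls += 1
                              if self.calls >= HOT_THRESHOLD:
                                  self.native = compile_to_native(self.bytecode)
                              return interpret(self.bytecode, args)

                      def interpret(bytecode, args):
                          """Stand-in stack interpreter for a three-instruction bytecode."""
                          stack = []
                          for op, *rest in bytecode:
                              if op == "arg":   stack.append(args[rest[0]])
                              elif op == "lit": stack.append(rest[0])
                              elif op == "add": b, a = stack.pop(), stack.pop(); stack.append(a + b)
                          return stack.pop()

                      def compile_to_native(bytecode):
                          # stand-in for code generation: a real JIT emits machine code here
                          return lambda *args: interpret(bytecode, args)

                      double = VMFunction([("arg", 0), ("arg", 0), ("add",)])  # f(x) = x + x
                      for _ in range(200):
                          double(21)        # interpreted while cold, compiled once hot
                      assert double(21) == 42 and double.native is not None
                      ```

                      Cold functions pay the interpreter price; hot ones get compiled once and reuse the result on every later call.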

                      I guess compiled functions can’t be stepped into?

                      Why not? You can step through C code, which is almost inevitably all compiled.

                      1. 1

                        Your description of the heuristic sounds like just-in-time compilation (which I also don’t know well). Is it the same here? I thought this was talking about something else just because the word wasn’t used.

                        Why not? You can step through C code, which is almost inevitably all compiled.

                        Right, sorry, I meant you can’t step through the original version of the source. For C with no optimization you can step through it, but you can’t always step through code compiled with optimization (at least with the debuggers I know, which are gdb and lldb). You can still step through the assembly, but the mapping back to the source C lines is just not there (sometimes entirely removed, optimized out).

                        1. 2

                          Your description of the heuristic sounds like just-in-time compilation (which I also don’t know well). Is it the same here? I thought this was talking about something else just because the word wasn’t used.

                          That’s what I assume. It specifically says it’s an interpreter which compiles some things, which is basically a description of a JIT.

                          you can’t always step through code compiled with optimization

                          Fair enough. I’ll admit, I don’t know that much about JIT either so someone correct me if I’m wrong.

                    2. 4

                      Wow, what a great intro to the landscape of programming language implementations. Wish I’d seen this ~2.5 years ago. It’s taken me about that long to discover all these things for myself :P

                      I didn’t realize GraalVM was using Futamura projections. I always wondered why no languages were implemented that way. It seems like an elegant way to get a compiler for nearly free.

                      1. 5

                        It’s not free: you have to implement a partial evaluator! This is extremely difficult.

                        Also, the Futamura projection technique only compiles to the base language the partial evaluator works on, which is usually a very high-level language.

                        1. 2

                          Agree. This is like a literature review of compilers, and as such is very valuable to learn the landscape.

                          Thanks, Graydon!