1. 31
  1. 8

    FWIW I think the “verticals” strategy is pretty close to what the “nanopass” strategy advocates, although there are also some other things mixed in (back to front, etc.)


    Likewise, when developing a compiler, it’s useful to be able to think of the thing that you have at each stage of the process as already being a compiler; as you go along, it becomes a compiler for a language that’s increasingly different from the target language.

    1. 5

      I wrote my compiler (pending public release any day now!) in the “proceeding by verticals” fashion, and can confirm that it’s very gratifying. I raced directly to self-hosting; if I remember correctly, I had classes before I had C-style for loops! This sort of dogfooding is good for keeping your eye on the prize.

      On the other hand, bootstrapping successive compiler states can make it hard to rebuild your project if you lose your intermediate states. What I do is, I have in the repo a list of compiler versions by Git hash, and to do a complete rebuild I check out the first, build the compiler, install the compiler in a tempdir, check out the second, build the compiler… With caching for every step. If I were to start over on a new machine, I would probably need to let it run overnight to get back up to master, even assuming nothing failed.

      I also want to sign the sentence on compiler theory. (This is why I have never read a book or taken a course on compilers.) You don’t need parsers! You don’t need backend analysis! Just use a recursive-descent parser such as can be trivially written in basic C, and output to C or LLVM. It won’t be as fast as a proper parser, but you can worry about that when you run out of other issues.

      Though I will say that understanding and using SSA form at least is strongly recommended; the backend drivers are by far the cleanest part of the codebase, and I credit this to using a simple SSA interface to drive them.

      Also, whenever I start a language, the first program I translate is usually the Ackermann function, A(m, n) = A(m - 1, A(m, n - 1)), A(m, n=0) = A(m - 1, 1), A(m=0, n) = n + 1. It’s nice in that it has recursion, only requires integer math, is easy to verify but also generates enough load that you can get a quick idea of your performance. (The second is usually a Mandelbrot. Fractals are nice.)

      Separate compilation from the start is very pleasant. You really want to be able to compile files independently and recursively. Eight-core machines are the default now; to not design from the start for threading is leaving 80% of performance on the table. This is why near every data structure in my compiler is immutable or append-only. This incurs some cost to start (it’s a lot faster to resolve the intermediate states directly in the syntax tree), but you (hopefully) get it back in parallelism.

      edit: It occurs to me that we might be talking about a different thing. I don’t really think separate compilation in the C sense of running multiple compilers is very useful. The main problem is that you get unavoidable duplication in parsing included files, which can frequently be much more code than your actual files. However, at least thinking in terms of proceeding through the compilation in an arbitrary order or even in parallel is beneficial, as mentioned; you can reuse the already-imported modules in cache. If possible, you want to be able to treat your input files as a tree rather than a list. (C famously fails at this.)

      Dependency loops are real, yo. Everything in a compiler depends on everything else. What I ended up doing was splitting out the compiler API into a separate class, in a module that has all the basic data structures in a list - Compiler, Symbol, Expression, Type, Statement, ASTSymbol, ASTStatement, etc, etc, etc. Then every module can implement its part of the language, while only referencing the other modules through that shared API. Doesn’t always work, but does work pleasantly often.

      What I regret the most is not going with a SSA design for my middle-end/IR. “Big Tree of classes” works, but it’s not memory-friendly and ime leads to a lot of accidental duplication.

      1. 4

        What do you think of query-based compiler architecture?

        1. 1

          Hm. A lot of this falls out naturally just by doing pervasive caching, so whether you end up push-based or pull-based or a mix kind of starts blending together. You just have some way of generating a list of tasks that are uniquely identified by parameters and cached; whether this list comes from a list of commandline flags or a recursive search doesn’t seem like it would change the implementation too much. I independently ended up with something sort of close to that, where the compilation is split into individual tasks and each task has a unique key, so if the same task is queued twice the task scheduler just runs it once, blocking, and then returns the same result to both.

          I guess it depends on how finegrained you want to go with it. It sounds like they’re using a single central cache for everything, including things like symbol lookups, which is what allows them to track indirect and reverse dependencies. Now tracking reverse dependencies in the task scheduler is a very good idea that I hadn’t thought of (and that I may have to steal), but I’d be a bit wary putting so much load on a single data structure.

        2. 2

          output to C SSA form at least is strongly recommended; the backend drivers are by far the cleanest part of the codebase, and I credit this to using a simple SSA interface to drive them.

          Are you saying that it’s easy to output C from an SSA representation? I had the opposite impression, from posts I’ve read by others.

          1. 2

            IME the one awkward thing is phi nodes, and I haven’t really got a good intuition for when you need those yet. The thing that comes out at the end is not good C code, ofc, but it seems to work. So I just don’t support phi nodes in the C backend. :P

            Code is at https://github.com/Neat-Lang/neat/blob/master/src/backend/c.nt which implements the API at https://github.com/Neat-Lang/neat/blob/master/src/backend/base.nt . It looks simple to me.

            The main upside of SSA is that it lets you break up your interface into very simple commands.

            1. 2

              Phi nodes are used for branches and loops.

              For a c-like target, you can implement them as follows: if we have C <- phi(A, B), then generate an assignment to C at each of the definitions of A and B.

              1. 1

                Sure, but with a naive backend you see the phi node after the definitions, so you need to backtrack, you can’t just generate C as you get SSA instrs, as you can with every other instruction.

                Also, using phi nodes for loops in a C-like language at least seems unusual. Usually you’re not putting in any special effort to identify the loop variable, you just alloca it like any other variable. (As a phi node won’t help you if the user decides to for (int i = 0; i < 10; i++) { i = i * 2; }.) Though it makes sense that phi nodes would appear during optimization, or in languages where loops are primitives and ifs are expressions. Or for ternaries in a C-like.

                So there are plausible uses for phi nodes, I just haven’t found any yet where they’re the natural choice. It’s gonna happen eventually.

                1. 2

                  It is rare not to have def-use edges. You are definitely doing something unusual if you don’t have them.

                  If you are not optimising (as you have possibly indicated?), then it seems strange to bother with ssa at all. It doesn’t do anything useful, and is likely to result in crap code generation.

                  for (int i = 0; i < 10; i++) { i = i * 2; }

                  i0 <- 0
                  goto loop
                  i1 <- phi(i0, init, i3, loop)
                  i2 <- i1 * 2
                  i3 <- i2 + 1
                  branch(i3 < 10, loop, exit)
                  1. 1

                    Right, I’m just outputting naive unoptimized SSA instrs on the backend, leaving all the optimization to LLVM. I think SSA, even philess SSA, is good for its own sake.

                    You could do i like that, but then you need to be looking ahead to see if the user ever does anything clever like &i, where this breaks down. You effectively end up with two classes of variables. Much less effort to always treat it as an alloca and trust the optimizer. Then again, the easiest instruction to optimize is one you never generate…

                    i <- alloca(4)
                    store(0, i)
                    goto loop
                    _0 <- load(i)
                    _1 <- _0 * 2
                    store(_1, i)
                    _2 <- load(i)
                    _3 <- _2 + 1

                    and you get the idea. And if you ever &i, you can just use i directly, you don’t need to look ahead or backtrack or do an analysis pass at all. You know everything about i as you read int i.

                    Idk, I feel the niceness of not bothering with being clever is underestimated. One-pass compilers ho!

        3. 4

          This is a great article. Highly recommend! It’s very true that the “old standard” compiler books have become decreasingly helpful when getting started. Some notes:

          Parser combinators: They usually aren’t ordered anymore (see packrattle). A modern one effectively builds a state machine and traverses each alternative “in parallel”. But I agree with the general point, and I almost always end up writing a custom parser each time. And I’ll add that you should think about error recovery and auto-completion up front (tree-sitter) because some parser styles will make it very hard to add in later.

          Bootstrapping compiler: Sometimes I’ve noticed this makes the language most suited to building other compilers. If you’re building a systems language, it’s a great idea. If you’re trying to write a language with a particular purpose (calculations, or scripting a game), I think it’s best not to make it self-compiling. A lot of the features that streamline parsing and code generation may just add bloat and scope creep.

          (Was the blog enormous and magnified to anyone else? I thought my browser was broken until I found the right CSS and fixed it.)

          1. 3

            I haven’t read the whole thing yet but I do love me an honest sentence like:

            I have a sense that the strategy of proceeding by verticals is better, but I can’t endorse either strategy because I haven’t compared them.