1. 47

    1. 15

      Avoid writing your own code generator, linker, etc

      I think this hints at but doesn’t fully explore a much deeper question: do you want to create a new language or do you want to create a language and a whole new runtime?

      Historically everyone picked the second option, but nowadays it’s becoming much more common to just create a language without jumping through all the hoops of creating a new runtime from scratch. This post cautions against going all the way and building everything from scratch, but I think it misses a broader point: targeting an existing VM will save you far more work than just reusing a code generator and linker. You get debuggers, libraries, profilers, package managers, and more for free.

      Granted some people do want the learning experience of building everything from scratch, but for people who are interested in language implementation specifically it can be a huge yak shave. As a very nice bonus, most of the arguments for “avoid self-hosting your compiler” don’t apply unless you are building out your runtime from scratch.
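As a rough illustration of the hosted approach, here is a minimal sketch that lowers a toy expression language onto an existing VM, using CPython and its standard `ast` module as a stand-in host. The tuple-based toy syntax and the names `lower` and `compile_toy` are hypothetical, invented only for this example; they are not taken from Clojure, Fennel, or the article.

```python
import ast

# Hypothetical toy language: nested tuples such as ("+", "x", ("*", 2, 3)).
OPS = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult}


def lower(node):
    """Translate a toy expression tree into a Python AST expression node."""
    if isinstance(node, (int, float)):
        return ast.Constant(value=node)
    if isinstance(node, str):
        # A bare string is treated as a variable reference.
        return ast.Name(id=node, ctx=ast.Load())
    op, left, right = node
    return ast.BinOp(left=lower(left), op=OPS[op](), right=lower(right))


def compile_toy(node):
    """Compile a toy expression into a code object the host VM can run."""
    tree = ast.Expression(body=lower(node))
    ast.fix_missing_locations(tree)
    return compile(tree, filename="<toy>", mode="eval")


code = compile_toy(("+", "x", ("*", 2, 3)))
result = eval(code, {"x": 4})
```

The sketch contains no code generator, linker, or runtime of its own. Everything after the AST is handled by the host, which is also why the host's existing tooling can be used on the result.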

      Edit: disclaimer: I’ve worked professionally mostly with Clojure (a hosted language) over the past decade and am the lead developer on Fennel (also a hosted language).

      1. 3

        I think it comes down to whether you have to. In the past it was maybe more necessary to create a whole runtime. Nowadays there are a handful of more or less pluggable runtimes that you can grab off the shelf.

        It’s the same as the startup advice to outsource anything that isn’t critical to your market fit. If you are creating a language, your resources will go a lot further if you make a language and not a VM, JIT, runtime library, and more. So if you don’t need to, don’t.

      2. 1

        Your deeper question is relevant for a “managed” language, but in an AOT-compiled language like Rust or Zig you need to deal with codegen and linking even if you target the standard runtime (ABI). In that case you can write your own, which I think Zig does, or use LLVM like Rust, or use simpler alternatives like MIR or QBE.

        1. 1

          I guess that’s part of my point; if you choose to make a language like that, you’re choosing to make a lot more than just a language. Maybe “runtime” isn’t quite the right word for it, but the point still stands.

    2. 12

      What a fantastic article.

      Unfortunately, the free tier (at the time of writing) only supports AMD64 runners, and while it does support macOS ARM64 runners, these cost $0.16 per minute.

      A big shout out here to Cirrus CI. They offer x86-64 and AArch64 for macOS, Linux, and FreeBSD. Not sure if they have Arm support for Windows.

      They have a free tier for open source projects and it’s pretty generous (don’t build LLVM every day with it, but every few days is fine), and their pricing is much more competitive than GitHub’s (which is Azure pricing plus a 10x markup).

      One thing I slightly disagree on:

      Don’t prioritize performance over functionality

      Build a high-performance prototype of the critical part. Even if you aren’t able to connect everything together, build a fast prototype of the core bit of the object model, the dispatch routine, or whatever it is, and write some realistic microbenchmarks. They don’t have to be built by your compiler; they can be C or assembly fragments that represent what you eventually want your compiler to generate. This helps you avoid making early design decisions that tie you into slow operations.
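A minimal sketch of the shape such a microbenchmark could take, written in Python purely for illustration (the comment above suggests C or assembly fragments for real measurements). The two dispatch strategies and the names `dispatch_chain`, `dispatch_table`, and `bench` are hypothetical stand-ins for whatever designs are actually being compared.

```python
import timeit


def dispatch_chain(op, a, b):
    """Candidate design 1: a linear chain of comparisons."""
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown op: {op}")


# Candidate design 2: a table lookup keyed by opcode.
TABLE = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def dispatch_table(op, a, b):
    return TABLE[op](a, b)


def bench(fn, repeats=5, number=20000):
    """Return the best wall-clock time (seconds) over several runs."""
    timings = timeit.repeat(lambda: fn("mul", 6, 7), repeat=repeats, number=number)
    return min(timings)


if __name__ == "__main__":
    for candidate in (dispatch_chain, dispatch_table):
        print(candidate.__name__, bench(candidate))
```

The absolute numbers matter less than running both candidates under the same harness before the rest of the compiler exists, so the comparison can inform the design early.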

      1. 2

        I took it to mean compiler performance, rather than compiled program performance.

        I’ve personally overemphasized compiler performance without knowing anything about the parts of the compiler I was implementing, which led to some very ugly code.

        So to me that point makes sense, but I also think it is important to keep it in mind when designing the language, so that even if the compiler is not fast now, it could be made fast later when needed.

    3. 2

      Oh, and good luck finding a book that explains how to write a type-checker, let alone one that covers more practical topics such as supporting sub-typing, generics, and so on.

      TAPL is a good start, but state-of-the-art techniques are only found in research articles.

    4. [Comment removed by author]

    5. 1

      Avoid writing your own code generator, linker, etc

      Of course, this depends on your goals for the language. Sure, if you want to create the next 100-year language or you’re writing commercial code, it probably makes sense to use battle-tested technologies as much as possible.

      But I’d argue that avoiding those hard topics isn’t always the way to go. Somebody still needs to do that work, and you’re not going to learn and get better if you avoid these topics altogether.

    6. 1

      Wonderful article! I’m only two years into developing my programming language but I agree with almost every recommendation in the article.

      Avoid writing your own code generator, linker, etc

      Luckily I was never tempted to do this because I decided to use LLVM from the start. In hindsight, for my needs (a personal toy language) LLVM is probably overkill and I could have made my life even easier by using QBE.

      I feel like this paragraph should mention parser generators. I get the feeling that the use of parser generators is frowned upon in the PL hacker community, and many tutorials stress that you don’t /need/ a parser generator because a parser is just normal code after all. And I get that. But it’s still a lot of code you have to write and, in particular, have to change when prototyping a language. I’m still glad I chose the parser generator route back then. In addition to the parsing code I don’t have to write, I get checks for ambiguities virtually for free.

      Avoid bike shedding about syntax

      I ended up in exactly the same place as the author: using s-expressions so I don’t have to worry about syntax while developing the language semantics, with a vague plan to switch to a more conventional syntax down the line. I only learned this lesson one year into my project, after the third major (unforced) syntax change.
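To show why s-expressions keep syntax out of the way, here is a minimal sketch of a complete reader, small enough that syntax stops being a design topic. It is not the implementation of either language mentioned here; the function names and the choice to represent lists as Python lists and atoms as `int` or `str` are assumptions made for the example.

```python
def tokenize(src):
    """Split source text into parentheses and atoms."""
    return src.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Consume tokens from the front of the list and return one expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ")"
        return items
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(tok)
    except ValueError:
        return tok  # anything that is not an integer is kept as a symbol name


def parse(src):
    """Parse exactly one s-expression from a string."""
    tokens = tokenize(src)
    expr = read(tokens)
    if tokens:
        raise SyntaxError("trailing input after expression")
    return expr


tree = parse("(define (square x) (* x x))")
```

Because the parsed tree is already the data structure the rest of the compiler works on, changing the language's semantics rarely requires touching this code.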

      Growing a language is hard

      That’s the big one. I’m working on my language mostly for myself as a personal toy language / passion project, so growing the language is not an explicit goal for me. On the other hand, developing a language full time as a day job sounds like a dream.

      The best test suite is a real application

      Agreed. Not only does writing a real application reveal bugs in the implementation, it also shows if there are gaps in the language design. This is an area where I think a self-hosted language can benefit, because the entire compiler is a real application testing the language.