1. 1

    Always good to read a different take on this subject. One question I have, I wonder if the book answers, is why write a compiler in go? Any advantages or disadvantages?

    I have two main thoughts on this. If you’re writing a compiler, why not use a language and environment suited for the job? For example, the ML family of languages excels at concisely representing the algorithms and data structures used for parsing, working with ASTs, etc. But you rarely see compilers written in ML in the wild.

    The other thing is that getting good static analysis (for code completion, compile errors, etc.) for a language has always been a problem; that’s why there have been so many different parsers for C++ over the years (Eclipse CDT, Qt Creator, etc., not to mention ctags et al.). Language servers fill this gap. I suppose using Go for a language server has appeal due to Go’s performance, single-executable deployment, GC, and network-service-friendly nature.

    1. 3

      Ha, good question! I’ve previously had to answer it, because, yes, you’re right, there are other languages that are more suited to compiler development. I personally think any language with pattern matching and union types is far more expressive when writing compilers than a language without those features.

      But I chose Go not for its expressiveness, but because I personally think that it’s a great teaching language: it doesn’t hide anything, there’s no magic behind any line of Go, you can read Go code even if you’ve never written Go in your life, the standard library contains enough so that we don’t have to get into modules/dependencies, it comes with a testing framework, the standard installation contains gofmt and go test, and it’s super stable (you can still run the code from the first version of the first book, written at the beginning of 2016, as it is — that’s not necessarily true for other languages/projects).

      That’s the gist of it. I have a few more thoughts on why Go is great for teaching, but I think I should turn those into the blogpost I intend to write :)

      1. 1

        …it doesn’t hide anything, there’s no magic behind any line of Go…

        Go is a fine and relatively simple language, but I would argue there’s plenty of magic in the M:N threading, GC, structural typing, and runtime reflection.

        I certainly won’t argue against any of your other reasons, though.

        1. 1

          Ah, yes, you’re right. That can certainly be considered magic, but I was thinking more of the meta-programming magic I know from Ruby.

      2. 2

        re ML: for others wondering, here’s a 1998 article on why ML/OCaml are good for writing compilers. OCaml is even better at it today thanks to its libraries. There are also metaprogramming extensions to Standard ML that give it some of LISP’s power. Then there are people like sklogic, whose tool uses a LISP with Standard ML embedded as a DSL; he drops out of SML whenever it’s easier to express an algorithm outside of it.

        1. 2

          Thank you for linking that! I love the point at the end about languages being toolboxes and some are more suited for certain tasks than others:

          But all languages have some problem domains in which they shine. I think that compiler implementation is one of those areas for ML. You’re writing a compiler and in middle of a function you need a 9mm crescent wrench with a box on the other end. You open up your toolkit and … there it is, in the top drawer, bright and shiny and strong. You use it, and then a few minutes later you need a small phillips screwdriver with a clip and a magnet … and there it is, bright and shiny and strong.

          It’s not that there are an extraordinary number of tools in the toolbox. (No; in fact, the toolbox is much smaller than the usual toolboxes, the ones used by your friends that contain everything but the sink.) It’s that the toolbox was carefully and very thoughtfully assembled by some very bright toolsmiths, distilling their many decades of experience, and designed, as all good toolkits are, for a very specific purpose: building fast, safe and solid programs that are oriented around separate compilation of functions that primarily do recursive manipulation of very complex data structures.

      1. 3

        Writing an interpreter in Go seems like a bit of a copout to me; the garbage collector is probably the hardest and most interesting part… Of course, if you need a quick DSL to get work done, by all means implement it in Go; it will work very well.

        1. 3

          Go is now entirely written in Go, as I understand it, including the GC. So you could bootstrap by using Go’s GC and then write your own eventually. (I haven’t read the book.)

          1. 3

            I explicitly address this copout in a section, because you’re right: Go is doing a lot of work for us there and we don’t need to write our own GC. But I think there are a lot of other really interesting things you’ll encounter when writing an interpreter, even without writing a GC: lexing the source code, defining your tokens, what your keywords will be, parsing those tokens, deciding what’s a statement and what’s an expression, parsing expressions, evaluating expressions and so on… In the end, I think, once you’ve finished the book you’re in a great position to tackle the GC problem on its own, as a separate topic you can dive into.
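
            To make those stages concrete, here’s a minimal sketch of the kind of thing the book describes: a tiny AST and a tree-walking evaluator. The names are illustrative, not the book’s actual types, and Go’s type switch stands in for the pattern matching mentioned above:

            ```go
            package main

            import "fmt"

            // Illustrative AST node types (not the book’s actual code).
            type Node interface{}

            type IntegerLiteral struct{ Value int64 }

            type InfixExpression struct {
            	Operator    string
            	Left, Right Node
            }

            // Eval walks the tree. A type switch is Go’s stand-in for the
            // pattern matching that ML-family languages offer natively.
            func Eval(n Node) int64 {
            	switch n := n.(type) {
            	case *IntegerLiteral:
            		return n.Value
            	case *InfixExpression:
            		left, right := Eval(n.Left), Eval(n.Right)
            		switch n.Operator {
            		case "+":
            			return left + right
            		case "*":
            			return left * right
            		}
            	}
            	panic("unknown node or operator")
            }

            func main() {
            	// The AST a parser might produce for "(1 + 2) * 3".
            	ast := &InfixExpression{
            		Operator: "*",
            		Left: &InfixExpression{
            			Operator: "+",
            			Left:     &IntegerLiteral{Value: 1},
            			Right:    &IntegerLiteral{Value: 2},
            		},
            		Right: &IntegerLiteral{Value: 3},
            	}
            	fmt.Println(Eval(ast)) // 9
            }
            ```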

          1. 1

            Any idea why he doesn’t make it available as an e-book on Amazon?

            1. 7

              It is available in the Kindle store. Here it is on Amazon.com. It just doesn’t show up on the same product page because I haven’t gotten around to figuring out how to do that yet.

              The paperback version is printed and distributed by CreateSpace, a company Amazon bought a few years ago, and they also take care of putting it up in several Amazon stores. The Kindle version is handled by Amazon’s own “Kindle Direct Publishing” (KDP) platform. And even though CreateSpace is already semi-integrated into KDP, it’s not that easy to just point KDP at an ISBN and have it connect the two products on one product page. Or at least I haven’t found out how to do that yet :)

              1. 3

                Can I ask, because you had success with this process, what you wrote the original book in, and what your actual conversion pipeline was? Pandoc is powerful, but I have to imagine it took you a bit until you were truly happy with the output everywhere, and I’d be really curious if that’s information/a pipeline that could be reused.

                1. 5

                  The book is just a bunch of markdown files plus metadata in a YAML file. There is nothing fancy about the files: they’re written in Pandoc’s own markdown format, including fenced code blocks copied and pasted straight from the source code.

                  The files are first passed to pp, which only sets two variables: the current version of the book and the URL of the zipped code folder.

                  Then they are passed to pandoc, which turns them into epub, PDF (using LaTeX) and HTML. The epub version is then converted to mobi (the Kindle format) using Amazon’s own Kindlegen tool.

                  All of this is packed into a Makefile I’m ashamed to show off.

                  But, really, most of the work is done by pandoc, which I love immensely. I tried my best not to spend a lot of time on the toolchain, muttering “don’t start shaving the yak now…” to myself all the time…

            1. 7

              Never having thought about the difference between an AST, an LST, and a parse tree in such detail before, my first idea was to re-render the AST as text again, with the translations made in the tree. But of course, then you’d lose formatting information. Or you include these details in the tree, but then you’d have the bigger and thus heavier parse tree.

              At the end of the article it finally clicked for me:

              To perform style-preserving source translation, we replace the significant spans, while leaving the insignificant ones intact.

              This is really neat.
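
              As a sketch of that idea (hypothetical types, not the article’s actual code), the source can be modeled as an ordered list of spans, where translation rewrites only the significant ones and copies the insignificant ones through untouched:

              ```go
              package main

              import (
              	"fmt"
              	"strings"
              )

              // Span is a hypothetical LST fragment: a piece of source text
              // marked as significant (code) or insignificant (whitespace,
              // comments).
              type Span struct {
              	Text        string
              	Significant bool
              }

              // translate rewrites significant spans and preserves the rest,
              // so the output keeps the original formatting.
              func translate(spans []Span, rewrite func(string) string) string {
              	var b strings.Builder
              	for _, s := range spans {
              		if s.Significant {
              			b.WriteString(rewrite(s.Text))
              		} else {
              			b.WriteString(s.Text) // formatting survives as-is
              		}
              	}
              	return b.String()
              }

              func main() {
              	// "foo = 1 // note", split into spans by a hypothetical lexer.
              	src := []Span{
              		{"foo", true},
              		{" ", false},
              		{"=", true},
              		{" ", false},
              		{"1", true},
              		{" // note", false},
              	}
              	renamed := translate(src, func(t string) string {
              		if t == "foo" {
              			return "bar"
              		}
              		return t
              	})
              	fmt.Println(renamed) // bar = 1 // note
              }
              ```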

              1. 2

                Thanks! It would be neater if you could get two algorithms out of it: as I mentioned, I think a reformatter like clang-format or gofmt is the converse: it changes the insignificant spans and keeps the significant ones.

                I didn’t actually implement that yet. But I think that using the LST will work. If anyone has experience, let me know.

                1. 2

                  gofmt will still keep e.g. blocks delimited by blank lines separate; it’s not quite the same as “remove insignificant spans”.
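
                  A quick demo of that point (using the go/format package that backs gofmt, not gofmt’s internals): runs of blank lines between declarations are collapsed to a single blank line rather than removed entirely:

                  ```go
                  package main

                  import (
                  	"fmt"
                  	"go/format"
                  )

                  func main() {
                  	// Two blank lines between the functions in the input.
                  	src := []byte("package main\n\nfunc a(){}\n\n\nfunc b(){}\n")
                  	out, err := format.Source(src)
                  	if err != nil {
                  		panic(err)
                  	}
                  	// gofmt keeps one blank line separating the declarations.
                  	fmt.Printf("%s", out)
                  }
                  ```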

              1. 1

                Is simplicity/complexity/stupidity of code something objective? How does one define it?

                I know there are measures like LOC or cyclomatic complexity. But these things don’t seem like complexity itself, rather symptoms people notice when observing codebases they experience as complex.

                I wonder if the “complexity” of a piece of code might not be dependent on the community of programmers reviewing the piece of code.

                For example, cyclomatic complexity can be reduced in some cases by using interfaces. But many programmers will experience a codebase with lots of interfaces as “complex”.
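
                For instance, here’s a hypothetical Go sketch of that trade-off: the switch version keeps the branching visible (higher cyclomatic complexity), while the interface version removes the branch but hides the dispatch:

                ```go
                package main

                import "fmt"

                // Hypothetical example, not from the article.
                type Shape interface{ Area() float64 }

                type Square struct{ Side float64 }
                type Circle struct{ Radius float64 }

                func (s Square) Area() float64 { return s.Side * s.Side }
                func (c Circle) Area() float64 { return 3.14159 * c.Radius * c.Radius }

                // areaSwitch has measurable cyclomatic complexity from its
                // branches, but every case is visible in one place.
                func areaSwitch(kind string, size float64) float64 {
                	switch kind {
                	case "square":
                		return size * size
                	case "circle":
                		return 3.14159 * size * size
                	default:
                		return 0
                	}
                }

                // areaIface has no branches at all; the “complexity” moved
                // into the interface dispatch and the scattered method
                // definitions.
                func areaIface(s Shape) float64 { return s.Area() }

                func main() {
                	fmt.Println(areaSwitch("square", 2))    // 4
                	fmt.Println(areaIface(Square{Side: 2})) // 4
                }
                ```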

                1. 1

                  I think you’ve hit the nail on the head here. It’s really, really hard to find common ground, especially if you’re coming from different backgrounds, programming languages, and communities.

                  That being said, I believe there is something we can all agree on as the “stupid”, “more obvious”, “duh, of course” version, compared to something more “clever” and “complex”. But maybe we can only decide on a case-by-case basis and never derive a general rule. I tried doing the latter, but I think it’s still too vague.

                1. 5

                  (I suggest removing the - Thorsten Ball bit from the title.)

                  I like the writeup, but more concrete examples would help. So, towards that end, I’ll point out something from JS:

                  Consider the task of looping over a list, transforming elements, removing elements that don’t match particular criteria, and splatting the remainder into a map of some variety.

                  The “naive” (in the author’s parlance, “stupid”) approach would be something like:

                  //    List of form [ {ticker: "company name", val: 42 }, ...]
                  function doList( list ) {
                      var xformedList = list.map( function increaseShareholderValue(el) {
                          return { ticker: el.ticker, val: el.val+34 };
                      });
                      var filteredList = xformedList.filter( function divestBadCompanies(el) {
                          return el.val % 2 != 0;
                      });
                      var companyMap = {};
                      filteredList.forEach( function goPublic(el){
                          companyMap[el.ticker] = el.val;
                      });
                      return companyMap;
                  }
                  

                  Now, there are good things about this approach:

                  • It is painfully (even excruciatingly) obvious what it does.
                  • It is understandable by anybody who knows basic ES5.
                  • It is easy to pluck out and change any steps of the transform.
                  • You can inspect the value at each step using breakpoints very easily.

                  But, there are problems:

                  • It’s slow. It updates each element only to throw away half of them later.
                  • It’s got function call overhead for everything, multiple times.
                  • A plain object literal ({}) inherits from Object.prototype, so certain keys (e.g. “constructor”) can collide with inherited properties and cause weird behavior.

                  The optimal approach is going to look something like:

                  function doListBetter( list ) {
                      var companyMap = Object.create(null);
                      for( var i = 0; i < list.length; i++ ) {
                          var el = list[i];
                          if ( el.val % 2 != 0) {
                              companyMap[el.ticker] = el.val + 34;
                          }
                      }
                      return companyMap;
                  }
                  

                  This approach is good, because:

                  • It avoids function call overhead.
                  • It processes the list only once.
                  • It avoids a class of problems involving object literals.

                  It’s bad, though because:

                  • It’s easy to do a dumb thing and have the for construct not behave as expected.
                  • As you increase the complexity of the filtering criteria, that test may no longer be orderable like that (what if it requires the rest of the list suddenly, etc.).
                  • As you increase complexity of the transform and storage, the logic starts to smear out and get harder to follow.

                  So, even though the optimal version is faster, the maintainability stuff gets a lot, lot worse.

                  Write the naive approach first, profile it when needed, and then fix it.

                  1. 4

                    Author here. Your comment is really interesting to me. Why? Because since I wrote the post I’ve thought about including an example a lot and that example would pretty much look like the one you presented.

                    In the past a few people said to me: “Oh, I know what you’re saying! Just use a for loop instead of map/forEach/reduce!” But what I meant comes much closer to the approach in your comment.

                    1. 4

                      Thank you! Feel free to use it with attribution.

                  1. 18

                    I’m really excited for this. I can’t wait to read the finished version.

                      Here’s a little anecdote. A few months ago I saw a comment by Bob (the author of this book) on Reddit, saying that he was working on a book about interpreters. That got me super excited, because I was working on my own book about interpreters! Slightly intimidated that someone who’s already written a great book would be releasing a book about the same topic and maybe competing with me, I contacted Bob. I told him about my book and asked him if we were essentially writing the same book. I was unsure about continuing to work on my book, now that a pro was tackling the same issues and covering the same niche. His reply couldn’t have been more motivating and encouraging if he had tried. He essentially told me that I should keep working on my book, that there’s always room for more books, and that he can’t wait to read it. I still think back to that email from time to time and credit it with giving me one of the many pushes needed to finish the book.

                    1. 7

                      Have you thought about making print copies of your book? I’ve been looking for an excuse to play around with Go, but I heavily favor the dead tree version. Either way, congrats on finishing it!

                      1. 8

                          Yes, I plan on releasing a print version in the next couple of months, hopefully in February. I’ll send out a message on the mailing list as soon as I know more. The major thing that’s holding me back is creating a print-proof cover image that has the correct size, colors, resolution, margins, etc. That’s just not my area of expertise :)

                        1. 1

                          Please do! I just bought a digital copy after forgetting for a while, but I personally would be way more excited to shell out cash for something I can hold.

                      2. 4

                        Wow! Inspiring.

                        I’m learning Go myself, and this book really excites me.