1. 34
    1. 14

      An interesting counter-example: batch compilers rarely evolve into IDEs and are typically re-written. Examples:

      • Roslyn was a re-write of the original C# compiler
      • Visual Studio, I think, uses different compilers for building and for IntelliSense in C++
      • Dart used to have separate compilers for Dart analyzer and Dart runtime
      • Rust is in a similar situation with rustc and rust-analyzer

      Counter examples:

      • clangd (C++) and Merlin (OCaml) are evolutions of batch compilers. My hypothesis is that for languages with forward declarations & header files you can actually more or less re-use the batch compiler.

      Non-counter examples:

      • Kotlin and TypeScript started IDE-first.

      If I try to generalize from this observation, I get the following. Large systems are organized according to a particular “core architecture” — an informal notion about the data that the system deals with, and specific flows and transformation of the data. This core architecture is reified by a big pile of code which gets written over a long time.

      You may find that the code works badly (bugs, adding a feature takes ages, some things seem impossible to do, etc.) for two different reasons:

      • either the code is just bad
      • or the core architecture is wrong

      The first case is usually amenable to incremental refactoring (triage issues, add tests, loop { de-abstract, tease-apart, deduplicate }). The second case, I think, often necessitates a rewrite. The rewrite ideally should be able to re-use components between the two systems, but, sadly, the nature of a core architecture is that its assumptions permeate all components.

      For a compiler, you typically start with “static world, compilation unit-at-a-time, dependencies are pre-compiled, output is a single artifact, primary metric is throughput” (Zig is a bit different, I believe ;) ), but for an IDE you want “dynamic, changing world, all CUs together, deps are analyzed on-demand, bits of output are queried on demand, primary metric is latency”.
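
      To make the contrast concrete, here is a toy sketch (all names are hypothetical, and string length stands in for real analysis work): a batch pipeline consumes everything at once and emits one artifact, while an IDE-style analysis holds long-lived state, invalidates only the edited unit, and answers queries on demand.

      ```rust
      use std::collections::HashMap;

      // Batch style: all compilation units at once, one pass, one artifact.
      fn compile_batch(files: &[(&str, &str)]) -> HashMap<String, usize> {
          files
              .iter()
              .map(|(name, body)| (name.to_string(), body.len()))
              .collect()
      }

      // IDE style: long-lived, mutable world; results are computed lazily
      // and cached, and an edit invalidates only the touched unit.
      struct Analysis {
          files: HashMap<String, String>,
          cache: HashMap<String, usize>,
      }

      impl Analysis {
          fn new() -> Self {
              Analysis { files: HashMap::new(), cache: HashMap::new() }
          }

          fn apply_edit(&mut self, name: &str, body: &str) {
              self.files.insert(name.to_string(), body.to_string());
              self.cache.remove(name); // invalidate only this unit
          }

          fn query(&mut self, name: &str) -> Option<usize> {
              if let Some(&cached) = self.cache.get(name) {
                  return Some(cached);
              }
              let computed = self.files.get(name)?.len();
              self.cache.insert(name.to_string(), computed);
              Some(computed)
          }
      }

      fn main() {
          let artifacts = compile_batch(&[("a.src", "fn a() {}"), ("b.src", "fn b() {}")]);
          println!("batch produced {} artifacts", artifacts.len());

          let mut ide = Analysis::new();
          ide.apply_edit("a.src", "fn a() {}");
          println!("query: {:?}", ide.query("a.src"));
          ide.apply_edit("a.src", "fn a() { /* edited */ }");
          println!("after edit: {:?}", ide.query("a.src"));
      }
      ```

      The point of the sketch is that the assumptions (who owns state, when work happens) differ from the first line of code, which is why re-use between the two is hard.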

      It does seem that “bad code” is a much more common cause for grief than “ill-fit core architecture”, though.

      1. 4

        As I heard it, Clang started out because Apple’s Xcode team had reached the limits of being able to use GCC in an IDE, and they wanted a new C/C++ compiler that was more amenable to their needs. (They couldn’t have brought any of GCC’s code into the Xcode process because of GPL contagion.) So while Clang may run as a process, the code behind it (all that LLVM stuff) can be used in-process by an IDE for interactive use.

        1. 2

          What do you mean by “GPL contagion”?

          1. 1

            The GPL is a viral license.

            1. 1

              Oh wow, yikes. Thanks.

          2. 1

            That if they had linked or loaded any part of GCC into Xcode, the license would have infected their own code and they would have had to release Xcode under the GPL.

        2. 2

          It’s interesting that the clang tooling has evolved in a direction that would avoid these problems even if clang were GPL’d. The libclang interfaces are linked directly to clang’s AST and so when you link against libclang you pull in a load of clang and LLVM libraries and must comply with their license. In contrast, clangd is a separate process and talks via a well-documented interface to the IDE and so even if it were AGPLv3, it would have no impact on the license of XCode.

        3. [Comment removed by author]

      2. 3

        Thanks for the counter examples, those are really interesting!

        I’ve been able to make changes to the core architecture of my engine incrementally on a few occasions. Some examples:

        • I started out with a 2D renderer, but replaced it with a 3D renderer
          • (adding a dimension sounds easy in theory, but the way 3D renderers are designed is very different from how 2D renderers are designed! Lots of data structures had to change and trade-offs had to be adjusted.)
        • I started out with hard coded character motion, but transitioned to a physics engine
        • I started out with a flat entity hierarchy, and transitioned to a tree structure
        • I started out with bespoke entities, and transitioned to an entity system

        These were definitely challenging transitions to make as my existing code base had a lot of hidden assumptions about things working the way they originally did, so to make it easier I broke the transitions into steps. I’m not sure I was 100% disciplined about this every time, but this was roughly my approach:

        1. Consider what I would have originally built if I were planning on eventually making this transition
        2. Transition my current implementation to that
        3. Make the smallest transition possible from that to something that just barely starts to satisfy the constraints of the new architecture I’m trying to adopt
        4. Actually polish the new thing/get it to the state I want it to be in

        It would be interesting to see retrospectives on projects where people concluded that this approach wasn’t possible or worthwhile, and why. There could be more subtleties I’m not currently identifying that differentiate the above transitions from, e.g., what motivated Roslyn.

      3. 2

        This is super interesting, do you know of any materials on how to design an IDE first compiler?

        1. 3

          The canonical video is https://channel9.msdn.com/Blogs/Seth-Juarez/Anders-Hejlsberg-on-Modern-Compiler-Construction, but, IIRC, it doesn’t actually discuss how you’d do it.

          This post of mine (and a bunch of links in the end) is probably the best starting point to learn about overall architecture specifics:

          https://rust-analyzer.github.io/blog/2020/07/20/three-architectures-for-responsive-ide.html

      4. 1

        Did TypeScript start out “IDE-first”? I remember the tsc batch compiler being introduced at the same time as IDE support.

    2. 12

      I’ve found that when I have the impulse to rewrite something, it’s usually some smaller subsystem that needs it, and that it can be reworked without taking on the much less tractable whole codebase, and without blocking unrelated improvements. But it takes some sitting down, planning, and designing to figure out which bits are actually fine.

      I find it’s useful to go through the mental exercise of “how would I do this differently if I were to start from scratch?” and then separately think about “is there a more direct way to get from here to there?” Often there is, but it can be hard to see it until you know where you want to go (this is also where a lot of the value is in research systems, even if they are so incompatible with everything that they don’t really have any hope of direct adoption).

      The strangler fig pattern is also worth knowing about.
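
      For anyone who hasn’t seen it, a minimal sketch of the idea (all names here are hypothetical): a facade keeps the old system running while routing individual operations to the new implementation, one at a time, until nothing calls the legacy code and it can be deleted.

      ```rust
      // Both the legacy and the replacement system implement the same interface.
      trait Billing {
          fn invoice(&self, cents: u64) -> String;
      }

      struct LegacyBilling;
      impl Billing for LegacyBilling {
          fn invoice(&self, cents: u64) -> String {
              format!("legacy-invoice:{cents}")
          }
      }

      struct NewBilling;
      impl Billing for NewBilling {
          fn invoice(&self, cents: u64) -> String {
              format!("v2-invoice:{cents}")
          }
      }

      // The facade is what callers see; each operation is migrated independently.
      struct BillingFacade {
          legacy: LegacyBilling,
          replacement: NewBilling,
          // flipped per operation once the new implementation proves itself
          invoice_migrated: bool,
      }

      impl BillingFacade {
          fn invoice(&self, cents: u64) -> String {
              if self.invoice_migrated {
                  self.replacement.invoice(cents)
              } else {
                  self.legacy.invoice(cents)
              }
          }
      }

      fn main() {
          let facade = BillingFacade {
              legacy: LegacyBilling,
              replacement: NewBilling,
              invoice_migrated: false,
          };
          println!("{}", facade.invoice(100)); // legacy-invoice:100
      }
      ```

      The system stays shippable at every step, which is the whole appeal over a big-bang rewrite.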

      1. 3

        I find it’s useful to go through the mental exercise of “how would I do this differently if I were to start from scratch?” and then separately think about “is there a more direct way to get from here to there?”

        I’ve found it useful occasionally to write a toy throwaway implementation which uses the new concepts, to test the viability of that approach. It doesn’t even need to deal with the full problem; it can be a subset (ideally one that exposes the problems of the current approach), as long as it’s representative of the larger problem. Then later you can think about how this can be integrated into the larger existing codebase. Not having to worry about the legacy code can really help free your mind to think “outside the box”, so to speak.

      2. 2

        I find it’s useful to go through the mental exercise of “how would I do this differently if I were to start from scratch?” and then separately think about “is there a more direct way to get from here to there?”

        I think this is really good advice, I wanted to include something along those lines but wasn’t able to put it to words as clearly as you did here!

        The strangler fig pattern is also worth knowing about.

        I just looked this up, it’s great to have a name for this pattern! I might do an edit where I add a section at the bottom with extra links and info from comments, if I do I’ll include a link to this pattern.

    3. 9

      This is a little off-topic, but about the original tweet:

      I wonder if the reason that “don’t plan, don’t abstract, don’t engineer for the future” is such good advice is that most people are already building on top of highly-abstracted and featureful platforms, which don’t need to be abstracted further? So 1. if you need some abstractions in the future, you’ll already have some decent ones even if you don’t make your own, and 2. if you do make your own abstractions, that would likely result in abstraction inversions.

      For example, contrast A: building your own system for high-performance parallel execution optimized for future usecases with B: just using a bash pipeline that supports what you need now. Most would advise doing B, at least at first; but B is only an option because you’re running on an already highly-abstracted platform with many existing features.

      1. 5

        I wonder if the reason that “don’t plan, don’t abstract, don’t engineer for the future” is such good advice is that most people are already building on top of highly-abstracted and featureful platforms

        I think the reason is simpler: people are good at understanding concretions and what they have experienced, and bad at keeping many things in their head.

        A non-abstracted solution is a solution for a particular problem that is well-understood and usually maps directly to the conversations everyone at the company is already having. An abstracted one posits imaginary use-cases. This creates a double whammy: you don’t have concrete experience with those use-cases, and now, to understand the motivation of the code’s design, you must keep extra use-cases in your head.

      2. 1

        I definitely think this is one of the reasons that the original tweet is good advice. It’s tempting to conclude that an existing API is bad, for example, and that you’re going to improve it by wrapping it, but this is actually quite a feat to pull off, IMO!

        I’ve definitely been guilty of this, and time and time again I find out that even if the API had problems, I did not correctly identify the constraints up front that the folks designing the API had already become aware of. This either results in the wrapper taking a long time to design, it being of low quality/usefulness, or both.

        That’s not to say that it can’t or shouldn’t be done, just that you should know you’re taking on a serious project when you decide to do this, and you probably want to have some real-world experience with the API first.

        Another reason the tweet is good advice is that engineering for the future necessitates predicting the future, and that’s a very hard thing to do. When you’ve just started working on a problem is when you know the least about it, so you’re in the worst possible position to make long term plans.

    4. 6

      This misses one huge motivator: you want to move the implementation to a new underlying technology that is fundamentally unable to run your existing code. Most often this is the language: we can’t find COBOL programmers so we need to rewrite this big thing in Java or C# fast, before all the original devs retire. Sometimes it is the platform. Sometimes it used to be a desktop app and now it needs to be a web app.

      Of course, this motivation doesn’t cause the rewrite to go any better than a rewrite for any other reason, except that in some cases, you are better off with a semi-broken rewritten system you can maintain than with a working system nobody can touch.

      1. 15

        On the other hand, with the Zig compiler I have figured out how to incrementally bootstrap the compiler, slowly moving the implementation from C++ to Zig, all the while keeping the code shippable. If you download Zig today, what you get is a hybrid compiler: part of it is implemented in C++, part of it is implemented in Zig. Over time we’ve been able to reduce the amount of C++ to fewer and fewer lines, until some day (soon!) it can disappear completely.

        More details / examples:

        1. 3

          Andy already knows this, but for everyone else’s benefit: the screenshotted audio message at the top of the original post is Andy explaining the above to me, and the audio response from me was basically my draft for this post.

          I already held the opinion that rewriting is usually the wrong approach; seeing Andy productively pull off an incremental transition for what at first glance seems like one of the most difficult things to do incrementally further convinced me.

      2. 2

        IME moving from one platform to another is much less risky than other kinds of rewrites when you do a 1:1 copy of the existing design.

        Think “port” not “rewrite”.

        If you leave yourself no decisions to make, you can write the new version at a substantial fraction of the rate at which you can physically type.

    5. 3

      I think the core of the refactor/rewrite conversation is how long you’re comfortable working on a branch without getting end-user feedback.

      I’ve only been working as a software engineer for about 10 years, but I’ve always known long-running branches as an anti-pattern that teams try to avoid. Instead, my teams always aimed to merge our code as soon as we have some problem solved — usually multiple times each day.

      (When I say “long-running branches” I mean both unmerged Git branches and code in the default branch that isn’t getting run in production. TODO: Find a better phrase to capture this.)

      On larger projects there are sometimes components that will take days or weeks to complete, so we try to break that work down into smaller pieces that can be worked on individually. Some things are hard to break down, so we make the trade-off to work in a big branch, but this is generally a last resort and a code/culture smell.

      I think of the decision to refactor or rewrite the same way:

      • If the system is small enough then it doesn’t matter because you aren’t working in a long-running branch.
      • Otherwise, try to refactor/rewrite one subsystem at a time in a way that users (and other engineers) can give you feedback. At every step you should have a working system, and if you break something then you find out immediately instead of when you launch (i.e. months later after you’ve forgotten how the broken thing was supposed to work).
    6. 3

      I’ve done a fair bit of both kinds of updating.

      Sometimes you have to rewrite from scratch because you have to swap out infrastructure, like the programming language. I threw out one large Objective-C codebase and went to C++ because we needed it to be cross-platform.

      Sometimes you can’t get past certain bottlenecks in performance without a change in architecture. In that same rewrite I used the knowledge that parsing JSON into an object tree was a serious performance problem, and made a new encoding and library that let us work with the encoded data in place without parsing or allocation.

      But more often you can make progress by successive refactoring, to the point where none of the original code is left. That’s how I tend to work. I start simple, and when something is too limited or becomes too ugly I fix it, even if that means sweeping changes in dozens of source files. The refactoring tools in modern IDEs, plus plain old search and replace, make this work, as does the type-checker. This is one reason I can’t write major code in dynamic languages — I rely on the compiler to help me with refactoring.

      1. 1

        I threw out one large Objective-C codebase and went to C++ because we needed it to be cross-platform.

        As the maintainer of a cross-platform Objective-C runtime, this makes me sad.

        1. 1

          I tried working with GNUstep early on. It was missing a bunch of Foundation APIs and others were buggy. I did remedy some of that and submitted patches, but then gave up. (Also, did GNUstep run on Windows? I can’t recall.)

          1. 3

            I tried working with GNUstep early on. It was missing a bunch of Foundation APIs and others were buggy. I did remedy some of that and submitted patches, but then gave up

            Unfortunately, I think the project was killed by the ‘GNU’ in the name: as a GNU project, they felt the need to make it possible to build with GCC. GCC’s Objective-C support now just about supports declared properties but it doesn’t support ARC, the non-fragile ABI, or anything Apple introduced after about 2005. This really hampers GNUstep’s development because we can’t use any of the newer features that make life easier for developers and generate better code (let alone things like ObjC++: using std::unordered_map instead of GSIMap would massively simplify a load of the code) and we end up with awful macros that look a bit like new language features to say ‘use the new language features when they exist and fall back to something else for GCC’.

            Also, did GNUstep run on Windows? I can’t recall.

            It did and still does. My runtime is also used by WinObjC and (when clang is targeting an MSVC triple) even supports exception interop with MSVC-compiled C++ code.

            All of that said, I don’t disagree with choosing C++ instead of Objective-C for a new project. I’d still pick C++17 over Objective-C today, I’m just sad that the lack of cross-platform support was your reason for doing so. GNUstep / WinObjC / libFoundation’s Foundation implementations all provide a lot more as a portable baseline than the C++ standard library.

      2. 1

        Sometimes you can’t get past certain bottlenecks in performance without a change in architecture. In that same rewrite I used the knowledge that parsing JSON into an object tree was a serious performance problem, and made a new encoding and library that let us work with the encoded data in place without parsing or allocation.

        I know this was just an example and not your main point, but that sounds like a neat library!

        There are places in my game where I deserialize an entire structure from disk just to read a single value from it. None of the places where I do this are performance-sensitive, so it’s not worth it to make things more complicated, but it’s definitely interesting to think about how I would do this if I needed to in a performance-sensitive context.
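
        As a toy illustration of the “work with encoded data in place” idea (this layout is invented for the example; it is not Fleece’s actual format, which is far more sophisticated): with a fixed-layout binary record, a single field can be read by offset, with no parsing or allocation for the rest of the record.

        ```rust
        // Read one field straight out of the encoded bytes, touching nothing else.
        fn read_hp(record: &[u8]) -> u16 {
            // bytes 4..6 hold the hit-points field, little-endian
            u16::from_le_bytes([record[4], record[5]])
        }

        fn main() {
            // hypothetical layout: id (4 bytes) | hp (2 bytes, LE) | name bytes...
            let record = [7, 0, 0, 0, 250, 0, b'o', b'r', b'c'];
            println!("hp = {}", read_hp(&record)); // 250
        }
        ```

        Real formats like Fleece generalize this with internal offsets so that nested, variable-sized data can also be navigated in place.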

        This is one reason I can’t write major code in dynamic languages — I rely on the compiler to help me with refactoring.

        I definitely feel that, I’m a huge fan of “follow the red brick road” style refactoring. I feel a little lost without it! While I don’t tend to worry about future proofing in general, I do try to make intentional decisions about what will and won’t result in a compiler error up front, to make my life easier later on, erring on the side of “that should result in an error” if I’m unsure.

        e.g. unless there’s an obviously correct default behavior, I’ll make match statements (Rust’s switch statements) exhaustive so that if I ever add an enum variant the compiler will remind me to update the match statement as well.
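
        A minimal sketch of that habit (the `Weapon` enum is made up for illustration): with no `_` catch-all arm, the match is exhaustive, so adding a new variant later is a compile error at every match site until it is handled.

        ```rust
        enum Weapon {
            Sword,
            Bow,
        }

        fn damage(weapon: &Weapon) -> u32 {
            match weapon {
                Weapon::Sword => 12,
                Weapon::Bow => 8,
                // deliberately no `_ => ...` arm: if a `Weapon::Spear` variant
                // is added, this match stops compiling and the compiler points here.
            }
        }

        fn main() {
            println!("sword damage: {}", damage(&Weapon::Sword)); // 12
        }
        ```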

        1. 2

          I know this was just an example and not your main point, but that sounds like a neat library!

          You may also be interested in:

          1. 1

            Thanks for the links!!

        2. 2

          The efficient serialization library is called Fleece: https://github.com/couchbaselabs/fleece

          1. 1

            Oh neat, thanks!!

    7. 2

      For “big rewrites” I think it makes sense to switch from an “agile”, “get it running” approach to a “plan more in detail”, “waterfall”-style approach.

      The point is to avoid any “oh that’s why I originally did it this way … I had to”. I think knowing whether that makes sense as early as possible is the major thing you get from experienced developers. There is no golden rule to make that decision. The most common factor I have seen is that one should have a very deep understanding of the problem and that’s of course not just about solving it theoretically, but also practically (performance and other factors play a role here).

      Something that also helps here is to be able to explain to another person what and how the rewrite would go. This makes it easy to catch such problems.

      But what’s most important: Don’t trust any “never rewrite” and “always rewrite” dogmas.

      Also, while the “small encapsulated parts” advice is generally good, that might not be an option in a certain context, and to me it feels like if it would pan out that way, you don’t actually ask that question in the first place (or it’s easy to answer).

      One more learning: try to test your assumptions! As mentioned in the article, everything is fine until a certain point. If the rewrite for whatever reason has to be big, it can make sense to still test parts by writing little prototypes. That can help make the theoretical picture clearer. Of course, it being only a prototype can mean you still miss something critical, but it lowers the likelihood.

      Prototyping and writing code to later throw away is something that I’ve seen work really well, and that I think is done too rarely. Often diagrams and big meetings are used instead, but these can be too abstract, especially when more people are involved and not everyone has a good understanding of the details.

    8. [Comment from banned user removed]

    9. 1

      Always and all the time. If it’s not broken, fix it.

    10. 1

      This is pretty old advice. See also this and this. These are 20 year old posts by Joel Spolsky and they still hold value.

    11. 1

      Fant Model… patch it until it utterly collapses under its own weight. I was not fond of this model, but I feel reality has encroached upon what would be a perfectly good time dreaming about how I will make the next iteration more efficient and better.

      That doesn’t mean you shouldn’t design things to be replaced and have well modeled APIs. Once upon a time I was witness to a complete engine rewrite piecemeal over a staggering period of time. Well, really, the rewrite never ended once the api gateway was silently proxied out by api and other conditionals. That fate also has potential issues as calls zig and zag through the api gateway.

      1. 3

        What’s the Fant Model? I tried looking it up but couldn’t find anything relevant.