1. 27
  1. 11

    This is one of those cultural issues around software that, unfortunately, dwarfs technical factors in terms of how resources are spent, how much emotional energy must be burned up in order to convince people to do the right thing, and what tools and platforms get chosen. To a programmer, “Let it crash” seems harmless. We crash systems all the time just to see what happens. This goes back to the mathematicians, even. Someone wondered, “what happens if we take sqrt(-1)?” In the 16th century, a group of Italian mathematicians “crashed” the shit out of mathematics and it came back stronger, with the advent of complex numbers.

    Unfortunately, “Let It Crash” gives business people the sense that we’re all a bunch of underinvested “cowboys” who can’t be bothered to build systems right in the first place. (Hence, they reduce engineer autonomy, respect, and time budgets to the point where building systems right becomes impossible. But that’s another story.) To us, “Let It Crash” means, “Ah, so apparently we can build systems that survive node failures” (because we also know that node failures are inevitable at scale). That’s not what those business types hear, though.

    I would replace “Let It Crash”, in those pitches, with “Survive Chaos”. Business people understand chaos (hell, they create so much of it) and recognize the need for a system that is robust in the face of unexpected, chaotic intrusions.
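
    To make the “survive node failures” reading concrete, here is a rough sketch of the idea in Python. It is only an in-process analogy, not how Erlang/OTP supervisors are actually implemented, and every name in it (`supervise`, `make_flaky_worker`, `RestartLimitExceeded`, `max_restarts`) is made up for illustration. The worker does not try to repair itself; it just crashes, and a separate supervisor restarts it within a fixed budget.

    ```python
    from typing import Callable, Tuple


    class RestartLimitExceeded(RuntimeError):
        """Raised when the worker keeps crashing past the restart budget."""


    def supervise(worker: Callable[[], str], max_restarts: int = 3) -> Tuple[str, int]:
        """Run `worker`, restarting it each time it raises.

        Returns (result, number_of_restarts). Gives up once the restart
        budget is spent, so a permanently broken worker still surfaces.
        """
        restarts = 0
        while True:
            try:
                return worker(), restarts
            except Exception as exc:
                if restarts >= max_restarts:
                    raise RestartLimitExceeded(
                        f"gave up after {restarts} restarts"
                    ) from exc
                restarts += 1


    def make_flaky_worker(failures: int) -> Callable[[], str]:
        """Hypothetical worker that crashes `failures` times, then succeeds."""
        remaining = [failures]

        def worker() -> str:
            if remaining[0] > 0:
                remaining[0] -= 1
                raise ConnectionError("simulated node failure")
            return "ok"

        return worker


    if __name__ == "__main__":
        print(supervise(make_flaky_worker(2)))  # prints ('ok', 2)
    ```

    The restart budget is the part a business audience tends to care about: the system absorbs transient chaos, but a failure that never clears still gets escalated instead of being retried forever.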

    1. 8

      Why on earth would any engineer be talking to a business person using terminology like “let it crash”? The business person doesn’t care. They care about uptime. They care about how much business is driven by the product. “Let it crash” is an implementation detail. Going into depth on an implementation detail with a business person is an exercise in frustration for both sides. The business person will need too much context to even understand what you mean. In fact, attempting to help them understand isn’t very respectful of their time in the first place. This isn’t cultural; it’s a question of how responsibilities should be divided up, and expecting a business person to come up to speed on technical jargon is a bad division of responsibilities.

    2. 6

      I’ve never used Erlang. How long does it take to recover from a crash, and e.g. start a new thread? I’m guessing this is cheap?

      I ask because Node’s adopted the official “let it crash” line, but restarting a process took up to a minute when I was running it in production.

      1. 11

        It is much cheaper than spawning an OS thread but more expensive than just calling a function.

        1. 4

          Erlang process spawning is, iirc, less than a thousand machine cycles on modern hardware.

          1. 1

            That’s a shame. It used to be a goal of the Node project to hold startup time to 30ms.

          2. 4

            I don’t think “let it crash” is a tagline, it’s a mantra for people already using Erlang.

            Reasons to use Erlang/BEAM:

            • Fault Tolerant

            • Battle-tested over 30 years

            • Concurrency and scalability as primary design principles

            • Upgradeable without downtime

            I feel like the author is conflating “things people say about Erlang” with “ways people sell Erlang.” Nobody is selling Erlang with “Let’s use Erlang and let it crash!” because that doesn’t even make sense. You say “Let’s use Erlang, it’s fault tolerant!” and then to your developers you explain the philosophy of let it crash or just give them a copy of LYSE.

            1. 4

              As with most engineering policies, the devil is in the details. “Let it crash” … Let what crash? Let it crash when? A better phrase to begin with might be “graceful degradation”: under what circumstances might something degrade? What are reasonable ways to prevent, recover from, or terminate on those various degradations? Then, among those reasonable alternatives, which seem the most cost-effective?

              1. 3

                Let it crash is a punchy tagline, but it’s inside baseball; it probably doesn’t make sense to anyone outside of OTP-land. It provokes, though, so maybe it’s effective(?)

                1. 1

                  I don’t know what “inside baseball” means, but the “Let it crash” tagline is also used for the Akka actor library for Java & Scala, and http://letitcrash.com/ is the Akka Tech Team blog.

                  1. 2

                    “inside baseball” means that, for a given discussion, so much insider knowledge & jargon is used, that the discussion would be incomprehensible to anyone without that insider knowledge.

                    obligatory wikipedia link: https://en.wikipedia.org/wiki/Inside_baseball_(metaphor)

                2. 2

                  Personally, my reactions are exactly the opposite of what the author described: “auto healing mechanisms” sounds enough like marketing chaff to be suspicious, while “let it crash” is nice and clear. Know your audience, I guess.

                  1. 1

                    The term “fail fast” is also frequently used to mean the same thing. I’m not sure the baggage the English terms carry is necessarily any better.

                    As a development paradigm it does carry a lot of advantages.