I like this but my initial reaction is that we are still way more accepting of complexity than we should be. Adding complexity is expedient but it costs so much.
My hope for the industry is that we thoroughly internalize the lessons of Erlang and build systems along those lines.
Systems where:
all design admits the possibility of failure,
messaging is the base reality,
systems use a single language and in-house developed tools rather than an untameable menagerie of languages and vendor tooling.
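To make the first two points concrete, here is a minimal sketch in Python (illustration only, with made-up names – not a real framework and not how Erlang itself is implemented): every interaction goes through a mailbox, and failure comes back as an ordinary message rather than being an afterthought.

```python
import queue
import threading

class Mailbox:
    """Every interaction is a message; there are no shared-state calls."""
    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        self._q.put(msg)

    def receive(self):
        return self._q.get()

def worker(inbox: Mailbox, replies: Mailbox):
    """A process that expects failure: any request may blow up, and the
    reply always says whether the work succeeded."""
    while True:
        msg = inbox.receive()
        if msg == "stop":
            break
        try:
            replies.send(("ok", 10 / msg))      # the "work"; may fail
        except Exception as exc:                # failure is an ordinary outcome
            replies.send(("error", exc))

inbox, replies = Mailbox(), Mailbox()
threading.Thread(target=worker, args=(inbox, replies), daemon=True).start()
inbox.send(5)
inbox.send(0)                                   # division by zero: still just a reply
print(replies.receive(), replies.receive())
inbox.send("stop")
```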
Complexity is often not negotiable. It is sometimes accidental (driven by budget constraints and so on) but very often purely necessary (coping with failure cases and exceptional conditions inherent to the problem domain).
You cannot escape complexity. You can reduce some of it, but not all of it. Each of the three points you mention has a ton of caveats and context-specific exceptions that, unless you know them ahead of time, will be replicated in your own systems.
This is, in part, the point of the post. The problem is not accepting or fighting with complexity – even though we should do as much as we can to limit it – it’s that complexity is inescapable, and the real problem will be coping with it, not just preventing it.
Erlang’s approach, since you mentioned it, is tuned to that need of coping vs. preventing, but it’s still rather limited in scope: it handles transient faults that can be fixed with retries and allows better fault isolation, but it will not cover requirement misunderstandings, nor the problematic communication patterns in teams that result in widespread production outages. People can do that, and developing tools for people is how you can help.
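To be concrete about what “transient faults that can be fixed with retries” buys you, a minimal sketch (plain Python, hypothetical names): the supervisor isolates a flaky call and restarts it, and that is all it does – it cannot repair a misunderstood requirement.

```python
import random
import time

def flaky_fetch():
    """Stand-in for a call that fails transiently (network blip, timeout, ...)."""
    if random.random() < 0.5:
        raise ConnectionError("transient failure")
    return "payload"

def supervise(task, max_restarts=5, backoff=0.1):
    """Restart the task on failure, up to a limit, then give up loudly.
    This copes with transient faults; it does nothing for a wrong design."""
    for attempt in range(1, max_restarts + 1):
        try:
            return task()
        except Exception as exc:
            print(f"attempt {attempt} failed: {exc}; restarting")
            time.sleep(backoff * attempt)
    raise RuntimeError("still failing after retries; escalate to a human")

print(supervise(flaky_fetch))
```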
Once again, Erlang provides a lot of fantastic tools here (DTrace/SystemTap probes, internal tracing of all Erlang functions, automated logging, good frameworks, and so on) – I would say it qualifies as the “right tool for the job” when it comes to operability, but the language/framework/app-level stack is only a restricted part of it, not the full picture. Resilience has to look broader than the code.
Agreed. I think I would be happy if people just did a thorough soul search – “do we really need that?” – before each piece of tech enters the stack. Even the pieces introduced to help manage the others have significant side effects.
You cannot escape complexity. You can reduce some of it, but not all of it. Each of the three points you mention has a ton of caveats and context-specific exceptions that, unless you know them ahead of time, will be replicated in your own systems.
I was thinking that by adding a multi-region setup, a replicated database, and so on, you’re not only making it harder to reason about the system as a whole, you’re also making deployment, operation, monitoring, and the rest more complex. Maybe sometimes it’s better to keep the simple system, because you expect the errors that will arise from it to have less impact than all the errors you may introduce by making the system more complex.
To give an example where observability/the ability to reason about something fails: I hate (most) ORMs with a passion for exactly that reason. A bad abstraction layer has serious downsides that aren’t immediately apparent until people try to use it in critical environments.
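One concrete illustration of the kind of downside that stays invisible until real load hits – the classic N+1 query pattern that a convenient object layer quietly encourages. This is a hand-rolled sketch using sqlite3, not any particular ORM:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts   (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts   VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 2, 'z');
""")

# What a naive object layer tends to do behind your back: one query per parent
# row (N+1). Invisible with 2 authors; a serious problem with 200,000.
for author_id, name in db.execute("SELECT id, name FROM authors"):
    titles = db.execute(
        "SELECT title FROM posts WHERE author_id = ?", (author_id,)
    ).fetchall()
    print(name, [t for (t,) in titles])

# What you write when the query is not hidden behind the abstraction: one JOIN.
rows = db.execute(
    "SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id"
).fetchall()
print(rows)
```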
Anyone who’s interested in this would also probably like to read Out of the Tar Pit.
One of the things I really like about Out of the Tar Pit is that they distinguish between the essential complexity of the problem you’re trying to solve and the accidental complexity involved in expressing your problem domain to computers. It’s one thing to rail against complexity, but I think generally when people are doing that they’re actually fighting against accidental complexity.
I think generally when people are doing that they’re actually fighting against accidental complexity.
I’d like to think so, but I’ve read a lot of rants against the complexity of the HTTPS certificate ecosystem, or the complexity of OpenGL and Vulkan, or even the complexity of dynamic linking. They all pretty much boil down to “this system is more complex than it needs to be to support my specific use-case, therefore it’s badly designed”.
One of the things I really like about Out of the Tar Pit is that they distinguish between the essential complexity of the problem you’re trying to solve and the accidental complexity involved in expressing your problem domain to computers.
In practice the situation is more complex, though. There’s also accidental complexity due to not solving the correct problem. This could be because you misidentified it, because of communication issues between you and the client, because the client himself misunderstands the fundamental problem that needs solving, because you are solving a larger problem than needs to be solved (what @mfeathers is referring to in the parallel thread), because constraints and trade-offs caused you to choose suboptimal solutions that end up more complex when the inevitable changes and additions are requested, and so on. At various points on the timeline, things that seemed to be essential complexity may turn out to be accidental complexity – and vice versa, when an overdesigned solution miraculously turns out to afford the change or addition that is requested.
I like the example of Maxwell’s equations as something that seems necessarily complex but actually contains a lot of accidental complexity for many practical problems, because many of the terms are negligible and a much simpler equation can be used in those situations. If you apply Maxwell’s equations where Ohm’s law suffices, you are introducing accidental complexity.
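(For the curious, the reduction goes roughly like this – steady state, uniform conductor, so the time-derivative terms are exactly what gets dropped:)

```latex
% Maxwell's equations (the "essentially complex" version):
\nabla \cdot \mathbf{E} = \rho / \varepsilon_0, \qquad
\nabla \cdot \mathbf{B} = 0, \qquad
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad
\nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}

% Steady state: the \partial/\partial t terms vanish, so \nabla \times \mathbf{E} = 0
% and \mathbf{E} = -\nabla V. With the constitutive relation \mathbf{J} = \sigma \mathbf{E},
% integrating over a uniform wire of length L and cross-section A gives
V = I \, \frac{L}{\sigma A} = I R
```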
It’s one thing to rail against complexity, but I think generally when people are doing that they’re actually fighting against accidental complexity.
Often they are, but it is exactly unqualified usage of the word ‘complexity’ that leads to the situation where people start fighting any complexity, even complexity strictly necessary to solve the problem. A code comment of the sort ‘This code is complex because of …’ triggers developers to argue that it should be changed to be simpler, even if the comment explains exactly that it is complex because we have to conform to complexity in the problem domain. It’s something like the fundamental attribution error applied to a word, where the word becomes a reference to something necessarily bad, instead of the badness being part of the thing it refers to and thus possibly absent.
Erlang and Smalltalk are pretty high bars to clear.
Very thoughtful and insightful article.