Software correctness is not a developer decision; it's largely a business decision guided by cost management. Depending on where you work and what you work on, the software may be so stable that when you try to point out a problem, the business will simply insist the software is correct because it's always been correct, and that you're probably just not understanding why. Apps are buggy mostly when the costs of failure to the business are low or not felt by management.
Came here to say exactly this.
There is no barrier to entry or minimum bar for consideration in software.
So you end up with thousands of businesses saying variations of "our budget is $1000 and we want you to make software that …".
Then of course you are going to see lots of failure in the resulting software.
The choice often ends up being “spend 10,000x and make it super robust” or “live with bugs”.
No business chooses the first option when you can say “oops sorry that was a bug we just fixed it. thank you! :)”.
This pattern persists even as the cost of developing software comes down. Meaning if you reduce the cost of producing flawless software to $X the market will choose a much more buggy version that costs a fraction of $X because the cost of living with those bugs is still much lower than the cost of choosing a flawless one.
I recently moved to financial software development, and it seems everybody there has real-life experience of losing huge sums of money to a bug, and everybody, including management and trading, is willing to try practices that reduce bugs. So I became more convinced that it is the cost of bugs that matters.
While this is true, don't you think this is sort of… pathetic? Pretty harsh, I know, but I couldn't come up with a better word on the spot. What I mean is, this is basically "those damn suits made us do it".
Not really.
Would you like your mobile phone screen to be made bullet proof and have it cost $150M?
Would you like an atomic bedside alarm clock for $500k?
A light bulb that is guaranteed to not fail for 200 years for $1,000?
It’s a real trade-off and there’s a line to be drawn about how good/robust/reliable/correct/secure you want something to be.
Most people/businesses can live with software with bugs and the cost of aiming for no bugs goes up real fast.
Taking serious steps towards improving software quality is very time-consuming and expensive, so even those basic first steps won't be taken unless it's for something critical such as aircraft or rocket code.
For non-critical software, there's often no huge difference between 0 bugs, 5 bugs, or 20 bugs. So there isn't a strong incentive to try so hard to reduce the bugs from an initial 100 down to 10 (and to keep them there).
The case that compels us to eliminate bugs is where it is something to the effect of “no bugs or the rocket crashes”.
Also you have to consider the velocity of change/iteration in that software. You can spend tons of resources and have your little web app audited and certified as it is today, but you have to think of something for your future changes and additions too.
As the technology improves, the average piece of software should become better, in the same way that the average pair of shoes, the average watch, or the average t-shirt becomes better.
Would you like your mobile phone screen to be made bullet proof and have it cost $150M?
Quite exaggerated, but I get your point. The thing is — yes, I personally would like to pay 2-3x for a phone if I can be SURE it won't degrade software-wise. I'm not worried about hardware (as long as the battery is replaceable), but I know that in 2-3 major OS updates it will feel unnecessarily slow and clunky.
Also you have to consider velocity of change/iteration in that software
Oh, man, that's a whole other story… I can't remember the last time I wanted software to update. The only two reasons I usually do update are:
It annoys me until I do;
It will hopefully fix some bugs introduced due to this whole crazy update schedule in the first place.
Most people/businesses can live with software with bugs and the cost of aiming for no bugs goes up real fast.
Which brings us back to my original point: we got used to it and we don’t create any significant pressure.
Businesses that allow buggy code to ship should probably be shamed into better behavior. They exist because the bar is low, and would cease to exist with a higher bar. Driving them out of business would be generally desirable.
A boycott would need to start or be organized by developers, since developers are the only people who know the difference between a circumstance where a high-quality solution is possible but difficult, a circumstance where a high-quality solution is trivial but rare for historical reasons, and a situation where all solutions are necessarily going to run up against real, mathematical restrictions.
(Also, most code in existence isn’t being developed in a capitalist-corporate context, and the most important code – code used by everybody – isn’t being developed in that context either. We can and should expect high quality from it, because there’s no point at which improving quality becomes “more than my job’s worth”.)
it’s largely a business decision guided by cost management.
I don’t agree about the cost management reasoning. Rather, it is a business decision that follows what customers actually want. And customers actually do prefer features over quality. No matter how much it hurts our pride in craftsmanship…
The reason we didn’t see it before software is that other fields simply don’t have this trade-off as an option: buildings and cars can’t constantly grow new physical features.
Speed / Quality / Cost
Pick two
You can add features to cars and buildings, and the development process does sometimes go on forever. The difference is that if your cow clicker game has a game-breaking bug, typically nobody literally dies. There exists software where people do die if there are serious bugs, and in those scenarios they compromise on either speed or cost.
We’ve seen this before software in other fields, and they do have this trade-off as an option; you just weren’t in charge of building it. The iron triangle predates software, and though I agree scope creep is a bigger problem in software, it is also present in other industries.
I agree. I suppose this is another thing that we should make clear to the general public.
But the problem I’m mostly focusing on is the problem of huge accidental complexity. It’s not business or management who made us build seemingly infinite layers and abstractions.
Oh it definitely was. The waterfall process, banking on IBM/COBOL/RPG, CORBA, endless piles of objects everywhere, big company apps using obfuscated formats/protocols, Java/.NET… these were middle managers and consultants forcing bullshit on developers. Those bandwagons are still going strong. Most developers stuck on them move slower as a result. The management solution is more bullshit that looked good in a PowerPoint or sounded convincing in a strip club with costs covered by a salesperson. The developers had hardly any say in it at all.
With that status quo, we’re typically forced into two options: build the new thing on top of or within their pile of bullshit, or find new niches or application areas that let us start with a clean slate. Then we have to sell them on these, whether internally or externally. Doing that for stuff that’s quality-focused rather than feature/buzzword-focused is always an uphill battle. So quality-focused software with simple UIs isn’t the norm. Although developers and suppliers cause problems, the vast majority of the status quo comes from the demand side: consumers and businesses.
It isn’t? Most managers I’ve met come and see me saying: we don’t want to have to think about this, so build on top of this abstraction of it. They definitely do not want us wiping the slate clean and spending a lot of time rebuilding it anew; that would be bad for business.
That battle is already lost. I think we should instead show them what can be done by creating good alternatives to most of the buggy systems they use. Just one full-featured, high-quality product after another. The complexity of what the market demands will make failure inevitable for some of the components. In those cases, we can show them how much better one can contain and/or recover from failures. Maybe even show fuzzing results of these pieces of software vs the norm to put some numbers on the difference. Also, ship prompt updates after problems are found. Eventually, we’ll have a huge pile of software that works better than the other stuff.
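To make the fuzzing point concrete: a minimal harness is just a loop that throws random inputs at a component and counts violations of its contract. A sketch in TypeScript (parseVersion is a made-up stand-in for whatever component is under test; real comparisons would use a coverage-guided fuzzer like AFL or libFuzzer, not Math.random):

    // Contract of the made-up component: parseVersion never throws;
    // it returns null on junk input instead.
    function parseVersion(s: string): { major: number; minor: number } | null {
      const m = /^(\d+)\.(\d+)$/.exec(s);
      return m ? { major: parseInt(m[1], 10), minor: parseInt(m[2], 10) } : null;
    }

    // Fuzz loop: random byte strings in, contract violations out.
    function fuzz(iterations: number): number {
      let violations = 0;
      for (let i = 0; i < iterations; i++) {
        const input = Array.from({ length: 1 + (i % 32) }, () =>
          String.fromCharCode(Math.floor(Math.random() * 256))
        ).join("");
        try {
          parseVersion(input); // returning null is fine; throwing is a bug
        } catch {
          violations++;
        }
      }
      return violations;
    }

    console.log(`${fuzz(100_000)} crashing inputs out of 100000 runs`);

"X crashing inputs per 100k" is exactly the kind of number you can put next to a competitor's.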
With that in place, we can then do write-ups arguing that people should demand better stuff like (cites many apps/systems) from suppliers. We can convince people to switch, convince companies or FOSS teams to use similar methods, talk about regulations, and so on. It helps to have a lot of good examples, though. On my part, I’ve been using case studies of both lightweight QA and high-assurance systems to show producers can build stuff way better than the norm. Lightweight methods sometimes reduce debugging so much that they save money and time; otherwise, they just cost a little extra to save users a lot of headaches. If producers can use and afford them, the only remaining reason to use inferior methods is that they don’t care. At least, that’s what we tell and show their customers. ;)
I think we should instead show them what can be done by creating good alternatives to most of the buggy systems they use.
This feels right. Maybe we need some sort of movement, a manifesto similar to the Agile one, for developers to unite under.
The complexity of what the market demands will make failure inevitable for some of the components
This feels like a catch-22: if we don’t make the product quickly, then someone else will, and theirs won’t be very robust; if we make the product quicker than the competition, then ours won’t be very robust. Either way, the first to market is often the winner of the market (at least for some period), and due to the development speed requirements, its product will be buggy.
I upvoted this because I think it’s a good reflection, but I think the author is dead-wrong.
The author keeps saying “amateur” (one who does a thing for the love of that thing) instead of “novice” (one who does a thing while at a low skill level in that thing)–and most of the software we have is written and built by novices, like it or not.
More importantly, I think the author misses the truth of the evolution of software: we’ve shied away from software that exposes users to an ugly if accurate view of their data and processes and instead towards a sort of enfeebled shiny existence. We have then celebrated this as progress, and loudly convinced them that they shouldn’t have to be tool-users in order to use tools.
We’ve similarly pushed away (and for good reason!) from building simple appliances–good for our users, because it helps them adapt their tools to their problems (instead of adapting themselves or learning not to have problems), and good for us, because there’s always money in selling people a new solution to fix a problem they didn’t know they had.
I was about to post this. I also wanted to add that it is not solely about bad software implementation. The trend of making things shiny and abstracting users away from the reality of digital processing also results in software packages that try to do things that should not really be attempted in a consumer market.
As an extreme example, take digital assistants. The full, perfect implementation of a digital assistant is an AI-complete problem, and there is no way it can perform the job a naive user expects (or the advertising implies). These kinds of applications can only ever succeed in training users that software is an opaque black box that arbitrarily works or doesn’t, and that there is nothing anyone can do about it. We should be educating people systematically on what software can and can’t do, so people can make the best use of it.
OP here. Thank you, this is very interesting and generates more thoughts. I’d love to extend my post later.
I agree 100%. I wrote another post that is somewhat relevant: https://rakhim.org/2017/02/by_devs
In short: when software was written for ‘computer users’, not just ‘people’, it was harder to use and required reading manuals, but at least it was honest and you’d see what you get. Today the “enfeebled shiny existence” is less honest and naturally less stable, due to all the magic required to maintain the facade.
Also, ‘novice’ implies a temporary state, as in ‘a novice will become a master’. I wouldn’t call, for example, core Apple developers ‘novices’, yet their products have been incredibly buggy and low-quality lately.
I think “novice” was used to suggest that most bad software is created by novice professionals, not amateurs.
I’ve been reading a lot of Nancy Leveson’s work, and she has an amazing explanation for why software engineering is so different from “conventional” engineering. In, for example, mechanical engineering, the main danger is component failure: something breaks, and the failure cascades through the machine. In software engineering, the main danger is emergence: the combination of multiple interacting pieces, all working perfectly, leads to a global problem.
It’s not that we’re more incompetent than the “REAL” engineers. She studied the designers of missile systems, power plants, and aircraft, all places that take software engineering extremely seriously. But they’re all still buggy, for emergence reasons.
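A contrived illustration of the emergence point, if it helps (a toy TypeScript model I made up, not from Leveson): a server that sheds load when its queue is full, and clients that retry rejected requests, are each locally correct, yet their composition can get stuck degraded after a transient burst:

    const CAPACITY = 10;    // server queue bound: sensible overload protection
    let queue = 0;
    let pendingRetries = 0; // clients re-send rejected work: sensible fault tolerance

    function tick(newArrivals: number): void {
      if (queue > 0) queue--;                      // server completes one job per tick
      const demand = newArrivals + pendingRetries; // retries stack on top of new work
      pendingRetries = 0;
      for (let i = 0; i < demand; i++) {
        if (queue < CAPACITY) queue++;             // accepted
        else pendingRetries++;                     // rejected: client retries next tick
      }
    }

    tick(15);                               // one brief burst...
    for (let t = 0; t < 20; t++) tick(1);   // ...then normal load, equal to the service rate
    console.log({ queue, pendingRetries }); // stays pinned: { queue: 10, pendingRetries: 5 }

Neither piece broke, yet the backlog never drains without outside intervention; that’s a global problem no component-level analysis would flag.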
It sure feels like emergence is a consequence of sheer scale.
I came across this tweet recently: https://twitter.com/nikitonsky/status/1014411340088213504 I’d bet that missile systems, power plants, and aircraft all have less code than many relatively simple desktop apps.
That’s interesting. A quick search indicates that the F-35, which has had numerous delays and reliability issues (I read somewhere that pilots have to reboot one of the onboard computers every 10 minutes or so) has over 8 million lines of code.
It’s true. I don’t think it counters the point, though. How many of those systems are designed with integration patterns or analyses that ensure the individual components work together properly? I doubt many. The few I’ve seen came out of correct-by-construction approaches, and even those usually simplify the integration mechanisms to make the system easier to analyze. Many real-world systems use unnecessarily complicated forms of integration, from how they couple modules up to the build systems they use.
I think emergence will involve a mix of intrinsic and accidental complexity, as usual. I think many failures are caused by the accidental kind, though.
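For the flavor of what “making integration analyzable” can mean at the small end, here’s a trivial TypeScript sketch (the real correct-by-construction work uses much heavier machinery; parsePort and the Result type here are made up): encode a component’s failure mode in its boundary type, so a neighboring module cannot ignore it.

    // The contract lives in the type: callers cannot touch `value`
    // until they have handled the failure case.
    type Result<T> =
      | { ok: true; value: T }
      | { ok: false; error: string };

    function parsePort(raw: string): Result<number> {
      const n = Number(raw);
      return Number.isInteger(n) && n > 0 && n <= 65535
        ? { ok: true, value: n }
        : { ok: false, error: `invalid port: ${raw}` };
    }

    const r = parsePort("8080");
    // Accessing r.value right here would be a compile error;
    // the coupling between modules is visible and machine-checked.
    if (r.ok) console.log(`listening on ${r.value}`);
    else console.error(r.error);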
Is there a good link to her argument?
I’m basing a lot of this on her free online book, Engineering a Safer World. She also has a seminar on it here: https://youtu.be/8bzWvII9OD4
Every time one of my users inadvertently reports a bug by describing a problem followed by asking “is this normal?” I feel shame for our entire profession.
Tell me about it… I got into a heated debate with one of my coworkers. They were claiming our product worked “because users don’t complain about it.” Our tests actually show that our product performs badly (in terms of correctness), and our Sentry is full of errors.
It is a well-known fact that, when using computers, users just blame themselves when the software is buggy: “Oh, I used it wrong”, “Oh, I did something wrong.” I see that every day with my 70-year-old parents when they use any shitty web application.
I tried to think about what software I’ve used that is buggy and I realised it’s almost entirely websites and mobile apps. The actual applications I run on my Mac are pretty damn solid. Chrome, Sublime, WebStorm and PyCharm, Lightroom, Spotify even… They all work well. I actually couldn’t think of the last bug I ran into in any of them. There clearly are a bunch of engineers out there who care about the correctness and stability of their software and put a lot of work into it. But we don’t really hear from them too much (or, they’re buried under the endless self-promotion of the JS Framework Shootout crowd).
I work on a number of very buggy web applications right now. They’re built on (IMO) very poor decisions made by people who didn’t really know what they were doing, but who got a bunch of funding and built something anyway. In some ways that’s an impressive achievement (I’ve never built a successful company) but it’s also embarrassing. And my whole company seems to have internalised the idea that software fails. So production bugs are just normal things, as are weekly hot fixes to critical issues. It’s definitely a problem. We’re not doing anything to address the deep-rooted issues or the mistakes of the early days. No-one even mentions it. I don’t think anyone would dare to think that big.
I work on a free software webapp project, pump.io. We try really hard not to bloat it with features and still I feel like I’m just flailing about in the dark with no real idea what I’m doing.
It’s written in JavaScript. I like JavaScript a lot, but I also wonder if that’s a big part of why I feel I have no control over the system. Maybe TypeScript will help with this.
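A tiny made-up example of the kind of slip I hope the types would catch (not actual pump.io code; the Actor shape here is hypothetical):

    // In plain JS, a missing optional field slides through silently and
    // explodes at runtime somewhere far from the cause.
    interface Actor {
      id: string;
      displayName?: string; // optional: a federated peer may omit it
    }

    function greet(actor: Actor): string {
      // Without the ?? fallback, this line fails to compile:
      // "Type 'string | undefined' is not assignable to type 'string'"
      const name: string = actor.displayName ?? "someone";
      return `hello, ${name}`;
    }

    console.log(greet({ id: "https://example.com/users/alice" }));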
There are an incredible number of XXX and TODO comments scattered around. Lots were there when I took over maintenance, and lots were written by me. I wonder a lot whether I’ll ever actually get to go back and fix them. I hope the answer is yes, but realistically I’m guessing the answer is no, because (external) things change too rapidly and I’ll never have enough stability to go do low-priority polish like that. It’s interesting and kinda sad to me that adding XXX comments to the code makes me feel better. It’s like I’d be in the exact same mess without them, but at least I feel like I have some grasp on the mess.
I don’t know. This article, and this comment bummed me out. They’re kinda spot on. The feeling I have now reminds me of Be Nice To Programmers. (I will note, though, that it is 2 AM and I’m sleep deprived.)
The grass is always greener on the other side. For any other industry, we can find good and bad examples. Nuclear power plants were considered rock solid in Japan; now see how people view them. Delta rockets were always seen as stellar, but now see how people feel about Falcon vs Delta. The 70s/80s were not any better or worse than today; if the 80s industry had been so good and fair, we would not have seen a movement like GNU.
It is all business.
My rule of thumb is that engineering is the transfer point of scientific knowledge to business. An engineer should understand both sides sufficiently well and act accordingly. Usually what I see, though, is that engineers fall in love with science and hate business.
We know how to produce software that doesn’t fail. It’s not particularly easy or fast or cheap but we can do it.
The practical reality is that software which “doesn’t fail” is rarely genuinely demanded or needed. So the market doesn’t ask or pay for it.
Even when businesses demonstrate an initial interest in the idea they will immediately back down as soon as they face the reality of various costs of producing such a thing. Their expression of interest is mostly just a big wish.
If someone’s paying $10k for, let’s say, a custom WordPress plugin to be made in 2 weeks, would they be interested in a much more secure and much less buggy version that costs $10M and takes 2 years? No. They don’t really want it and they don’t really need it.
This is true, but it’s also very true that many of the costs of shit software are externalized (THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT), and so no, that $10k spent doesn’t actually cover the total cost of the software to the buyer.
Yet modern software, being complex, does fail from time to time. As do all those things engineers work on. The engineer’s solution to that is not trying to build complex things that never fail; it’s attacking the problem on all fronts. Decreasing defect rate is part of the solution, but so is actually measuring the frequency and impact of defects as well as thinking about failsafes.
Sometimes doctors give medicine to patients and it fails, because of a bad diagnosis or an unplanned adverse reaction, but mostly because biology is complex. That’s why patients are monitored in the hospital.
Sometimes accidents happen in nuclear power plants, because physics is complex and components can have defects. Most of them are not critical because engineers have planned for the unplanned.
Sometimes trains don’t start, drivers get sick or go on strike, trees fall on the rails… Yet the whole country does not end up paralyzed by it.
It’s funny that the author uses the automotive industry for comparison, when you think about it. Quick, what’s the first word that comes to your mind when you hear “breakdown”? (Maybe it doesn’t work as well in English because it could be something like “nervous”, but that’s not the case in my native language…) What’s the first cause of accidental death worldwide that isn’t health-related?
I had an education as an engineer in networks and electronics; we had courses in “resilience” that dealt with things like redundancy, MTBF/MTTR, and monitoring, as well as the impact of component complexity on failure rates. A popular approach in those fields is to use cheap, relatively simple components, assume they will fail, and then make sure the failure of a component is 1) not critical; 2) easy to detect; and 3) quick and reliable to fix.
There are people who think this way in software, mostly in the Erlang community (see Error Kernels, Let it Crash…). Maybe other parts of the software world should listen to them more and take inspiration from them.
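For flavor, the supervision idea boils down to keeping the restart logic outside the worker. A toy TypeScript sketch (real Erlang/OTP supervisors are far richer, with restart strategies and supervision trees):

    // The worker stays simple and is allowed to crash; the supervisor makes
    // that failure non-critical (restart), detectable (log), and bounded
    // (escalate after too many crashes).
    async function supervise(
      worker: () => Promise<void>,
      maxRestarts = 5,
      backoffMs = 100
    ): Promise<void> {
      for (let attempt = 0; attempt <= maxRestarts; attempt++) {
        try {
          await worker();
          return; // finished normally
        } catch (err) {
          console.error(`worker crashed (attempt ${attempt}):`, err);
          await new Promise<void>((r) => setTimeout(r, backoffMs * 2 ** attempt));
        }
      }
      throw new Error("restart budget exhausted; escalate to the supervisor above");
    }

    // Usage: a flaky worker the supervisor rides through most of the time.
    supervise(async () => {
      if (Math.random() < 0.7) throw new Error("transient failure");
      console.log("work done");
    }).catch((e) => console.error(e.message));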
Why does all software have to be bug-free? I regularly buy things off eBay, and they often come with flaws or poor instructions, but I don’t care, because some things aren’t that important and I’ll take the cheap price over quality. For many websites/applications, I really don’t care about the occasional issue; I’d much rather have the extra features that come from fast development.
Some things are really important, and I would be very upset if my bank or server host had a major issue, but they never do, because they understand that their services have no room for errors and spend the extra time making sure nothing ever breaks; as a result, they tend to be behind the times in tech.
Banks do have major issues from time to time. In South Korea, major banks have been down for a whole day, multiple times. Yes, these incidents make the news headlines and everyone is surprised by them, but they do happen.
IMO there are two main reasons behind bugs:
The industry is still at a very early stage;
Software is built by humans.
We will probably improve a bit on the first issue in a couple of centuries, but the second is inherent to the matter.
The motor vehicle industry was also at a very early stage once, and built by humans too, but that stage was very short, not decades of mass use. While you’re right, I think there’s a third reason that supports the first two: the scale of things in IT is so much larger than anything else! This makes the maturing period much longer.
Reminds me of organic beings: small insects mature and live quickly; large mammals take longer, and their systems are more complex.
Callously equating people actually dying to software bugs (which, granted, are occasionally that severe):
Per-capita motor vehicle fatalities in the US peaked in 1937 at 29 per 100,000, 37 years after the first statistics listed in Wikipedia and about 52 years after the invention of the automobile. They bounce around a bit but don’t change a whole lot for the next 30 years: 26 per 100,000 in 1969. Then we see a decline, with a real drop-off beginning in the 1980s and continuing to today.
If we’re looking to the motor vehicle industry for an analogy we might expect quite a long period of buggy software ahead of us.
Wait, aren’t we mixing terms here? How many people are dead because of actual motor vehicle failure? As in the brakes stopped working or the engine exploded?
I’m pretty sure the majority of those deaths were due to human error, both on the part of drivers and of pedestrians. As soon as we implemented things like mandatory seatbelts and airbags, and developed better roads, signals, and laws, the number went down. Cars, of course, became safer themselves, but we see old and vintage cars on the streets today, and I don’t think the death rate among their drivers is on the same level as it was when those cars were produced.
A 1950 car in 2018 is safer than a 1950 car in 1950.
So, if what I said is at least remotely true (can’t do research atm), then the main reasons for cars becoming safer are external, things that were added to the cars and to the systems around them. This isn’t possible with buggy software: we can’t create universal patches that make the whole set of software less buggy.
They literally have cars accelerating when they shouldn’t be after a hundred years of use. Their bugs are worse than ours.
They only figured out seatbelts in the most recent quintile of that hundred years. Airbags are as recent. They haven’t yet figured out “don’t crash into stationary object”.
If that’s what software is going to be like, my grandchildren will be born, grow up, and die before progress is made on the front of reliable software.
I, for one, can only name a couple of software products that are rock solid.
Does anyone know of a list or something that attempts to collect projects that are really, really reliable and stable? I too can only name very few projects like this, and I’d like to know of more for inspiration, examples, and to use myself.
I can share my list: TeX, Ynab Classic, Emacs and Vim (if you aren’t getting crazy with plugins), Sublime Text, OmniFocus, Devonthink (basic version at least).
I’d be interested in seeing a list of well crafted software. Maybe we could start a curated list or something?
awesome-reliable-software, maybe? See awesome.
Joe Armstrong would say that we should.