A couple of points I agree with, but a lot of stuff that strikes me as so vague that it doesn’t say anything actionable.
Writing non-trivial software that is correct (for any meaningful definition of correct) is beyond the current capabilities of the human species.
I have a hard time engaging with this due to the vagueness of “non-trivial” and “meaningful”, unless the definition of non-trivial software is “software which is eventually found to be incorrect” (perhaps that being some sort of truth in itself).
Most measures of success are almost entirely uncorrelated with merit.
Without defining success, measures, merit, or the context for this (success in one’s career? success as a human?) I can’t really agree or disagree with this. It’s a non-statement.
Being aligned with teammates on what you’re building is more important than building the right thing.
In a commercial context, ultimately, the vast majority of engineers are responsible for shipping artifacts that customers want or, more importantly, that the business can get paid for. At a music software company, if almost all of my teammates are aligned that we’re making a database, and one is aligned that we’re making the jukebox that customers pay for, their idea is more important than our alignment.
In commercial contexts, I posit that we’re here to create artifacts of value first and seek consensus and make friends second. Many folks seem to me to misunderstand this.
The fact that current testing practices are considered “effective” is an indictment of the incredibly low standards of the software industry.
This strikes me as incredibly dismissive and, frankly, arrogant of the author–just because they (presumably) haven’t run into high standards of testing and correctness in the software industry doesn’t mean they don’t exist or aren’t being improved on.
“The software industry” spans throwaway Flash games that will never get an update, to millions and millions of lines of code that run rockets (occasionally with errors, to be fair), to tight assembly and C loops that run microwaves and pacemakers, to piles of JavaScript that get patched only if the percentage of users hitting an exception in prod rises past some threshold, to dozens of other applications and verticals.
Scoffing at the “incredibly low standards” of the software industry is about as reasonable as scoffing at the “incredibly low standards” of the construction industry, which similarly has projects ranging from painting a wall to building a hydroelectric dam–it’s too broad a brush!
Thinking about things is a massively valuable and underutilized skill. Most people are trained to not apply this skill.
I feel that this can easily be misread as implying that most people don’t think about things. A single word change–“trained” becomes “incentivized”–would remove my disagreement. We don’t have to assume the worst of those people or their environments in order to explain their observed behavior.
I think many of the points in this post are easily agreeable because of the vagueness - people can read into these however they want to reaffirm their own beliefs.
There’s definitely a tradeoff between saying things that are vague and nonactionable and saying things that are specific but incorrect. I went very far to the vague side for this post, since I think it’s pretty hard to define most of these things in a way that is precise enough to be useful but also still accurate. I think these observations might still be useful anyway.
I have a hard time engaging with this due to the vagueness of “non-trivial” and “meaningful”, unless the definition of non-trivial software is “software which is eventually found to be incorrect” (perhaps that being some sort of truth in itself).
By “non-trivial”, I mean software of the complexity of any web or mobile app I use, or a text editor or compiler. Probably software less complex than that as well, I’m not sure how much less complex.
I was chatting with a friend about this before publishing this - here’s part of that conversation that might illuminate my thinking more:
I’m thinking of real software that runs on real computers, not software in the abstract - in order for some application to be correct, imo, it would need to mitigate any incorrectness in the programming language, operating system, chip, etc. I think that many OS and language bugs are fairly commutative (maybe not the right word - what I mean is that many OS bugs will cause bugs in most programs (probably not a majority of OS bugs, but a significant number)). I think that when you combine that with the difficulty of writing a correct program, this is almost impossible with current tools. Like, what chip + OS + language would you choose? Thinking about it more, I guess writing formally verified C on seL4 or something might have a reasonable chance of being correct? No idea what chip you’d run it on, though (and disk, if it requires storage, etc)
IMO it’s fair to count security vulnerabilities of categories currently unknown as bugs. Just because a format string vulnerability was written in the 90s, doesn’t mean it’s not exploitable. I don’t think “we didn’t know that bug could happen” is a good excuse.
Without defining success, measures, merit, or the context for this (success in one’s career? success as a human?) I can’t really agree or disagree with this. It’s a non-statement.
Different people have different definitions of these things, but I’ve yet to meet someone who had a definition of merit and success where I would say the two are strongly correlated. The combination of noise in the conditions that generate success and vast inequality in starting levels of success are usually the largest contributors to these being uncorrelated, for the various definitions of “success” and “merit” I’ve seen.
In commercial contexts, I posit that we’re here to create artifacts of value first and seek consensus and make friends second. Many folks seem to me to misunderstand this.
I think that it is incredibly difficult to build an artifact of value without being aligned on what artifact of value you’re building.
This strikes me as incredibly dismissive and, frankly, arrogant of the author–just because they (presumably) haven’t run into high standards of testing and correctness in the software industry doesn’t mean they don’t exist or aren’t being improved on.
I think that you need to look at at least the 99.99th percentile of software projects before you start seeing types of testing or verification that aren’t essentially automated manual testing - I think that it’s fair to say that the software industry, collectively, has low standards because of that.
The software industry is very young, and I think that we basically haven’t figured out how to test software in a cost-effective way yet, but I think we can get there.
I feel that this can easily be misread as implying that most people don’t think about things. A single word change–“trained” becomes “incentivized”–would remove my disagreement. We don’t have to assume the worst of those people or their environments in order to explain their observed behavior.
The reason I chose “trained” instead of “incentivized” is that I think it’s pretty common for people to have been in situations that disincentivized thinking about things, but then continue to not think very much about things or notice things not making sense/not adding up once they’re removed from those situations.
in order for some application to be correct, imo, it would need to mitigate any incorrectness in the programming language, operating system, chip, etc.
I’m not prepared to say we cannot make an airplane correctly because pilots keep getting drunk, or that we can’t make software because all CPUs we can ever make are secretly analogue and leaking internal state in the form of RF (or whatever), because this kind of thinking is not useful.
I’m generally satisfied that software is correct if it produces the correct output for the defined input domain. Most people have an even lower bar than that.
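To make “correct output for the defined input domain” concrete, here is a minimal sketch, assuming a domain small enough to enumerate completely. The function `saturating_add_u8` and the `spec` it is compared against are hypothetical names made up for this illustration, not anything from the post or the thread.

```python
from itertools import product


def saturating_add_u8(a: int, b: int) -> int:
    """Hypothetical function under test: add two bytes, clamping at 255."""
    total = a + b
    return 255 if total > 255 else total


def spec(a: int, b: int) -> int:
    """Independent statement of the expected output over the same domain."""
    return min(a + b, 255)


def check_whole_domain() -> int:
    """Compare the implementation with the spec for every input in the domain.

    Returns the number of cases checked; raises AssertionError on a mismatch.
    """
    domain = range(256)  # the defined input domain: all unsigned 8-bit values
    checked = 0
    for a, b in product(domain, domain):
        assert saturating_add_u8(a, b) == spec(a, b), (a, b)
        checked += 1
    return checked
```

Most real input domains are far too large to enumerate like this, which is roughly where the two positions in this thread part ways.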
The software industry is very young, and I think that we basically haven’t figured out how to test software in a cost-effective way yet, but I think we can get there.
The software industry has been around since the 50s, arguably earlier. We are nearly as old, for example, as the airline industry. A lot of folks have done, and are doing, cost-effective testing on every part of their designs that matters–I think you are also discounting just how effective good manual testing can be in favor of an academic view of “correctness”.
One of the big differences between engineering and science is that engineers are judged on their ability to make things of value without being completely correct or accurate and instead on being “close enough for practical purposes”.
One of the big differences between engineering and science is that engineers are judged on their ability to make things of value without being completely correct or accurate and instead on being “close enough for practical purposes”.
Science has p=0.05 for more-or-less the same reason, right?
I suppose you’re right, though I think that’s more for deciding how important results are.
My favorite quote about engineering is by Dr. A. R. Dykes:
Engineering is the art of modelling materials we do not wholly understand, into shapes we cannot precisely analyse so as to withstand forces we cannot properly assess, in such a way that the public has no reason to suspect the extent of our ignorance.
Importance seems a very loaded term - I’m sure, given time, that I could find an extremely unimportant hypothesis to test.
I suspect the closest analogy is deciding whether observations (raw materials) are sufficiently likely to match a hypothesis (specification) to justify their use.
Whether it’s headline design failures like the 737 Max, cost overruns like the F-35, or perpetual failures like leaving the climate crisis or public health unaddressed (it’s common for small prop aircraft to use tetraethyllead in their fuel, for example) I’m not convinced that testing an airplane in a cost-effective way is a solved problem, either.
But both aeronautics and software engineering are ridiculously young disciplines. It took plumbers millennia to work out not to use lead.
Let’s keep in mind that things like NonStop already run at five 9’s without formal verification. From there, you might do something like Verisoft, which went from the app down to the chip, DeepSpec, which went further in that direction, or Rockwell Collins’ approach (pdf) with the AAMP7G CPU.
Then there are lighter-weight approaches applied to larger systems with low defect rates, such as Cleanroom and Praxis Correct-by-Construction (pdf). Note that these aren’t an upper bound on the low-cost approaches, given that there have been many advances in lightweight and automated methods for reducing defects in various parts of the life cycle.
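For a sense of scale on “five 9’s”: that is 99.999% availability, which works out to a bit over five minutes of downtime per year. A small sketch of the arithmetic, assuming a 365-day year; the function name is hypothetical and only there for illustration.

```python
def allowed_downtime_minutes_per_year(nines: int, days_per_year: float = 365.0) -> float:
    """Downtime budget implied by an availability of `nines` nines.

    nines=5 means 99.999% availability, i.e. an unavailability of 10**-5.
    """
    unavailability = 10.0 ** -nines
    minutes_per_year = days_per_year * 24 * 60
    return unavailability * minutes_per_year


# Five nines leaves roughly 5.26 minutes of downtime per year.
five_nines_budget = allowed_downtime_minutes_per_year(5)
```

Each extra nine cuts the budget by a factor of ten, so three nines already allows nearly nine hours a year.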
I’ve omitted explanations of why I believe these things, mostly so that I could get this post out the door at all - each one of these could easily be its own blog post. Think about them for a bit, and possibly you’ll find them compelling :)
I don’t find any of them compelling precisely because there are no explanations. Most of the statements are provocative and so vague that you can read whatever you want into them. That’s not productive discussion, that’s clickbait.
I’d much rather see a deeper exploration of these topics as opposed to just getting a “post out the door”.
It’s rare that I find myself less in disagreement with an “… about software engineering” laundry list.
Most measures of success are almost entirely uncorrelated with merit
This one I think is a nugget which only becomes obvious after years of experience in the business. Understanding this is also, IMHO, the first step towards a better appreciation of self-worth and work-related accomplishments.
The fact that current testing practices are considered “effective” is an indictment of the incredibly low standards of the software industry.
This one hits close to home. End-to-end testing in most environments I’ve worked in is by far the most valuable but also so very, very hard. And so, all the good tooling is for the low-hanging fruit: unit tests. Fortunately (and consequently) the industry is pushing all things towards more functional paradigms, because they’re the easiest to test.
There are many fundamental discoveries in computer science that are yet to be found.
Maybe. But I’m sure they will be just as ignored as existing leaps forward like Design by Contract and class invariants ….
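For anyone who hasn’t met the terms, here is a minimal sketch of Design by Contract and a class invariant, using plain `assert` statements rather than any contract library. The `Account` class is a hypothetical example, and note that Python skips assertions when run with `-O`, so this only illustrates the idea and is not a hardened implementation.

```python
class Account:
    """Hypothetical example class; its invariant is balance >= 0."""

    def __init__(self, balance: int = 0) -> None:
        self._balance = balance
        self._check_invariant()

    def _check_invariant(self) -> None:
        # Class invariant: must hold after every public operation.
        assert self._balance >= 0, "invariant violated: negative balance"

    @property
    def balance(self) -> int:
        return self._balance

    def withdraw(self, amount: int) -> None:
        # Precondition: the caller asks for a positive amount it can cover.
        assert 0 < amount <= self._balance, "precondition violated"
        old_balance = self._balance
        self._balance -= amount
        # Postcondition: the balance dropped by exactly the requested amount.
        assert self._balance == old_balance - amount, "postcondition violated"
        self._check_invariant()
```

The point of the contract style is that a violated precondition blames the caller, while a violated postcondition or invariant blames the class itself.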
Peak productivity for most software engineers happens closer to 2 hours a day of work than 8 hours.
Very true, alas, an open question remains as to which 2…
Most measures of success are almost entirely uncorrelated with merit.
The computer industry has a long history of confusing business success with excellence in programming. E.g., remember the Bad Old Days when everybody did RUP, because IBM did? And now everybody is doing React, because Facebook does?
How kind your teammates are has a larger impact on your effectiveness than the programming language you use.
Alas, we’re in a job that requires you to be correct… which is a subtly and painfully different thing to being Right. Your code must objectively be correct, but alas, despite our firmest held beliefs, there is no objectively Right code. (But a fair amount of objectively Wrong code).
So much pain and conflict arises from these subtle distinctions.
The amount of sleep that you get has a larger impact on your effectiveness than the programming language you use.
Again, subtleties everywhere… too much sleep is a feedback loop for depression, too little is a feedback loop for mania. Alas, the obsessive focus required to be a good programmer will tip the sanest of us too far to one side or the other at several points in our careers.
“…Writing non-trivial software that is correct (for any meaningful definition of correct) is beyond the current capabilities of the human species…”
We are a freaking lot better at fooling ourselves into thinking we understand code than we are at actually understanding code.
The weird thing about software engineering isn’t that we can create complex systems. We’ve been able to do that since probably before the pyramids. The weird thing is that we’re so easily able to create complex systems and fool ourselves into thinking that they’re simple. When teaching TDD, I can show a senior programmer (whatever that means) four lines of code, have them think it’s fine, and there’s more than one error in it. And these are the same folks who feel quite confident rolling out 400KLOC systems and feeling they understand them.
For all of us, software becomes magic quite quickly. We need to keep reminding ourselves of that, probably because remembering is too painful.
When teaching TDD, I can show a senior programmer (whatever that means) four lines of code, have them think it’s fine, and there’s more than one error in it. And these are the same folks who feel quite confident rolling out 400KLOC systems and feeling they understand them.
C. A. R. Hoare, 1980 Turing Award Lecture; Communications of the ACM 24 (2), (February 1981): pp. 75-83.
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. It demands the same skill, devotion, insight, and even inspiration as the discovery of the simple physical laws which underlie the complex phenomena of nature.
As a former avid TDD practitioner, one of the biggest issues I have seen over and over (especially in cases where e.g. ping-pong pairing was used):
Code that has a full test suite, a clean API, and is a massive, steaming pile of pointless complexity.
This is (in general) fine (as it’s easy to replace with something better) unless it touches a datastore, in which case it will usually corrupt the schema with its internal complexity. Then you have to unpick the internal state and migrate the schema in order to remove the complexity.
IMO it’s a good technique for test suite generation and public interface design. That can be an acceptable tradeoff in some situations. Specifically, where you have:
A need for comprehensive testing of edge cases (e.g. implementing incompletely-specified business requirements, or using a dynamic language)
No subordinate APIs you can modify (e.g. no database / network / filesystem calls unless to a 3rd party system)
Failing the latter requirement is where the real mess gets generated. Code is easy to fix, especially if you don’t need to change the public interface; data is not.
Being aligned with teammates on what you’re building is more important than building the right thing.
I surely don’t believe this. I’m not even sure these things can be placed in opposition (or at least tension). It makes for a nice soundbite but I don’t think it withstands scrutiny.
I’ve seen and worked in teams where all members were aligned and we worked really well together but, at the end of the day, nobody bought the product. This means we didn’t build the right thing. Now, if you take such a team and then assign them to build the right thing, you have something really powerful.
How kind your teammates are has a larger impact on your effectiveness than the programming language you use.
The amount of sleep that you get has a larger impact on your effectiveness than the programming language you use.
I relate to and agree with these two points the most. Working with bigger teams in the last year, I realized that even when I wasn’t burned out, but simply wasn’t getting enough sleep, I couldn’t meet my personal standards at work.
In addition, working with teammates that I actually enjoy working with made me easily twice as productive.
Thanks for the response!
Another point of disagreement:
Aeronautics comes with a particularly grave incentive for testing - in a way, I’m glad we haven’t had the same kind of progress in software.
React really seems like a big step in the right direction, to me. I’m curious about the drawbacks you see with it.
Not bad. The only thing I’d add to this?
Accidental complexity is no accident when it’s the outcome of a methodology that explicitly devalues up-front design.