Digging the “prefer to…” title in place of the usual “…considered harmful”.
Learning to frame educational content in terms of the right information you’re presenting, instead of the wrong information you’re correcting, is really useful! As another example, when Julia Evans first posted this article, it was titled “DNS doesn’t propagate.” It’s now titled “DNS ‘propagation’ is actually caches expiring.” The second title is longer, but it’s clearer! Instead of saying only the wrong thing (leaving the right thing for the body of the article), you put the right thing in the article with the hope that people will want to know the detail.
It means that even people who only read the title have learned something, and it sticks in memory better, since we tend to remember titles and short phrases like that well.
Workarounds considered technical debt
Agreed. This is similar to cold paths, a piece I wrote almost exactly a year ago. Workarounds add edge cases that add testing surface area, making the system more expensive to operate.
Did you submit that here? It’s good; it ties crash-only software, “minimise area under ifs”, and a bunch of other little ideas like that together nicely.
Edit: yes you did, discussion was at https://lobste.rs/s/1nf9gd/cold_paths
Cold paths are definitely a source of bugs, but the bigger problem in my mind is input space, i.e. the combinations of possible input data. We tend to focus on branch coverage when thinking about testing, but you can still have bugs with 100% branch coverage. That’s why testing is not equivalent to verification, where a program can be proven to work for all possible inputs and executions.
Testing only ever exercises an infinitesimal subset of a program’s possible behaviors once you consider all possible data states.
It really depends. Someone once said, “A good programmer knows when to hack and when to engineer.”
An example: you have a business use case that is temporary, and a quick workaround will do.
Of course this doesn’t invalidate what’s in the article, but I don’t think it’s good to create rules or idioms that you apply blindly in each and every situation.
Two scenarios I’ve encountered (and am still encountering) at work:
We have to use a proprietary SS7 network stack. There are, to my knowledge, no open source alternatives, and even if there were, it’s too late to change now. We’re wedded to a six-figure licensed stack only certified to run on certain hardware/OS combos (SPARC, Solaris). And I’ve had to work around the code.
I would love to change the code; management doesn’t want changes unless there are test cases that fail now and pass after the change. I’ve tried. I found some undefined C code that just happens to work with our compiler (but not GCC 11, which we aren’t using yet). You wouldn’t believe the dog-and-pony show it took to get a three-line change (just moving some declarations to an outer scope) through. How do you test for undefined behavior?
I mean, isn’t that sort of thing the reason that “prefer” is used instead of “only”? And most of the rest of the article is basically saying “think hard before you conclude you can’t change the code”?
I’m talking about situations where you know how to solve your problem, but you’ve chosen to implement that solution in some additional layer of code on top, rather than changing the original problematic code.
For my second case, I know how to solve the problem, but I am (or rather, was) prevented from doing so.
Right, I doubt OP would disagree with your assessment, and would mostly just be saying “If this happens a lot, maybe look for work where it happens less”?
I have no idea how complete it is, but there’s the OpenSS7 project.
Probably too late to change to it, and it only seems to support Linux, not Solaris [1]. It may be moot anyway as CDMA support is supposed to go away late this year, early next year. I can only hope.
[1] Solaris on SPARC machines was picked due to hardware requirements by the customer (the Oligarchic Cell Phone Companies).
There’s also UniversalSS7, which is Objective-C, and I know of a couple of telcos using it in production. I believe it used to work on Solaris / SPARC, but it was updated to modern Objective-C a few years ago and I haven’t written the assembly code paths for SPARC or tested the clang parts (though this is a sufficiently fun project that I’d be tempted to do it if someone wanted to make a donation to a charity of my choice).
Thank you, but as I said, it’s probably too late to change it. Reasons include:
I’ve heard that our customer is planning on shutting down CDMA late this year or early next year;
We can no longer test CDMA in non-production environments (long story, and it’s not that big of an issue since the CDMA path hasn’t changed in a long time);
There’s new management (long story) and they are very risk averse.
I think the OP would argue your “excuse” is valid, and that you should therefore switch jobs to a place where you don’t have such idiotic constraints (no matter whether you’re actually happy where you’re working now).
Seeing this makes me happy. I said something similar a couple of years ago.
(http://akkartik.name/akkartik-convivial-20200607.pdf)
If a tool doesn’t do quite what you need, don’t try to paper over its deficiencies with a second tool. The maintenance burden of both will lead to compounding claims on your time, and on the time of others, thereby reducing the degrees of freedom of human society as a whole. Instead, take the first tool out, and think about the problem anew.
As always, it depends. It is better, all else being equal, if you can get new behavior by just adding new code and leaving the old code alone; that’s the open/closed principle in SOLID. But a) adding more layers makes things slower, and b) adding more layers can make it harder to tell exactly what the hell is happening. So it’s also good to have the confidence to go back and modify the existing code to actually do the thing you want instead of some other thing. Part of the question comes down to “if I change this code, will anyone else notice?” If not, go ahead and change it. If so, you need a migration plan, and it’s not as easy as saying, “oh, just change it.”
Working in the Android/React Native/Expo ecosystem, 80% of my maintenance time is spent on workarounds for build-pipeline changes that the Expo folks introduce on top of Android, and on top of React Native.
An analogy for C++ programmers: imagine that you use a library through a good portion of your code, and to use it you have to include several of the library’s makefiles (a top-level makefile, a library makefile, and a pre-compiler/transformer makefile).
Now imagine that every 3-4 months they change their makefiles. Every time you upgrade to leverage new features of the library (or of the underlying OS platforms that the library abstracts), you end up spending 80% of your time trying to adjust your build process. You spend weeks on it. The Stack Overflow answers refer to stuff 3-4 releases back. You cannot estimate this time. You are not adding any new features for the customers of your product. This is what it is like using the community CLI tools and Expo (at least on Android).
So you end up creating patch files, workarounds, etc. just so your app’s build can compile…
I think we might get further with this if we did more of a “5 Whys” approach to responding to the excuses: e.g., why are people not allowed to touch that code, or why do they think that?
Personal data point of 1 - YMMV. I think that sometimes, rewriting a piece of code may be the best option.
I wrote a sync engine many years ago. The initial design was to download a single file inside a zip file (on PalmOS, with around 100KB of heap). The design evolved over time to hold multiple files in the zip archive. Then we also added an upload step, again with multiple files in a zip archive.
I left that job and came back after three years. They were still using my sync engine, with zip archives larger than 3MB. Sometimes the download step failed. ¯\_(ツ)_/¯
I looked back at my code: the download step was a mess, since I had added functionality to it through incremental design changes. The upload code was much simpler.
I spent some time debugging the original code and then decided to simply rewrite the download step with a design similar to the one used for the upload step.
The resulting code was shorter, easier to understand, and very similar to the upload step.
The bug went away, and they used the system for a few more years with even larger archives.
I wonder how the author of this article would react to things like the inherent brokenness of Go modules being basically unfixable by upstream for political reasons.
I fork Go modules all the time.
I think cadey wants to fork Go tooling itself, not Go modules.
what do you mean? Go modules seem to work fine for me. is there something specific you assume we know about?
Minimal version selection (so you can accidentally end up shipping known CVEs in your codebase), the v2 landmine (so there’s a chilling effect against bumping past major version 1), I could go on.
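For readers outside the Go ecosystem, a sketch of what the “v2 landmine” means in practice (the example.com module names here are invented): once a module goes past major version 1, its import path must change, so every consumer has to edit its code to follow.

```
// Hypothetical go.mod for major version 2 of a module: the "/v2"
// suffix on the module path is mandatory, and every importer must
// rewrite its import paths to match.
module example.com/mylib/v2

go 1.21

// Minimal version selection resolves dependencies to the *minimum*
// version that satisfies all requirements, so a build keeps using
// v1.4.0 until some module explicitly requires a newer, patched one.
require example.com/somedep v1.4.0
```

That path rewrite across every importer is the chilling effect: staying at v1 forever avoids the churn.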
I disagree about whether those things are good or bad, but that aside, what does this have to do with preferring changing code to writing a workaround? The article isn’t proposing that OSS projects should just accept any patch from anyone, or if it is, the article is bad because that ain’t never gonna happen. It’s about how you treat the code you control. If you want to keep using Go dep for example, it’s certainly possible.