Tangential thought: This reminds me of a thought experiment regarding the limits of software reliability. Imagine you have a printer, except it won’t work on Tuesday or Thursday. (Perhaps an errant strchr(string, 'T'); finds the wrong T.) This is no doubt highly annoying, but manageable. You get a spare inkjet for when you need to print on the bad days and make sure that your 1000 page quarterly reports are always printed on a good day.
There’s a patch which updates the printer and now it only crashes on Wednesday because the rewritten code overflows a buffer. Do you care? I mean, assuming this is an office printer used on weekdays, going from two bad days to one is a remarkable 33% improvement in reliability. My intuition is that while the patch will be a welcome change, it won’t really matter. Still need the backup printer. Still need to schedule the big prints.
Sadly, getting from “good enough” to “better enough” is a lot of work and not many people are interested.
My intuition is that while the patch will be a welcome change, it won’t really matter.
Imagine your thought experiment from the standpoint of a large company with a fleet of such printers. Likely the new patch would be considered much worse in such a scenario. Assume they had a (non-software) work-around process in place, such as people being reminded and trained to print on designated “other printers” only on T-days. Since the patch only changes the day of failure to W-days, they now have to notify and retrain people, and deal with lost productivity when people who were used to the old system forget or get confused. In such a case it would be better for them to stay with the clearly more broken older version.
Indeed. And very good point. That’s what I get for changing the scenario.
Initially the hypothetical had it still crashing on Tuesday post-fix, but then I provided too much detail about strchr and needed the fix to make sense, so I changed it to Wednesday.
I could also go on and on about old bugs versus new bugs. Now I have a new hypothetical for that discussion too. :)
A thought I had reading this and yours: A lot of bugs are related to time. The reason being that time, more or less, is completely arbitrary with weird unknown rules. Implicitly people write code and, more importantly, tests assuming induction applies to it. It worked for X, it’ll work for X + 1. But time throws a huge kink in this because it’s just not true.
How to solve this? One option is to just run the same test with an artificial clock for every <time granularity> that exists. That sounds pretty expensive, though. So I don’t know how to solve that.
But I do think that software needs to take time in as an explicit dependency. So much time-related code is hard to verify because it involves doing evil things that are global to the application or system. With time as an explicit dependency, one can do all sorts of interesting tests involving time on an individual component. Also, people might write code differently if they consider time an explicit thing rather than something that happens to them.
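A minimal Python sketch of what “time as an explicit dependency” could look like; SessionToken and its clock parameter are hypothetical stand-ins, and the point is that tests inject a fake clock instead of patching global state:

```python
from datetime import datetime, timezone

class SessionToken:
    """Toy component that takes time as an explicit dependency:
    `clock` is any zero-argument callable returning an aware datetime."""
    def __init__(self, issued_at, ttl_seconds,
                 clock=lambda: datetime.now(timezone.utc)):
        self.issued_at = issued_at
        self.ttl_seconds = ttl_seconds
        self.clock = clock

    def is_expired(self):
        # All time arithmetic goes through the injected clock.
        return (self.clock() - self.issued_at).total_seconds() > self.ttl_seconds

# In tests, pass a fake clock instead of mutating the system clock:
t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
tok = SessionToken(issued_at=t0, ttl_seconds=60, clock=lambda: t0)
assert not tok.is_expired()          # "now" is the issue instant
tok.clock = lambda: datetime(2025, 1, 1, 0, 2, tzinfo=timezone.utc)
assert tok.is_expired()              # two minutes later, past the 60s TTL
```

Because the clock is just a parameter, any weird instant (leap day, DST boundary, 2038) can be tested on this one component without touching anything global.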
It seems like there would be an unending list of scenarios to cover as you get into more bizarre edge cases and combinations.
That said, I’m on board with making time explicit and testable, so at least after a weird bug is caught, a failing test can be written for it and then made to pass.
Would everyone know to write and run tests for each day of the week, leap year days & leap seconds, 2038+, time
If we were to take this approach, the testing harness would just do it.
At the very least, testing every day of a year is pretty easy, only 365ish tests.
If only it were that simple! Those 365-400 time inputs would probably need to be applied to each test (at least each feature), so the tests to run would be n*400 rather than n+400. Or am I misunderstanding something?
All that said, I think it would be interesting to have a separate set of exhaustive tests to run in addition to (and likely programmatically derived from) unit tests. It would be kind of like running 100-1,000 hand-written tests during development and then 100,000+ generated tests (time edge cases above, fuzzing, etc) during the integration process.
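A minimal sketch of that “generated exhaustive tests” idea in Python; next_business_day is a hypothetical unit under test, and the 366 frozen dates stand in for the time inputs a harness would supply automatically:

```python
from datetime import date, timedelta

def next_business_day(d):
    # Hypothetical unit under test: return the next weekday,
    # skipping Saturday (weekday 5) and Sunday (weekday 6).
    d = d + timedelta(days=1)
    while d.weekday() >= 5:
        d += timedelta(days=1)
    return d

# Derive one generated case per calendar day of a leap year (366 inputs),
# so the same property is exercised across every day-of-week and
# month-boundary combination.
start = date(2024, 1, 1)
for i in range(366):
    d = start + timedelta(days=i)
    nxt = next_business_day(d)
    assert nxt > d and nxt.weekday() < 5
```

This is the n*366 blow-up in miniature: the loop is one hand-written property replayed under hundreds of generated time inputs, which is cheap enough for an integration-stage run even if it is too slow for every development iteration.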
Anyone know of any literature on such a system? I’d be very surprised if something like this hasn’t been done, or at least talked about, before.
No, you are definitely right that it’s * not +. This is where taking time in explicitly has such value: you only have to run that level of testing on the portions which you know take time as input. I think it’ll be hard to get there, but it’s definitely more correct.
This basically comes down to property-based testing, so you can check out any literature or materials on QuickCheck. In a world with explicit time, using property testing becomes very straightforward and obvious.
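A hand-rolled sketch of that property-testing idea in Python (a real setup would use QuickCheck or a port like Hypothesis; renewal_date is a hypothetical unit that takes time as an explicit input):

```python
import random
from datetime import datetime, timedelta, timezone

def renewal_date(issued, days=30):
    # Hypothetical code under test: a subscription renews `days` later.
    return issued + timedelta(days=days)

# Minimal QuickCheck-style loop: generate random instants across a wide
# range and check properties that must hold for *any* input time.
rng = random.Random(0)
lo = datetime(1970, 1, 1, tzinfo=timezone.utc)
hi = datetime(2100, 1, 1, tzinfo=timezone.utc)
span = (hi - lo).total_seconds()
for _ in range(1000):
    t = lo + timedelta(seconds=rng.uniform(0, span))
    r = renewal_date(t)
    assert r - t == timedelta(days=30)   # the duration is exact
    assert r > t                         # renewal is strictly later
```

Because time enters only through the `issued` parameter, the generator can sweep it across leap days, month ends, and far-future dates without any global clock trickery.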
After several weeks of frustration, where entire days devoted to experimentation had produced no results, I ended up basically adding printf statements to every single line between receiving the event from the serial port and writing it in the database…
I have increasingly come to the conclusion that this is the first rather than the last thing to do. The reward to effort ratio is enormous.
“The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.”
– Brian Kernighan
Agreed, but only because I’ve never managed to get my head around debuggers. Not GDB 10+ years ago, nor Perl’s debugger after that, nor Eclipse’s nor IntelliJ’s debuggers. Print statements seem to be the thing for me. I can wrap my head around them. (Xcode’s debugger is the best I’ve tried, but I don’t do much Objective-C, and have never had a chance to do Swift, so it’s not much use!)