I don’t write tests to avoid bugs. I write tests to avoid regressions.
Exactly. I dislike writing tests, but I dislike fixing regressions even more.
And I’d go even further:
I write tests and use typed languages to avoid regressions, especially when refactoring.
A test that just fails when I refactor the internal workings of some subcomponents is not a helpful test – it just slows me down. 99% of my tests are on the level of treating a service or part of a service as a black box. For a web service this is:
test input (request) -> [black box] -> mocked database/services
Where “black box” is my main code.
For NodeJS the combo express/supertest is awesome for the front bit. I wish more web frameworks in Rust etc. also had this, i.e. providing ways to “fake run” requests through without having to faff around with servers/sockets (and still be confident it does what it should).
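To make that concrete, here is a minimal sketch of what an express/supertest test looks like – the /health route, the payload, and the node:test harness are assumptions for illustration, not anyone’s real code:

```ts
import express from "express";
import request from "supertest";
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical app with a single route; in a real project you would
// import the app object from your application module instead.
const app = express();
app.get("/health", (_req, res) => {
  res.json({ ok: true });
});

test("GET /health responds with ok: true", async () => {
  // supertest takes the app object directly and manages the listening
  // socket behind the scenes, so the test never touches ports or
  // server lifecycle.
  const res = await request(app).get("/health").expect(200);
  assert.equal(res.body.ok, true);
});
```

The point is that the test reads as “send this request, expect this response”, with no server plumbing in sight.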
Now the impish question: what is the correct decision if the test is more annoying to write than the regression is to observe and fix?
Indeed!
(I research ways[1] to avoid that. But of course they don’t apply when you’ve already chosen a stack and framework for development. In my day job we just make hard decisions about priority and ROI and fall back sometimes to code comments, documents or oral story-telling.)
[1] https://github.com/akkartik/mu1#readme (first section)
Every project is different, but ideally you can invest time in the testing infrastructure such that writing a new test is no longer annoying. I.e., maybe you can write re-usable helper functions and get to the point where a new test means adding an assertion or copy / pasting an existing test and modifying it a bit. The tools used (test harness, mocking library, etc.) also play a huge role in whether tests are annoying or not; spending time ensuring you’re using the right ones (and learning how to properly use them) is another way to invest in testing.
The level of effort you should spend on testing infrastructure depends on the scope, scale and longevity of your project. There are definitely domains that will be a pain to test pretty much no matter what.
In my experience such testing frameworks tend to add to the problem, rather than solve it. Most testing frameworks I’ve seen are complex and can be tricky to work with and get things right. Especially when a test is broken it can be a pain to deal with.
Tests are hard because you essentially need to keep two functions in your head: the actual code, and the testing code. If you come back to a test after 3 years you don’t really know if the test is broken or the code is broken. It can be a real PITA if you’re using some super-clever DSL testing framework.
People trying to be “too clever” in code can lead to hard-to-maintain code; people trying to be “too clever” in tests often leads to hard-to-maintain tests.
Especially in tests I try to avoid needless abstractions and be as “dumb” as possible. I would rather copy/paste the same code 4 times (possibly with some slight modifications) than write a helper function for it. It’s just such a pain to backtrack when things inevitably break.
It really doesn’t need to be this hard IMHO; you can fix much of it by letting go of the True Unit Tests™ fixation.
I don’t disagree, and I wasn’t trying to suggest using a “clever” testing framework will somehow make your tests less painful. Fwiw I even suggested the copy / paste method in my OP and use it all the time myself :p. My main point was using the right tool / methods for the job.
I will say that the right tool for the job is often the one that is the most well known for the language and domain you’re working in. Inventing a bespoke test harness and trying to force it on the 20 other developers who are already intimately familiar with the “clever” framework isn’t going to help.
Fair enough :-)
I kind of agree because there’s good value in standard tooling, but on the other hand I’ve seen rspec (the “standard tool” for Ruby/Rails testing) create more problems than it solves, IMHO.
When fixing testable bugs you often need that “simplest possible test case” anyway, so you can identify the bug and satisfy yourself that you fixed it. A testing framework should be so effortless that you’d want to use it as the scaffold for executing that test case as you craft the fix. From there you should only be an assert() or two away from a shippable test case.
(While the sort of code I write rarely lends itself to traditional test cases, when I do, the challenge I find is avoiding my habit of writing code defensively. I have to remind myself that I should write the most brittle test case I can, and decide how robust it needs to be if and when it ever triggers a false positive.)
+1
This here, at the start of the second paragraph, is the greatest misconception about tests:
“In order to be effective, a test needs to exist for some condition not handled by the code.”
A lot of folks from the static typing and formal methods crowd treat tests as a poor man’s way of proving correctness or something… This is totally not what they’re for.
umm…..aren’t regressions bugs?
Yes, regressions are a class of bug. The unwritten inference akkartik made when saying “I don’t write tests to avoid bugs” is that it refers specifically to writing tests to pre-empt new bugs before they can be shipped.
Such defensive use of tests is great if you’re writing code for aircraft engines or financial transactions; whereas if you’re writing a Christmas tree light controller as a hobby it might be seen as somewhat obsessive-compulsive.
I-I don’t understand. Tests are there to catch bugs. Why does it matter particularly at what specific point in time the bugs are caught?
Because human nature.
Oftentimes a client experiencing a bug for the first time is quite lenient and forgiving of the situation. When it’s fixed and then the exact same thing later happens again, the political and financial consequences are often much, much worse. People are intensely frustrated by regressions.
Sure, if we exhaustively tested everything up front, they might never have experienced the bug in the first place, but given the very limited time and budgets on which many business and enterprise projects operate, prioritizing letting the odd new bug slip through in favor of avoiding regressions often makes a hell of a lot of sense.
Not sure if you are trolling …
Out of 1000 bugs a codebase may have, users will never see or experience 950 of them.
The 50 bugs the user hits though – you really want to make sure to write tests for them, because – based on the fact that the user hit the bug – if it breaks again, the user will immediately know.
That’s why regression tests give you a really good cost/benefit ratio.
A bug caught by a test before the bad code even lands is much easier to deal with than a bug that is caught after it has already been shipped to millions of users. In general the further along in the CI pipeline it gets caught, the more of a hassle it becomes.
The specific point in time matters because the risk-reward payoff calculus is wildly different. Avoiding coding errors (“new bugs”) by writing tests takes a lot of effort and generally only ever catches the bugs which you can predict, which can often be a small minority of actual bugs shipped. Whereas avoiding regressions (“old bugs”) by writing tests takes little to no incremental effort.
People’s opinion of test writing is usually determined by the kind of code they write. Some types of programming are not suited to any kind of automated tests. Some types of programming are all but impossible to do if you’re not writing comprehensive tests for absolutely everything.
The whole class of regression tests was omitted from the original article which is why it’s relevant to bring them up here.
The article says “look back after a bug is found”. That sounds like they mean bugs caught in later stages (like beta testing, or in production).
If you define bugs as faults that made it to production, then faults caught by automated tests can’t be bugs, because they wouldn’t have made it to production. It’s just semantics; automated tests catch certain problems early no matter what you call them.
I’m of the same opinion. It means that the reason why we’re writing tests is not to catch bugs in general, but specifically to catch regression bugs. With this mindset, all other catching of bugs is incidental.
I don’t write tests to avoid bugs. I write tests to avoid memorizing the complicated functionality I need to implement.
so you write tests to avoid documentation?
I can’t speak for OP, but where I work it’s normal to rely upon our tests as reliable documentation for how our API should be integrated with.
I could see how one ‘documents’ the workings of unexposed subsystems, but using tests in lieu of API documentation doesn’t sound like a great idea.
Granted we never can talk in absolutes; there are certainly projects out there which have neither tests nor documentation and are doing OK.
Nothing I work on currently has a public API, and the majority of our APIs are undocumented beyond looking at the implementation within unit tests or our web apps. This isn’t because we don’t need API documentation – it would be handy to have – just that when on-boarding new developers, walking them through the unit tests has been good enough for them to go on to work on those APIs.
If we did anything public facing, API documentation would become a top priority, because while most developers can infer from unit tests how things work, a lot are too time-constrained or too lazy to do so and will skip a tool or library with that kind of friction.
There’s a difference between documentation and unit testing. Is your documentation automatically checked? I am sure it’s not, unless you’re writing doc tests.
It helps to read my question in the context of the GP’s comment.
To avoid documenting all of a function’s required output (and god forbid, side-effects) as a result of all possible input, yes. But tests cannot replace documenting the intended use and context and maybe historical perspective of a function. If the function name with the argument names and types cannot capture this sufficiently I additionally provide documentation.
Fair enough, if that works.
No one does “no testing”. Everyone has tests. No testing is writing hello world, shipping it and then dusting off your hands and saying “another job well done!” and going home. You don’t observe it running. People who think they have no tests but they do have manual tests are using their fragile, tired, human eyes to parse the output and think “pass!”. No one does “no testing”, there’s manual testing and automated testing.
Some people won’t call them manual tests but once we identify it as manual, it is easier to see that automated tests just save you from running ./main (or whatever) for the rest of your life. You can add types, functions, tricks, anything novel in Blubb 2.0 … but you could have a business logic bug so you need to run it at least once. No one does “no testing” but they think they do. And so the explosion of permutation paths is too much to manually test, so you don’t do it. So then you regress or ship a bug. “Whoops!” And if you do find something, it’s back to manual testing. “That shouldn’t happen again!”. Nothing grows.
Tests are critical and related to the entire lifecycle. But they especially pay off late in the project. Projects rarely shrink in complexity, they just pile on the features and then you declare bankruptcy. This can happen even with testing, I just think it’s a powerful ally/tool along the way.
OP’s example of testing race conditions might resist testing. So maybe the coverage is 90%. That’s not a reason to have 0% coverage. I want to automate the things. My fragile, tired human eyes don’t want to see ./main run again.
I think a lot of people confuse “testing” with complex testing frameworks and unit testing, rather than “a program to automate running ./main in different ways”.
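As a sketch of what “a program to automate running ./main in different ways” can look like – ./main, its flags, and the expected output below are made-up placeholders for whatever you actually ship:

```ts
import { execFileSync } from "node:child_process";
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical cases: each one is a run you would otherwise do by hand
// and eyeball with fragile, tired human eyes.
const cases = [
  { args: ["--greet", "world"], expected: "hello world\n" },
  { args: ["--version"], expected: "1.0.0\n" },
];

for (const { args, expected } of cases) {
  test(`./main ${args.join(" ")}`, () => {
    // Run the binary and compare its stdout mechanically.
    const out = execFileSync("./main", args, { encoding: "utf8" });
    assert.equal(out, expected);
  });
}
```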
I like testing but care little for the whole TDD/unit testing approach. I think TDD-purists and the like have done more to turn people off from testing than anyone else, because their “one true correct way” to do things is often hard for people to implement, understand, and maintain.
However, typically when the same mind writes the code and the tests, the coverage overlaps. Errors arise from unexpected conditions, but due to their unexpected nature, these are the conditions which also go untested.
I think this is exactly what property testing tries to address. There’s a good 3-part series that was posted to Lobsters about using property tests in a screencast editor. This is the first one: https://wickstrom.tech/programming/2019/03/24/property-based-testing-in-a-screencast-editor-case-study-1.html
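For anyone who hasn’t seen it, a minimal sketch of the idea using fast-check – the function under test and the idempotence property are made-up illustrations, not from the linked series:

```ts
import fc from "fast-check";
import { test } from "node:test";

// Hypothetical function under test.
function normalizeWhitespace(s: string): string {
  return s.trim().replace(/\s+/g, " ");
}

test("normalizing twice changes nothing (idempotence)", () => {
  // fast-check generates the "unexpected" inputs the author of the code
  // would not have thought to write cases for.
  fc.assert(
    fc.property(fc.string(), (s) => {
      const once = normalizeWhitespace(s);
      return normalizeWhitespace(once) === once;
    })
  );
});
```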
There are so many kinds of bugs…
There are some bugs where I say “ok, if I understand this correctly, then this test should fail…”, add it to the testing framework, and run it, and if it fails, then I think I understand. (Did you know that if you sort countries by timezone, NZ is at the end of the list even if you sort the list the wrong way?)
There are some bugs that are much quicker to test mechanically than by spinning up a server and clicking a few buttons, so if the fix is likely to require more than a few minutes of fixing and testing, taking a few minutes to write a test right away is likely to save time. (I have that today.)
I find many features easier to understand properly if I express the task’s invariants first, i.e. if I write some tests. “What should this new feature really do?” After writing a couple of tests, I may understand the fit between the new need and the existing code better, and write simpler code. As a bonus, when I’ve written some tests I observe that I try to keep the code simple, so as to avoid the need for more tests.
The tests that naturally repeat the code, as tedu implies, are a minority for me. Tests repeat the documentation more often than the code.
One approach:
- write tests for things you want the program to do, to catch future regressions
- fuzz the program to find things you don’t want it to do, fix them, and write regression tests based on the fuzz input
Fuzzing is generally better at finding bugs than humans manually thinking of edge cases anyway. Of course you can write your own tests by hand as well, and there’s value in the specificity of thinking that requires.
On the other hand, it can be difficult to write fuzz harnesses for programs which don’t parse big blobs of data. Writing a harness which calls an API or generates network calls can be a real pain.
Tests are also dependent, in a good way, on the testing platform, but this requires they be run on the affected platform as well. A test that checks a function is endian neutral will fail to catch mistakes if it’s only ever run on little endian systems.
This is something I dealt with a lot in the real world, FWIW. Usually, the endianness tests (if they even exist, and if the software is portable) only get tested if IBM takes an interest in your project and provides CI for their systems.
You should’ve started with this. :)
Tests are very brittle, breaking due to entirely innocuous changes in the code because they inadvertently embed very specific implementation dependencies.
True, and I hate when that happens. But there are ways to write code that keeps it more testable than this, like dependency inversion. If you test as you build, that can happen naturally.
The author’s point about doubling the number of tests to find bugs sounds fair too, but I think that means the test strategy is unfocused. There is a frequent question among my team, “Is this test high value?” That is, is it worth writing / keeping? We want the tests that give us confidence in the product; that’s the point. We don’t need perfect branch coverage. Coverage is just a heuristic.
One way to boil this down is, test what will scare you if it breaks, and test for bugs that might regress. But I struggle with this.
Regarding tests not addressing every architecture while running on your dev machine: That’s a fact of life, sometimes mitigated by your language runtime. Tests don’t deliver confidence by being files in the repo. They must be run, where they need to run, to count for anything. My tests would be much less helpful without continuous integration.
Reaaaally weird article in my eyes. I just cannot shake the feeling that the author never ever had any positive outcome from their tests. [Or did they just take a deliberately fake rhetorical stance?]
Personally, the first thing I want to say to that is that, more than once in my life, I wrote some code, felt happy it was so smart and great and perfect, then grudgingly wrote some tests, and then, lo and behold, immediately saw them find some dumb mistakes.
Secondly, I do have a couple of hobby open-source projects I wrote where I didn’t have the stamina to write tests. I fully accept that it had to be like that (they took a lot of energy to get to completion anyway). But I’m also now at this unpleasant place where I’m afraid to improve them and take them further, paralyzed by the thought that any change I make may break something. I feel I really need to add quite a lot of tests before I can go further, to give me some peace of mind and a feeling of safety. Though I also still can’t muster the stamina to do that. Hopefully I’ll be able to break out of this place at some point.
Thirdly, in another open-source project, I did write some tests as I was going, especially a few kinda end-to-end “spec” tests. And I’m very happy I did that, they gave me some targets to work towards, and I think a few times they detected some regressions I accidentally introduced while adding code, helping me to fix them early, and somewhat limiting the amount of code I had to browse to debug them.
I mean, I totally don’t agree with many absolutist ideas/rules like “the more tests the merrier”, or “always write tests before code”, or “100% coverage or die”, etc. etc. etc. (Maybe the only “hard” rule I kinda try to stick to recently is “if I can’t [yet] write the test reasonably well [esp. non-flaky, non-time-dependent, and black-box], better not write it at all [until I, or someone else, invent some smarter way to do it]”). And I still do find writing tests not only annoying, but also hard and tricky. (In too many ways to quickly list here.) In other words, I don’t see tests as a panacea. But I just do personally often see value in them, see them as helpful. In other words again, I see them as one more tool in my toolbelt, that has its ways and times and places of being useful, as well as of being counter-productive.
PS. Ah, yes, one more thing: I did a few ports of some small open-source projects to different languages. A test-suite was basically a must-have prerequisite before I started each of them, and each time was ab-so-lu-te-ly indispensable.
PPS. I would guess @andyc probably could have an opinion or two on the topic of whether tests are useful to them when writing their shell.
PPPS. I dunno, is the OP just intended to be provocative, so as to force people to write their testimonials/argue to the contrary?
Other tests grow obsolete because the target platform is retired, but not the test. There’s even less pressure to remove stale tests than useless code. Watch as the workaround for Windows XP is removed from the code, but not the test checking it still works.
I don’t see how that works? Or how that is possible?
Would not that test fail once its code is removed?
That failure would be momentarily inconvenient for whoever is removing it, sure, but when analyzed the test would be removed as unnecessary, no?
On the other hand, without tests, you might not even know the workaround is still lingering, long after support for Windows XP is gone. A constant somewhere that lists supported platforms, and a simple test that asserts that constant includes “Windows XP”, would fail when that is removed from official support.
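A minimal sketch of that idea – the SUPPORTED_PLATFORMS constant and the file path are made-up placeholders:

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Hypothetical support list; in a real project this would live wherever
// the official platform list is maintained.
const SUPPORTED_PLATFORMS = ["Windows XP", "Windows 10", "Ubuntu 22.04"];

test("Windows XP workaround in src/fs-compat.ts is still needed", () => {
  // When XP is dropped from the list, this fails and points at the
  // workaround that should be deleted along with it.
  assert.ok(
    SUPPORTED_PLATFORMS.includes("Windows XP"),
    "Windows XP is no longer supported: remove the workaround in src/fs-compat.ts"
  );
});
```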
Tests can help with removing unused (or no-longer needed) workarounds. In our test suite (at $WORK) we have what we call “sanity tests”, which have a few categories. One category is simply for workarounds, and does things such as assert “Is the version of library foo equal to 4.5.1 or higher?” and has a failure message saying something such as “A workaround in path/to/file is no longer needed. Please go remove it by {{instructions to remove}}.”
Beyond that, there are many ways to make test suites useful as they age. We have a dozen or so tests that have comments (or failure messages) such as “If this test fails, take a moment and consider removing it.” because it might have been testing an expected behavior that the author suspected (knew?) might become obsolete at some point.
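One possible way to wire up such a sanity test, assuming a Node project – foo, 4.5.1 and path/to/file are the placeholders from the comment above, and semver is just one way to do the version comparison:

```ts
import { test } from "node:test";
import assert from "node:assert/strict";
import { createRequire } from "node:module";
import semver from "semver";

const require = createRequire(import.meta.url);
// Read the installed version of the (hypothetical) dependency "foo".
const fooVersion: string = require("foo/package.json").version;

test("workaround for foo < 4.5.1 is still needed", () => {
  // While foo is older than 4.5.1 the workaround stays; once foo is
  // upgraded, this fails with a message pointing at the code to delete.
  assert.ok(
    semver.lt(fooVersion, "4.5.1"),
    "foo is now >= 4.5.1: the workaround in path/to/file is no longer needed. " +
      "Please remove it ({{instructions to remove}})."
  );
});
```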
The test continues to work because it would fail on Windows XP, and the code is removed just after Windows XP is removed from the set of testing platforms.
However. Bugs that prevent users from working are really bad. Bugs that irritate users are also bad, but not quite that bad. Bugs that block CI etc. and thereby block the development team are even less bad. Bugs that block a single developer are even less bad. Bugs that slow down unit testing by a millisecond are at the very end of the scale. Give me a hundred of those and relieve me of a small chance of the first kind, and I’ll still think the swap is a good one for me.
Aha, I misinterpreted an aspect of the example. Thanks.
I tend to find a bug for every test I write. I prefer having the peace of mind that if my tests pass, at least all the core functionality I’ve got tested still works.
But I’m sympathetic to not testing race conditions. There are too many ways to do it poorly, making tests that are either flaky or don’t give much assurance. It may be more effective to eliminate races by construction (e.g. avoid shared mutable state, make things idempotent) and, where that is impossible, focus on fuzzing instead.
I’m against writing tests for the sake of it. My normal workflow will be to identify a new piece of functionality and then write a simple test that maybe hits a new API endpoint and provides various data – the purpose of this isn’t to actually test the implementation but instead to provide me with an easy-to-use and quick-to-repeat execution of that endpoint with known input. I can then throw some breakpoints in, and build out functionality while running the endpoint upwards of 100 times an hour… essentially I write tests as scriptable Postman that can then be used to avoid regressions in that functionality later on.
One additional reason to write tests is you’re not alone in the world.
“Look at this ugly method that my teammate wrote six months ago, let’s refactor it…” - BOOM - “Nope, it was just fine, lemme revert that.”
Or “lemme swap this old dependency for the new one” - Boom - “Oh yeah right now I also want to change that other thing.”
You get an extra bonus if the teammate in question is you from the past.
Being against testing is a bit like being against automation. But automation is why we write software in the first place. The first application for software was to automate calculations for us and this is the general trend of the industry.
I think a more pertinent question is whether the automation cost, the time it takes to write and run the test, will pay off. Obviously it’s really hard to predict the future and some level of disagreement can come from different interpolation models, but this is really the issue at hand.
So I agree, writing tests blindly, just for the sake of writing tests, is not worth it. It makes the code-base larger and harder to change in some regards. It also potentially slows down iteration speed. But no tests at all is also not the right approach. The right approach is somewhere in the middle, as usual.
Consider all the times that the program is executed to check, manually, that everything works as expected. This is something that can be automated. And by doing that, it also helps increase the build & test iteration speed. Software is a force multiplier, but only when applied properly.
Errors arise from me not being a computer and not being able to predict with sufficient accuracy what code will do. I write tests simply to show that the code indeed does what it should do. My tests usually don’t pass on the first run, because the code is wrong.
To continue the trend…
I don’t write tests to avoid bugs, I write tests to know when a feature is complete.