Unit test “purity” discussions are the biggest waste of time in the known universe. Less I/O and fewer side effects are better, because they make the tests faster and less flaky, but consistently repeatable I/O is absolutely fine. The goal is to not have to do shit manually, period.
Anyone engaging in unit test purity discussions needs to be shushed immediately. On any team I’m part of, I shut it down, kindly if possible, but it must go away or the bike shedding never ends.
I mostly agree with the other top comment. OP seems to be going after a very specific discourse about testing, which is kinda dumb and misplaced, and a straw man of it on top of that. I’ll discuss how to write useful tests anytime, but this ain’t that discussion.
Feels like a straw man.
Only one “non-numeric” test should be necessary, because there should only be a single integer parser function.
The tests all assume some arbitrary production data, which is a terrible idea. Each test should run on the most trivial possible data set which proves the point, so that each test can be understood in isolation.
On a related note, why would “before the start year” be a failure case? It’s not the same class of error as passing foo/12/04.1; there’s just no match.
The passing tests contain mostly irrelevant data, presumably copied verbatim from the underlying data set. For the “first entry” test it looks like only start and stop are relevant, and for “requesting a file” filename is also relevant. But with a reasonable schema you’d insert maybe two entries per test (using default or even random values for any field which is irrelevant for the test), and assert that the filter retrieves the entry with the relevant ID - roughly as in the sketch below.
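A minimal sketch of what I mean, in C since that’s what the post uses - every name here (blog_entry, add_entry, first_on_or_after) is invented for illustration, not taken from the post:

    #include <assert.h>
    #include <stddef.h>

    struct blog_entry {
        int id;
        int year, month, day;     /* the only fields this test cares about */
        const char *filename;     /* irrelevant here, left at a default */
    };

    static struct blog_entry entries[8];
    static size_t entry_count;

    static void add_entry(int id, int year, int month, int day)
    {
        struct blog_entry e = { id, year, month, day, "" };
        entries[entry_count++] = e;
    }

    /* The filter under test: first entry dated on or after the given day. */
    static const struct blog_entry *first_on_or_after(int year, int month, int day)
    {
        for (size_t i = 0; i < entry_count; i++) {
            const struct blog_entry *e = &entries[i];
            if (e->year > year
                || (e->year == year && e->month > month)
                || (e->year == year && e->month == month && e->day >= day))
                return e;
        }
        return NULL;
    }

    int main(void)
    {
        add_entry(1, 2000, 1, 1);    /* the entry the filter should pick */
        add_entry(2, 2023, 6, 15);   /* one later entry, and nothing else */

        const struct blog_entry *e = first_on_or_after(1999, 12, 31);
        assert(e != NULL && e->id == 1);   /* only the relevant ID is asserted */
        return 0;
    }

The point is just that the test contains the two rows it needs and nothing else, so a reader can see at a glance why it should pass.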
It looks like the test framework needs a lot of TLC to be comparable with the most popular frameworks in, say, Python, Java, Ruby or Rust.
Fair enough, but I thought the point of testing was that programmers aren’t capable of writing bug free code (not my thought, but that’s what comes across to me), so the more tests, the better? [1]
Not really. I didn’t show it, but the starting and ending dates are hard-coded in the test file.
Because I need to handle that case and return false. This is for a blogging engine, and the fact that you specify a date before any entries exist will cause a 404. I also return 404 for requests that can’t be parsed. And again, this is a “unit test”. What should and shouldn’t I be testing here? That’s the entire point of my series, that “unit test” proponents can’t get their act together and come up with a coherent methodology for testing.
It’s not just for a single entry or file. Here are two more test cases:
(cont) Note that both of these request a range of entries, and the purpose of this “unit test” is to test the request parser, not the underlying blog engine to see if it can return these entries. Also, there’s a difference between a request of “2008/7-10” and “2008/07/10”—the former will have the redirect flag set to true, the latter false.
This isn’t Python, Java, Ruby or Rust, but C. Also, I wanted a simple testing rig—the others I found for C tended to be too verbose and clunky for my tastes, but hey, I might be in the minority here.
[1] At my former job, I recall spending two full days (8 hours each) in a room full of engineers trying to eke out the minimum set of tests for a project we were working on (and then three work weeks writing out the “test plan” that, once written, NO ONE ever referred to again, just because that’s what was expected). So what is it? Minimal tests? Maximum tests? 100% code coverage?
Sorry if it feels like I’m yelling at you, but I’m trying to figure out what the hell unit tests are, and now you’re telling me I did too many, and I didn’t use a proper framework.
Testing is the hardest thing I’ve ever learned - I’m very much still learning after 19 years as a programmer. I didn’t learn to appreciate TDD until actually working with someone who had been part of an extremely successful team at an earlier job, developing a somewhat famous system. So I wouldn’t be too frustrated about finding it confusing. Learning it on your own is a bit like learning woodturning by building your own lathe.
That said, the actual definition of a unit is one of the least interesting things about testing. It’s only interesting insofar as it allows you to have confidence that the code does what it should be doing. That confidence then allows fearless refactoring, which means that every time you learn some way to improve any part of the code you can apply it everywhere without worrying about breaking anything. Some field is no longer needed? Simply snip it everywhere in your production code, run the tests, verify that the tests that should be dealing with that field fail (otherwise the tests are probably misnamed, overtesting, or defective), verify that no other tests fail (ditto), then update the tests. Similar with adding features - if any existing tests fail, make sure to understand why before continuing. Unfortunately it would take many weeks of blogging to try to explain this in detail, this post is already huge, and others have probably explained it better already.
Spending a bunch of time ahead of implementation trying to tease out the tests is just waterfall in action, and a terrible idea. While developing you’re bound to come up with a bunch more tests, and you’ll probably find that a bunch of the tests which you came up with ahead of time are redundant. That is, in TDD terms, adding those tests makes nothing fail in the implementation so far.
On a related note, if in doubt whether you need another test for a piece of code, try mutation testing. Nobody had told me about this side effect of it, but when you run mutation tests you’re basically checking the completeness of your test suite. If any of the mutations survive, you might have a hole in your tests. But conversely, if more than one test kills a mutation then one of those tests might be redundant (or testing too much).
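To make that concrete with a hand-written mutant - is_leap is just a stand-in example here, and no particular mutation tool is assumed:

    #include <assert.h>

    /* The code under test. */
    static int is_leap(int year)
    {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    /* A mutation tool would generate variants like this one, e.g. by
       deleting the whole century exception: */
    static int is_leap_mutant(int year)
    {
        return year % 4 == 0;
    }

    int main(void)
    {
        /* Imagine the suite only contains these two cases... */
        assert(is_leap(2024) == 1);
        assert(is_leap(2023) == 0);

        /* ...then the mutant passes the same two cases, i.e. it "survives",
           which tells you the century rule is never exercised: */
        assert(is_leap_mutant(2024) == 1);
        assert(is_leap_mutant(2023) == 0);

        /* A test for 1900 would kill it: is_leap(1900) == 0 but
           is_leap_mutant(1900) == 1.  And if several tests all kill the
           same mutant, that's the hint that some of them may overlap. */
        return 0;
    }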
And yes, programmers (myself included) definitely aren’t capable of writing bug-free code. Tests (unit, integration, acceptance, performance, load, mutation, and so on) are a huge field, but a small part of the toolkit you can use to reduce the chance and severity of bugs.
You say you appreciate TDD, which I know as “Test Driven Design,” that is, you write your tests first to drive the design of the code. Then you say:
Spending a bunch of time ahead of implementation trying to tease out the tests is just waterfall in action, and a terrible idea.
So is my understanding of TDD incorrect? Or did you contradict yourself?
the actual definition of a unit is one of the least interesting things about testing.
And I’ve read scores of blog posts that state otherwise, hence my confusion here.
Similar with adding features - if any existing tests fail, make sure to understand why before continuing. Unfortunately it would take many weeks of blogging to try to explain this in detail, this post is already huge, and others have probably explained it better already.
Your post isn’t huge, and no, others haven’t explained it better, else I wouldn’t be bitching about it.
Some field is no longer needed? Simply snip it everywhere in your production code, run the tests, verify that the tests that should be dealing with that field fail
So is compilation a form of testing? Because some have said so, yet I don’t agree with that—if you can’t compile the code, you can’t compile the tests (because that’s the language I work in—C), so it’s tough to run the tests.
You say you appreciate TDD, which I know as “Test Driven Design,” that is, you write your tests first to drive the design of the code.
The crucial error here is “tests”. TDD is one test at a time, each test moving the code towards some goal, each commit including some code change and (if necessary) the relevant, usually single, test which demonstrates that the code does something useful and new.
And I’ve read scores of blog posts that state otherwise, hence my confusion here.
I can’t answer for what others find interesting, but I consider the actual test techniques vastly more interesting than the definition.
Some field is no longer needed? Simply snip it everywhere in your production code, run the tests, verify that the tests that should be dealing with that field fail
So is compilation a form of testing? Because some have said so, yet I don’t agree with that—if you can’t compile the code, you can’t compile the tests (because that’s the language I work in—C), so it’s tough to run the tests.
This is where the discussion often gets into what I consider boring definitions. The fact that your code compiles is obviously a useful “test” in that it increases (some would say is required for) confidence that the code will work in production. It’s also a “test” in the sense that you could at least in principle write other tests which overlap with the work the compiler is doing for you. :shrug: :)
First, to OP: I think you’re doing great, and on the “right” track.
Fwiw I’ve written rather similar “let a hundred sledgehammers hammer” [m] type tests when I had the great fortune to receive a project without any tests - as a first step to be able to refactor/fix bugs without introducing more than one bug per bug I fixed - and on occasion when working with a particularly gnarly and under-documented API (your custom SOAP API running on PHP is sensitive to element order, and wants its arguments as XML in a data element, inside XML? OK, let me just hammer out what dark spells I want my code to output (hello, manual POST via curl), record what horrors I receive back, put that in a few test cases, and force my code to summon conforming monsters from the database).
The main thing is to write tests that give some value to your project. I often thought “simpler” tests didn’t (“clearly this route always returns 200 OK!” Well, not if some middleware that was added five years ago by someone else is suddenly missing a dependency because we’re running on a supported version of Debian now, not some ten-year-old rubbish..).
But more importantly:
You say you appreciate TDD, which I know as “Test Driven Design,” that is, you write your tests first to drive the design of the code. Then you say:
Spending a bunch of time ahead of implementation trying to tease out the tests is just waterfall in action, and a terrible idea.
So is my understanding of TDD incorrect? Or did you contradict yourself?
Yes, and no they didn’t. Strict test-driven design is quite simple:
0. You have a new feature (list blog posts).
1. Write the smallest test that fails (a “red” test). In this case, with a greenfield app, that might be: GET /posts and expect to receive a JSON object containing two posts.
2. Write the least amount of code that makes the test pass (a “green” test). E.g.: return { "post": {}, "post": {} }
3. Refactor. Keep the test green - but e.g. get post data from a test database / test fixtures.
4. Goto 0. In this case, maybe add filtering on year - expect 2 posts for 2023, 1 for 2021, none for 2020. (A sketch of what one pass through this loop might look like in C follows below.)
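To keep it in OP’s language, here is roughly how that loop could play out in C - posts_for_year, the fixture array, and plain assert as the test rig are all invented for this sketch, not taken from anyone’s actual code:

    #include <assert.h>
    #include <stddef.h>

    /* Invented fixture data standing in for the test database. */
    struct post { int year; const char *title; };
    static const struct post fixture[] = {
        { 2023, "first" }, { 2023, "second" }, { 2021, "third" },
    };

    /* With only the first test below in place, step 2 could literally be
       "return 2;".  The 2021 and 2020 tests are what force the real loop. */
    static size_t posts_for_year(int year)
    {
        size_t n = 0;
        for (size_t i = 0; i < sizeof fixture / sizeof fixture[0]; i++)
            if (fixture[i].year == year)
                n++;
        return n;
    }

    int main(void)
    {
        assert(posts_for_year(2023) == 2);  /* iteration 1: red, then green */
        assert(posts_for_year(2021) == 1);  /* iteration 2: forces the filter */
        assert(posts_for_year(2020) == 0);  /* iteration 3: the "none" case */
        return 0;
    }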
Now, the code to return a static dummy post array might seem absurdly trivial - and it would be absurd to start there when you already have production code.
But notice that if you start with TDD, all your code will be amenable to testing, and you will build up an always up-to-date set of test data/fixtures. When you add an “updated at” field to your post, it goes into a test first.
As for unit vs integration tests - TDD by itself doesn’t really “give” you integration tests as such. But that doesn’t mean you don’t need them! Unit tests (try to) certify that the parts do what it says on the tin (potato soup, starter). Integration tests (try to) certify that your tested parts combine to do what it says on the box (full meal inside).
At one extreme, TDD will give you a lot of throwaway tests - but leave you with a few useful ones, and with code that is modular, easy to test, and easy to instrument. At the other, TDD can leave you with 100% test coverage and a solid set of test data.
Unit tests can combine with integration tests when you encounter new data “in the wild” that breaks something. Maybe you end up adding a test for your input validation, catching another edge case - but first you add the invalid data and watch how listing posts breaks when some field is NULL or half of a valid UTF-8 sequence or something.
Phew. Anyway, hope that might be of some value..
[m] to misquote Mao
If your understanding of TDD is “writing all tests beforehand”, it is so incorrect it is not even funny.
How incorrect would it need to be to be funny?
the point of testing was that programmers aren’t capable of writing bug free code
Nope.
so the more tests, the better?
Big nope. You need just enough tests to force you to write the production code.
One test is usually not enough, because then the “production code” can be “just return the hard-coded result the test wants”. And in fact doing that (hard-coding the response) is actually a good idea. Yes, it sounds strange, but it’s true. Why? Because it counteracts the super strong tendency of programmers to overgeneralise. “I know how to do this!!”
With a second test case, just returning a hard-coded result no longer works. To make it work again, you’d either have to start testing the parameter values (if/switch) to figure out which hard-coded value to return, or you implement the actual computation.
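A tiny C illustration (parse_year is invented for the example, not from the post): with only the first assert in place, a hard-coded “return 2008;” would go green; the second assert is what forces the real conversion.

    #include <assert.h>

    /* With only the first test below, this body could literally be
       "return 2008;" and still pass. */
    static int parse_year(const char *s)
    {
        int year = 0;
        while (*s >= '0' && *s <= '9')
            year = year * 10 + (*s++ - '0');
        return year;
    }

    int main(void)
    {
        assert(parse_year("2008") == 2008);  /* test 1: hard-coding passes */
        assert(parse_year("2023") == 2023);  /* test 2: hard-coding no longer works */
        return 0;
    }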
Only one “non-numeric” test should be necessary, because there should only be a single integer parser function.
Probably a fair point here, but how do you know what the exact minimal data is that exposes a bug in the general case? That’s not possible, right? So we account for that by adding more test cases that might uncover an unknown unknown of ours. We don’t optimize test cases down to a minimum covering set, because we don’t know what that set is. We try and make test suites more robust by adding “multiple coats” of different data combinations.
[H]ow do you know what the exact minimal data is that exposes a bug in the general case?
Experience
Fuzz / mutation / property-based testing (even hand-rolled - see the sketch below)
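Even without any framework, a hand-rolled property test in C covers a surprising amount of the input space. format_ym and parse_ym below are toy stand-ins, and the property is a simple round trip:

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Toy code under test: format and parse a year/month pair. */
    static void format_ym(char *buf, size_t n, int year, int month)
    {
        snprintf(buf, n, "%04d/%02d", year, month);
    }

    static int parse_ym(const char *s, int *year, int *month)
    {
        return sscanf(s, "%d/%d", year, month) == 2;
    }

    int main(void)
    {
        srand(12345);                        /* fixed seed: repeatable runs */
        for (int i = 0; i < 10000; i++) {
            int year  = rand() % 3000;       /* random but in-range inputs */
            int month = 1 + rand() % 12;
            char buf[16];
            format_ym(buf, sizeof buf, year, month);

            int y, m;
            /* Property: parsing whatever we formatted gives the input back. */
            assert(parse_ym(buf, &y, &m) && y == year && m == month);
        }
        return 0;
    }

The fixed seed keeps runs repeatable; log a random seed instead if you want fresh cases on every run.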
Consider that there is always a trade-off. Every single test you write comes with one-off and repeated costs (writing it, maintaining it, running it), meaning you really want to know if a test is not adding any value. So it’s worth spending a bit of time making sure every test is adding actual value.
I agree with that. For me, that’s why I focus on generative testing, and supplement it with targeted test cases that I know are really important. I think we undersell how hard it is to come up with good test cases.
[knowing exact tests to expose bugs] That’s not possible, right?
Exactly. Fortunately, it is also not usually necessary. Both testing and static typing are theoretically ludicrously inadequate. However, both practically do a fairly good job. Why? Because most bugs are stupid. They are so stupid that we can’t see them (because we are looking for sophisticated errors), and so stupid that exposing them to even a minimal amount of reality will reveal them.
There was a paper a while ago that talked about how even extremely low levels of code coverage were adequate for uncovering most bugs, and raising the level beyond that didn’t help much.
This paper?
Unfortunately I take a lot of these studies with a huge grain of salt. The linked paper here tests databases for example. It’s very rare for studies like this to be done on business applications, which have a lot more functionality.
I agree anecdotally that “simple” testing gets you quite far. Probably 80% or so. The issue is that correctness is binary - if one piece of functionality has a bug, users notice it and can’t get their job done.
I also think a large chunk of bugs are simply a failure to express requirements correctly, i.e., misspecifications. This is something that testing doesn’t always directly help with.
What you’re describing is heading towards fuzzing or property based testing.
It’s not usually possible to be sure that we have covered all possible inputs when writing tests like these by hand. Taking a wild stab at ideas that might possibly break the code is just going to leave you with a lot more tests that don’t have any logic behind their existence.
If I don’t know what the minimum set of tests to cover a function is, I decompose the function and write tests that cover the bounds of inputs to its simpler components first. I then use simple substitutes for its components so that I can write simple tests for the original function.
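Roughly like this, sketched in C - clamp, scale_percent, and the function-pointer seam are all invented for the example:

    #include <assert.h>

    /* A small component, easy to test at its boundaries. */
    static int clamp(int x, int lo, int hi)
    {
        return x < lo ? lo : (x > hi ? hi : x);
    }

    /* The composite takes its component as a parameter, so its own test can
       pass a trivial stand-in instead of the real clamp. */
    static int scale_percent(int x, int (*clampfn)(int, int, int))
    {
        return clampfn(x, 0, 100) * 2;
    }

    /* Trivial substitute used only by the composite's test. */
    static int identity_clamp(int x, int lo, int hi)
    {
        (void)lo; (void)hi;
        return x;
    }

    int main(void)
    {
        /* The component, tested at the bounds of its inputs... */
        assert(clamp(-1, 0, 100) == 0);
        assert(clamp(0, 0, 100) == 0);
        assert(clamp(100, 0, 100) == 100);
        assert(clamp(101, 0, 100) == 100);

        /* ...and the composite, tested with the simple substitute,
           plus one test with the real component wired in. */
        assert(scale_percent(21, identity_clamp) == 42);
        assert(scale_percent(150, clamp) == 200);
        return 0;
    }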
I say this, but I very rarely write unit tests these days. Usually only when I’m writing some very tricky code, or when something ‘simple’ I wrote didn’t give the correct answer immediately, so slapping a unit test on it means I get to quickly exercise it properly now - and not waste time later should I continue to get it wrong.
What you’re describing is heading towards fuzzing or property based testing.
That makes sense, because I’m very partial to property-based testing. Exactly because it finds data combinations for you, vs. relying on you to come up with an endless amount of data states, and not knowing which are important and which are not.