This is a good idea in general, but I would be careful about just stopping the test once branch coverage is reached. The major benefit of property tests is that they check for more input data combinations than other testing approaches, and “correctness” means that a program works for all input data combinations.
By stopping once branch coverage is met, you miss out on additional data states that might find a bug.
I imagine in a lot of cases, this would result in more tests being run, not less. If you have a branch which only triggers for 1 value in a huge input space and you run 100 property tests by default, you’re not going to hit that branch most of the time. For example, a lot of property tests rely on many runs over time to eventually find failures.
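To put rough numbers on that, here is a small sketch. It assumes inputs are drawn uniformly and independently, which real generators usually don't do (they tend to be biased toward small or edge-case values), and the 32-bit input space is picked purely for illustration. `hit_probability` and `SPACE` are names made up for this example.

```python
def hit_probability(space_size: int, runs: int) -> float:
    """Chance that at least one of `runs` uniform, independent draws
    lands on the single triggering value in a space of `space_size`."""
    return 1.0 - (1.0 - 1.0 / space_size) ** runs

# Illustrative numbers only: a 32-bit input with exactly one triggering value.
SPACE = 2 ** 32
for runs in (100, 10_000, 1_000_000):
    print(f"{runs:>9} runs -> {hit_probability(SPACE, runs):.2e}")
```

Under those assumptions the chance of reaching the branch in 100 runs is tiny, so a coverage target that includes that branch would keep the test running far longer than the default.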
I have a big pile of thoughts on this, maybe too much to fit into a lobsters comment.
This testing strategy is fine for what you get from property testing. Dan Luu’s excellent post on testing has a quote:
The QuickCheck paper mentions that it uses random testing because it’s nearly as good as partition testing and much easier to implement.
QuickCheck by itself is a sweet spot on the axes of “easy to implement” and “finds bugs”. The first version of QuickCheck was tiny, maybe two pages of Haskell?
Hooking that into code coverage is another sweet spot on those axes; my entire implementation, supporting three property test frameworks, is 76 lines of code.
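For readers who haven't seen the idea, here is a minimal sketch of a coverage-guided stopping rule. It is not the 76-line implementation mentioned above, and it is written in Python rather than Haskell. Every name in it (`classify`, `run_until_coverage_stalls`, `patience`) is hypothetical, and the hand-maintained `seen` set is only a stand-in for a real coverage tool.

```python
import random

def classify(n: int, seen: set) -> str:
    """Toy function under test; `seen` stands in for a coverage tool."""
    if n < 0:
        seen.add("negative")
        return "negative"
    if n == 0:
        seen.add("zero")
        return "zero"
    seen.add("positive")
    return "positive"

def run_until_coverage_stalls(prop, gen, patience=50, max_runs=10_000, seed=0):
    """Run `prop` on generated inputs and stop once `patience` consecutive
    runs add no new coverage. Returns (number of runs, branches covered)."""
    rng = random.Random(seed)
    covered = set()
    stale = 0
    runs = 0
    while runs < max_runs and stale < patience:
        seen = set()
        value = gen(rng)
        assert prop(value, seen), f"counterexample: {value!r}"
        runs += 1
        if seen - covered:      # this run reached something new
            covered |= seen
            stale = 0
        else:
            stale += 1
    return runs, covered

def prop_known_label(n, seen):
    """Property: classify always returns one of the three known labels."""
    return classify(n, seen) in {"negative", "zero", "positive"}

runs, covered = run_until_coverage_stalls(
    prop_known_label, lambda rng: rng.randint(-5, 5))
print(runs, sorted(covered))
```

If the generator range is widened to something like `rng.randint(-10**6, 10**6)`, the loop will typically give up before ever hitting the `zero` branch, which is the rare-branch concern raised at the top of the thread.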
Property tests are ‘spot checks’ where you might get a lucky hit on an input that finds a counterexample.
Another sweet spot I’ve considered is making sure that new generated inputs do not duplicate previously used inputs for this run, but I haven’t gotten there yet.
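A rough sketch of what that could look like, assuming generated values are hashable and the input space is small enough to remember. `unique_inputs` and `max_attempts` are names invented for this example, not part of any property testing framework.

```python
import random

def unique_inputs(gen, rng, count, max_attempts=10_000):
    """Yield up to `count` distinct values from `gen(rng)`, skipping repeats.
    Values must be hashable; gives up after `max_attempts` draws."""
    seen = set()
    attempts = 0
    while len(seen) < count and attempts < max_attempts:
        attempts += 1
        value = gen(rng)
        if value not in seen:
            seen.add(value)
            yield value

rng = random.Random(0)
values = list(unique_inputs(lambda r: r.randint(0, 9), rng, count=10))
print(values)
```

The `max_attempts` cap matters: without it, asking for more distinct values than the generator can produce would loop forever.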
If you want to cover hard to reach parts of the code, I have a heavy duty solution in trynocular. In trynocular, the generated random input is wrapped in a “spy” that watches what parts are used, so you can discover how to increase code coverage. Describing that will be a much longer blog post. (But golly it’s gonna be fun)
… a sweet spot on the axes of “easy to implement” and “finds bugs”
I can definitely get behind this, because it’s an economic argument: running property tests for longer may find additional bugs, but the ROI is the number of bugs found per time period invested. And I think everyone agrees that there are diminishing returns for longer periods of random testing.
I still prefer to separate these “spot checks” from deep correctness checks, because they serve different and conflicting purposes. And there’s one additional cost I’d like to throw into the mix: the scaling cost, meaning the cost to scale the same test to a higher level of correctness, which here basically means running the test for longer. The scaling cost of property tests is pretty much zero: just run the test for longer. Property tests are also embarrassingly parallelizable, since two test processes require no coordination and we can just run more of them in parallel.
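As a sketch of the "no coordination" point: each worker below gets its own seed and its own RNG and shares nothing with the others. The property, `run_batch`, and `run_parallel` are all made up for illustration. Threads are used only to keep the example short; for CPU-bound properties in CPython you would more likely swap in `ProcessPoolExecutor`, with a top-level function in place of the lambda.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def prop_reverse_twice(xs):
    """Property: reversing a list twice gives back the original."""
    return list(reversed(list(reversed(xs)))) == xs

def run_batch(seed, runs=200):
    """One independent worker: own seed, own RNG, no shared state.
    Returns the first counterexample found, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if not prop_reverse_twice(xs):
            return xs
    return None

def run_parallel(workers=4, runs=200):
    """Run one batch per worker, each with a different seed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: run_batch(s, runs), range(workers)))

results = run_parallel()
print(results)
```

Scaling up is then just a matter of raising `workers` or `runs`; since this property holds, every worker reports `None`.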
So my ideal strategy is to use “spot checks” before merge / release, and jack up the test durations after release to still try and find bugs before a user does.
If you want to cover hard to reach parts of the code….
Your trynocular tool looks really cool. I keep saying that we should really separate property testing from random data generation, since randomness is just one strategy. The idea of more targeted data generation strategies is a really interesting research area, and I support any ideas down that path. After all, the holy grail of testing is to find the minimum number of test cases that imply correctness.
I imagine in a lot of cases, this would result in more tests being run, not less
That’s right, but the set of values that you need to check for correctness is a superset of the values you need to check for branch coverage, so you need more tests to get there.
If you have a branch which only triggers for 1 value in a huge input space…
This is also my point. If we do what’s proposed here, the 1 value won’t get hit very often, so the branch also won’t get hit, so we’ll stop the test before the branch can be taken.
It’s still good to know what the branch coverage is during a property-based test, because more branch coverage is certainly better. I just wouldn’t use it to stop looking for inputs.
I wrote a longer reply further down in this thread, but in short, I see property tests as spot checks. What you describe sounds more heavy duty. Do you have a way to measure data combinations? Perhaps trynocular does what you want?
I like to implement a feature (talking web apps) in a new project using the same stack to see how the implementation goes in a clean slate and then throw away the code.
I also delete code in my own studies. Most things I study are 10-20 lines long, say a small formal specification or a type.
For most tests you should not use property tests anyway. Instead, reshape your data structures and functions to work on the parts of the data that they actually need. If you call a function with some data structure and part of that structure is irrelevant to the output, then instead of using property-based testing, try to feed only the relevant subset of the data to the function.
I admit that many languages make that much harder than necessary. TypeScript and its Pick utility show how this can be done ergonomically.
If you have a function where you actually use all parts of the data and the parts interact with each other during the calculation, then by all means use property tests, but don’t stop when coverage is reached.
What you’re saying is orthogonal to whether or not to use property tests though. It’s always good to have functions that operate on the smallest set of input data possible. But you can use that both with and without property tests, and simply doing this doesn’t somehow lead to a perfect scenario-based test suite.
Property tests are about getting confidence about invariants of the code. They are simply different than scenarios, which may show invariants indirectly, but can’t explicitly define what those invariants are. I think having the invariants communicated and checked for directly is very valuable.
That’s right. What I meant to say was: don’t use property tests as a “fix” for not using the smallest possible set of input data. And I feel that if you combine property-based tests with coverage, then that’s what you are using them for.
Is this the case @shapr?
I believe another technique to achieve this is described in the Find More Bugs with QuickCheck! paper by Hughes et al (2016).
so yeah, this is a great paper, thank you for linking it. I’m still absorbing it, but it is very much what I needed
I don’t think that is a good idea.