Special note: @ferd is also writing a book on Property Based Testing, called… PropEr Testing – unsurprisingly, it has an Erlang spin.
There is also an even more comprehensive Erlang-focused testing book being developed at http://testingonthebeam.com/. The TOC is incredibly enticing.
I very frequently do the sort of testing described here as “FSM validation” in C (using theft). Even if I didn’t use property-based testing for anything else, the results I’ve gotten from that approach alone would more than justify the time I’ve spent learning property-based testing tools.
Appreciate the feedback, since I want more corroboration about that. I know it worked well with formal specs, but people will want to see it on code-only projects, too.
@ferd seemed to think it was only an Erlang technique. These things used to be called “specification”-based or “model”-based testing rather than property-based. The use of state-based modeling with pre- or post-conditions in formal specifications was also pretty much the default, due to FSMs and Dijkstra’s methods being popular. People had done randomized testing of them many times, but most research tried to avoid it, either to reduce execution time because of hardware costs or because the researchers thought they could outdo random. Here’s an example from 1997 with FSM modeling plus heuristic and random testing. One with Statecharts. Another one from 2009 with UML. Recent results with “property”-based testing have created a bandwagon effect where lots of these tools are being built, including for old languages, with the QuickCheck style being the most common.
The tooling never went mainstream, though. That’s likely because it usually came with formal specs that were hard for people to learn. There were good reasons for that, for sure. However, the newer tools just provide a purpose-built, lightweight spec that only generates tests. The simplicity and working within the language itself are probably driving a lot of adoption. The methods are old, though. One more benefit while we’re at it: model checkers and provers can work with FSM models better than with most other kinds of model. So, one can get property-based testing now plus some stronger stuff later if they or a contributor have the resources.
It doesn’t feel like a fundamentally new technique, but most of the writing I’ve seen that explicitly describes it as building on property-based testing tooling has been Erlang-flavored. The commercial Erlang QuickCheck’s documentation suggests testing stateful systems that way, and the library has some built-in support, so it’s probably more familiar within the Erlang world. (IIRC, they call it “statem” testing, for “state machine”.)
Hypothesis also has API support for stateful testing now, and its docs refer to similar support in ScalaCheck.
theft doesn’t have any explicit API hooks for this use case yet, mostly because I haven’t figured out the right abstractions for a C API, but it definitely supports it. Typically, I have theft generate an array of structs, each of which describes an operation in the API I’m testing. If I were testing a hash table, it would generate operations that add, look up, or delete a binding.
A simple switch-case loop then calls the code under test with each operation, checks that the result makes sense according to a simple in-memory model (typically just an array, with dead-simple linear search), and then updates the model with any new bindings. After the whole series of operations, it compares the end state of the model to the hash table. If deleting a binding has caused other earlier, colliding bindings to get lost, or growing the table has corrupted data, the test fails, theft shrinks the steps, and it presents a small series of steps to reproduce the bug.
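A minimal sketch of the approach described above, in plain C. It does not use theft's API: `gen_ops` is a `rand()`-based stand-in for the generator, there is no shrinking, and every name here (`struct op`, `check_ops`, `buggy_delete`, the toy table itself) is hypothetical, made up for illustration. The model is a pair of arrays indexed directly by key, which is even simpler than linear search and only works because the keys are tiny.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical operation record; a generator fills an array of these. */
enum op_type { OP_SET, OP_GET, OP_DEL };
struct op { enum op_type type; uint8_t key; int value; };

/* Code under test: a toy open-addressing table with linear probing. */
#define SLOTS 16
#define KEY_RANGE 8 /* few distinct keys, so collisions are common */
enum slot_state { EMPTY = 0, FULL, TOMBSTONE };
struct table {
    enum slot_state state[SLOTS];
    uint8_t key[SLOTS];
    int value[SLOTS];
    bool buggy_delete; /* true: delete empties the slot, breaking probe chains */
};

static int slot_at(uint8_t key, int i) { return (key % 4 + i) % SLOTS; }

static int find_slot(const struct table *t, uint8_t key) {
    for (int i = 0; i < SLOTS; i++) {
        int s = slot_at(key, i);
        if (t->state[s] == EMPTY) return -1; /* end of the probe chain */
        if (t->state[s] == FULL && t->key[s] == key) return s;
    }
    return -1;
}

static bool table_set(struct table *t, uint8_t key, int value) {
    int s = find_slot(t, key);
    for (int i = 0; s < 0 && i < SLOTS; i++) /* new key: first free slot */
        if (t->state[slot_at(key, i)] != FULL) s = slot_at(key, i);
    if (s < 0) return false; /* table full */
    t->state[s] = FULL;
    t->key[s] = key;
    t->value[s] = value;
    return true;
}

static bool table_get(const struct table *t, uint8_t key, int *out) {
    int s = find_slot(t, key);
    if (s >= 0) *out = t->value[s];
    return s >= 0;
}

static bool table_del(struct table *t, uint8_t key) {
    int s = find_slot(t, key);
    if (s >= 0) t->state[s] = t->buggy_delete ? EMPTY : TOMBSTONE;
    return s >= 0;
}

/* Returns -1 on agreement, the first disagreeing step, or n if end states differ. */
static int check_ops(const struct op *ops, size_t n, bool buggy_delete) {
    struct table t;
    bool present[KEY_RANGE] = { false }; /* model: which keys are bound */
    int expect[KEY_RANGE] = { 0 };       /* model: their expected values */
    assert(ops != NULL || n == 0);
    memset(&t, 0, sizeof t); /* every slot starts EMPTY */
    t.buggy_delete = buggy_delete;
    for (size_t i = 0; i < n; i++) {
        uint8_t key = ops[i].key % KEY_RANGE;
        int got = 0;
        switch (ops[i].type) {
        case OP_SET:
            if (!table_set(&t, key, ops[i].value)) return (int)i;
            present[key] = true;
            expect[key] = ops[i].value;
            break;
        case OP_GET:
            if (table_get(&t, key, &got) != present[key]) return (int)i;
            if (present[key] && got != expect[key]) return (int)i;
            break;
        case OP_DEL:
            if (table_del(&t, key) != present[key]) return (int)i;
            present[key] = false;
            break;
        }
    }
    for (uint8_t k = 0; k < KEY_RANGE; k++) { /* compare end states */
        int got = 0;
        bool found = table_get(&t, k, &got);
        if (found != present[k] || (found && got != expect[k])) return (int)n;
    }
    return -1;
}

/* Stand-in for a real generator: pseudo-random ops from a seed. */
static void gen_ops(struct op *ops, size_t n, unsigned seed) {
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        ops[i].type = (enum op_type)(rand() % 3);
        ops[i].key = (uint8_t)(rand() % KEY_RANGE);
        ops[i].value = rand() % 100;
    }
}
```

With `buggy_delete` set, the four steps set(0), set(4), delete(0), get(4) make `check_ops` report step 3: keys 0 and 4 collide, and emptying key 0's slot cuts the probe chain that leads to key 4. That is the "colliding bindings get lost" failure in miniature. With tombstones enabled, the same steps pass.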
I’d already been doing that before I encountered Erlang QuickCheck – I was trying to use PBT like a lightweight model checker, but one that would catch implementation bugs (because, hey, I mostly work in C). I learned a bit about model checkers from dabbling in Alloy after Guillaume Marceau’s presentation at !!Con 2014. Describing the approach as a lightweight, probabilistic model checker still makes more sense to me than “state machine testing”, because I might not be testing a state machine at all – I’ve used it for cache invalidation logic, filesystem code, stress-testing a garbage collector, fuzzing a compiler, and several other things.
I suspect this technique is so effective because there are lots of codebases where individual pieces have been pretty well tested, but combining them in unusual ways uncovers slight misalignments, and randomly chaining these pieces finds combinations that compound the misalignments and trigger surprising failures.