This was pretty much my only con with deterministic simulation testing: it’s up to you as the simulation writer to come up with fault scenarios that model the real world in any component that’s stubbed out for determinism. This isn’t a huge con, but it has been something that’s crossed my mind.
True integration tests are one way around that problem, by exercising the real component. It can be very difficult to create failure modes in real components, though (as mentioned here, via killing/pausing processes, etc.). For example, fsync gate in Postgres wasn’t detected for years and years, even across thousands of systems running in production. In part, this is because it was the result of hardware errors, which are nigh impossible to reproduce.
Still, the fault injection techniques described here are worthwhile, since they will certainly uncover some bugs. My “pie in the sky” thinking is that we should design infrastructure components (down to and including the OS) to support prophecy variables, which would allow failure scenarios to be triggered in generative tests by generating values for those variables.
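To make that concrete, here’s a rough sketch of the prophecy-variable idea (all names here are hypothetical, not from the linked post): the test draws a “prophecy” up front that dictates which future fsync calls will fail, and the dependency consults it instead of real hardware nondeterminism. In a generative test, the prophecy itself would be one of the generated values.

```python
# Hypothetical sketch: a "prophesied" disk whose fsync failures are
# decided ahead of time by a generated schedule (the prophecy).
class ProphesiedDisk:
    def __init__(self, prophecy):
        self.prophecy = list(prophecy)  # one bool per future fsync call
        self.durable = []               # data that survived an fsync
        self.pending = []               # written but not yet synced

    def write(self, data):
        self.pending.append(data)

    def fsync(self):
        if self.prophecy.pop(0):        # the prophecy says: fail here
            raise IOError("prophesied fsync failure")
        self.durable.extend(self.pending)
        self.pending.clear()

# In a generative test this schedule would be drawn by the generator;
# hardcoded here so the run is easy to follow.
prophecy = [False, True, False, False, True]
disk = ProphesiedDisk(prophecy)
ok = 0
for i in range(5):
    disk.write(i)
    try:
        disk.fsync()
        ok += 1
    except IOError:
        pass  # the program under test would retry or crash-recover here

# Writes 0-3 became durable (the failed sync of 1 was retried by the
# later successful sync); write 4 is still pending after the last failure.
assert disk.durable == [0, 1, 2, 3]
```

Because the failure schedule is an ordinary value, a failing run shrinks and replays like any other generated input.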
Fun to think about, but this obviously doesn’t help with testing on existing OS’s.
Please keep up the writing and sharing of this awesome stuff, TigerBeetle team.
You can do some fault-injection in userland using libfiu, which uses the LD_PRELOAD trick, but I agree that it doesn’t solve all problems.
I had never heard of libfiu, thanks for sharing. It’s an interesting design space. A few other related things:
Apparently Linux has some fault injection mechanisms as well: https://docs.kernel.org/fault-injection/fault-injection.html.
This has my wheels spinning…
Oh, I didn’t know about the Linux stuff. Cool, thanks.
In what direction? I haven’t thought this through, but suppose we have an interface (for the filesystem, say) and implement it twice: one real implementation (using the POSIX open, write, read, fsync, etc.) and one fake (using some in-memory representation of the filesystem). Can we somehow test the real and fake implementations against each other? The happy path is easy, but what happens if we start injecting faults into the real implementation? We would then be forced to extend the fake so that it responds to fault injection in a way that’s compatible with the real implementation. Once we’ve accounted for all faults in the fake, we can use the fake, with its faults, as the basis for simulation testing a program that depends on the filesystem.
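The happy-path half of that idea can be sketched in a few lines (everything here is illustrative, not from any existing library): one interface, a real implementation over POSIX-style calls and an in-memory fake, driven with the same operation sequence and checked for agreement.

```python
# Illustrative sketch: differential testing of a real vs. fake file,
# happy path only. Fault injection would be the next step.
import os
import tempfile

class RealFile:
    """Thin wrapper over POSIX-style calls via the os module."""
    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    def write(self, data: bytes) -> int:
        return os.write(self.fd, data)
    def read_all(self) -> bytes:
        os.lseek(self.fd, 0, os.SEEK_SET)
        chunks = []
        while chunk := os.read(self.fd, 4096):
            chunks.append(chunk)
        return b"".join(chunks)
    def close(self):
        os.close(self.fd)

class FakeFile:
    """In-memory stand-in implementing the same interface."""
    def __init__(self, path):
        self.buf = bytearray()
    def write(self, data: bytes) -> int:
        self.buf.extend(data)
        return len(data)
    def read_all(self) -> bytes:
        return bytes(self.buf)
    def close(self):
        pass

def run_ops(make_file, ops):
    """Apply the same operation sequence and collect observable results."""
    f = make_file()
    results = []
    for op, arg in ops:
        results.append(f.write(arg) if op == "write" else f.read_all())
    f.close()
    return results

ops = [("write", b"hello "), ("write", b"world"), ("read_all", None)]
with tempfile.TemporaryDirectory() as d:
    real = run_ops(lambda: RealFile(os.path.join(d, "f")), ops)
fake = run_ops(lambda: FakeFile("f"), ops)
assert real == fake  # the fake agrees with POSIX on this sequence
```

In a property-based test, `ops` would be generated rather than hardcoded, and the interesting work is extending both sides so the comparison still holds when faults are injected into the real one.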
Would this approach catch fsync gate? I don’t think so, but I believe it could still be useful.
I linked this above, but this isn’t talked about until the end of the post so here it is again: https://concerningquality.com/prophecy-variables/. If you look at the “Prophecy-Aware Dependencies…” section, this implements the exact approach you’re talking about here, minus the comparison of the fake to the real implementation.
What I was thinking is: if we have fault injection built into the dependency in a deterministic way, then in theory we don’t even need a fake, because we can control even the rarest error cases. Building the fake here was suggested because it’s more likely that you have no control over nondeterminism in any existing dependency today, so in the meantime you can wrap it with one of these modal fakes. And furthermore, errors are just one special case of nondeterminism: I’d like to do something like control the order of thread completions, so that concurrency could be controlled in tests.
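A minimal sketch of what “deterministic fault injection built into the dependency” could look like (all names hypothetical): the dependency draws its failures from a seeded PRNG, so every rare error case is reproducible from the seed alone.

```python
# Hypothetical sketch: a dependency whose injected faults are a pure
# function of the seed, making every failing run replayable.
import random

class FlakyStore:
    """A key-value store whose failures are deterministic per seed."""
    def __init__(self, seed: int, failure_rate: float = 0.2):
        self.rng = random.Random(seed)   # all nondeterminism lives here
        self.failure_rate = failure_rate
        self.data = {}

    def put(self, key, value):
        if self.rng.random() < self.failure_rate:
            raise IOError("injected fault")  # same spot every run with this seed
        self.data[key] = value

def run_workload(seed: int):
    store = FlakyStore(seed)
    outcomes = []
    for i in range(10):
        try:
            store.put(i, i * i)
            outcomes.append("ok")
        except IOError:
            outcomes.append("fault")
    return outcomes

# Same seed, same faults: a simulation test can replay a failure exactly.
assert run_workload(42) == run_workload(42)
```

The same trick generalizes beyond errors: a seeded scheduler could pick which thread or task completes next, putting concurrency orderings under the same replayable control.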
Basically, with enough of these in place, a real dependency could be used in a simulation test, because its determinism could be controlled.
Gotta ask, how much of this is an evolution of Quickstrom
There’s some overlap in the approach around how the generator, model, and properties are formulated. But it’s more of a style, like using a simple model or “smoke test” invariants, and leaning on how even complicated bugs might show themselves through more obvious symptoms. There’s no reuse of Quickstrom or QuickLTL, though. The liveness test works quite differently, it’s much simpler and domain specific.
Is this like a property based testing approach on the real system?
The post says generative, so I assume the workloads are built up dynamically, which sounds pretty sick.
Yep, that’s what it sounds like. Take a look at Jepsen too: https://jepsen.io/.
Jepsen has applied this to many popular databases to check their correctness guarantees.