1. 23

  2. 6

    I like how concise and unopinionated this article is (or rather that the opinions are clearly marked and a good attempt is made to separate the information from them).

    I’m pretty bad with tests, essentially doing only acceptance, diff and a lot of manual tests (for what could be automanual). I have yet to find tests that aren’t end-to-end and can survive major refactors (that are complete rewrites or close to it) or even language changes. I’d also like them to not discourage me from changing a bad API (that’s likely to invalidate a lot of unit tests).

    And while I’m wishing, something that can go from manual (done from a REPL or debugger) to automanual more naturally would be nice too. Even creating a second file in a project meant to be small feels cumbersome.

    Before reading this, I used to think property tests meant fuzzing and now learned that fuzz tests don’t actually check the output!

    1. 6

      And while I’m wishing, something that can go from manual (done from a REPL or debugger) to automanual more naturally would be nice too. Even creating a second file in a project meant to be small feels cumbersome.

      Ooh this is totally a thing- obscure, but a thing. I’ll find some references and add a section on it when I get a chance.

      1. 5

        “and now learned that fuzz tests don’t actually check the output!”

        It’s one of those terms that don’t have a precise definition; it depends on which researchers or tool developers you’re talking to. What they have in common is throwing random data at something and watching for an effect. That traditionally didn’t involve a check, because the problems they looked for often caused crashes. If the problems don’t, you might need checks to spot them. So fuzzing != no checks, but it’s common not to have them if the target is C code or something.
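        That style is easy to sketch. Below is a toy Python version, with exceptions standing in for the crashes a C target would produce; the `parse` function and its NUL-byte bug are invented for illustration. Note that the loop never validates the output, it only watches for a crash:

```python
import random

def naive_fuzz(target, runs=1000, seed=0):
    """Throw random bytes at `target` and only watch for crashes
    (exceptions here, standing in for segfaults in a C target)."""
    rng = random.Random(seed)
    for _ in range(runs):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        try:
            target(data)          # output is discarded, never checked
        except Exception as err:
            return data, err      # "crash" found: report the input
    return None

# Hypothetical target with a crash bug on a specific byte.
def parse(data: bytes):
    if b"\x00" in data:
        raise ValueError("unexpected NUL")
    return data.decode("latin-1")

result = naive_fuzz(parse)
```

        Over 1000 random inputs the loop is all but certain to hit a NUL byte, but it would happily accept a `parse` that returned garbage, as long as it didn’t raise.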

        Lots of random input is the defining trait.

        1. 6

          There’s definitely a point where if you’re using a fuzzer, but adding lots of assertions about the output being reasonable, it starts to feel more like property-based testing.

          I did an internal tech talk about property-based testing at work a couple of months ago, and made the case that PBT tooling belongs in one’s toolbox because it can be applied to a range of test styles: along with typical PBT “tactics” (to use hwayne’s term), the test library can also run properties that are closer to fuzzing or model-checking. New tactics and a variety of approaches can plug into existing PBT tooling.

          The core interface* is arguably specifying how to generate input for a property (reproducibly), and how to classify that input as interesting or not (by running some test code and checking for failures/errors). This framing is pretty general, and mainly ensures the PBT library has the info it needs to shrink interesting input. Plugging in different testing tactics can lead to tests with very different trade-offs:

          • “random input” + “it doesn’t crash”: Classic fuzzing.
          • “structured input” + “these two implementations agree”: Comparing the code under test against a reference implementation, a naive/inefficient version, or the same codebase without an experimental optimization. Taking something easier to verify, and using it as a foothold to check something more complex.
          • “structured input” + “resource limits are sufficient”: Searching for input that uses disproportionately large amounts of memory, CPU time, or whatever.
          • “structured input” + “roundtrip a conversion”: encode some data, decode it (say, pack and unpack data for serialization to disk), and check if anything got lost in translation. A classic tactic.
          • “a sequence of operations against an API” + “call API and check results, update in-memory model”: This works more like a model checker, and an in-memory dictionary can be an easy stand-in for a database, filesystem, or other model of the state of the outside world when testing complex logic in a vacuum. While it isn’t exhaustive, the way (say) TLA+ can be, and it’s difficult to apply to problems that are inherently nondeterministic due to concurrency, it’s a great way to cheaply stress-test logic and discover surprising interactions. There are lots of APIs where the individual operations make sense, but have subtle misalignments between them that can compound and spiral out of control when combined in certain ways. As a bonus, shrunken failing input for these properties tends to have a narrative: “when you do this, then this, then this, then this, it fails in this way”.
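          The roundtrip tactic is simple enough to sketch by hand in a few lines of Python, following the core interface above: a reproducible generator, a property, and a driver that reports the first interesting input (a real PBT library would also shrink it). The length-prefixed `encode`/`decode` pair and its truncation bug are made up for illustration:

```python
import random

def encode(items):
    """Hypothetical length-prefixed encoder with a deliberate bug:
    payloads longer than 255 bytes are silently truncated."""
    out = bytearray()
    for s in items:
        data = s.encode("utf-8")[:255]   # BUG: silent truncation
        out.append(len(data))
        out += data
    return bytes(out)

def decode(blob):
    items, i = [], 0
    while i < len(blob):
        n = blob[i]
        items.append(blob[i + 1:i + 1 + n].decode("utf-8"))
        i += 1 + n
    return items

def roundtrip_property(rng):
    """Generate a random list of strings, encode then decode,
    and check nothing got lost in translation."""
    items = ["x" * rng.randrange(0, 400) for _ in range(rng.randrange(0, 5))]
    return decode(encode(items)) == items, items

def run(seed=0, runs=200):
    """Minimal driver: reproducible seed, report the first failing input."""
    rng = random.Random(seed)
    for _ in range(runs):
        ok, items = roundtrip_property(rng)
        if not ok:
            return items
    return None

failing = run()
```

          Because the generator occasionally produces strings longer than 255 bytes, the driver almost certainly surfaces the truncation bug within a couple hundred runs.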

          Controlling the input generation also means that it can be steered towards particular areas of the state space, in a way that feels more direct than (say) temporarily using asserts to convince afl-fuzz that those branches are boring.
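          A minimal sketch of that kind of steering, assuming a hand-rolled generator; the boundary list and the 30% mixing ratio are arbitrary choices for illustration, not anyone’s recommended values:

```python
import random

# Known boundary values worth hitting directly, rather than hoping
# uniform random generation stumbles onto them.
BOUNDARIES = [0, 1, -1, 255, 256, 2**31 - 1, -2**31]

def steered_int(rng):
    """Mix explicit boundary values into an otherwise uniform stream,
    steering generation toward interesting regions of the state space."""
    if rng.random() < 0.3:                 # 30%: a known boundary value
        return rng.choice(BOUNDARIES)
    return rng.randrange(-10**6, 10**6)    # 70%: broad uniform input

rng = random.Random(0)
sample = [steered_int(rng) for _ in range(1000)]
```

          In a sample of 1000 values, the boundary cases show up hundreds of times instead of essentially never.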

          * I have the most experience using theft with C (I’m the author), but it seems like this applies to most other PBT libraries.

          1. 1

            I see. Thanks for the clarification!

          2. 4

            And while I’m wishing, something that can go from manual (done from a REPL or debugger) to automanual more naturally would be nice too.

            I might have misunderstood you, but something like Python’s doctests could be what you’re looking for. You can basically use the REPL to manually test your code and how to use it, paste that into the docstrings, and doctest runs it again and makes sure it behaves like it should.
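            For example (the `rle` function is made up; the interactive lines in its docstring are exactly what a REPL session prints, and `doctest` replays them and compares the output):

```python
def rle(s):
    """Run-length encode a string.

    The examples below were pasted from a REPL session; running
    doctest replays them and checks the output still matches.

    >>> rle("aaabcc")
    [('a', 3), ('b', 1), ('c', 2)]
    >>> rle("")
    []
    """
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

            Running the file (or `python -m doctest file.py`) turns the pasted REPL session into an automatic regression test with no second file needed.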

            1. 1

              something like python’s doctests could be what you’re looking for

              Good suggestions. I quite like the idea of doctests, but here are some things that dissuaded me from adopting them, at least right away.

              • I usually don’t want to see my tests when editing code. I guess I want tests either in a separate file (which also goes against my wish to have fewer files) or maybe grouped together at the end (or beginning) of the file.
              • Only the very first few (manual) tests tend to focus on a single function.
              • In a REPL or debugger, I usually have a lot of state set up. In the debugger (say when post-mortem debugging), I don’t have the steps to recreate this state! Of course, I could spend time to cook up an example but that takes more thinking than just a copy-paste (and I’m trying to find something that lets me put in progressively more effort for more certainty).

              Right now, I know that a lot of my REPL commands are just lost to the history file and could probably be made automatic instead.