1. 15
  1.  

  2. 1

    A colleague was asking me about fuzzing; perhaps I should send this to them along with the first part.

    Apropos: is there a good discussion of the differences between fuzzing and Property-Based Testing?

    1. 1

      I don’t think I’ve seen one, but in lieu of something more rigorous, here’s my take. It’s a bit of a fuzzy boundary, but property testing usually:

      • Allows building constrained, but richly typed values (maybe via a newtype)
      • Allows shrinking found examples
      • Focuses on the relationship between input and output (even if it’s just “it acts like this simplified model”)
      • Usually starts with random values from a distribution

      Whereas fuzzing:

      • Usually generates plain bytes
      • Doesn’t usually offer shrinking (but I’d be very happy to be wrong about that)
      • Is usually focussed on finding crashes of some kind
      • Sometimes starts from a set of known examples (e.g. a JPEG file for an image loader).
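
      To make the contrast concrete, here is a minimal sketch of both styles against the same toy code, using only the standard library. Everything in it is hypothetical and made up for illustration: the run-length codec, the byte-pair wire format, and the helper names `check_roundtrip` and `fuzz_parser`. A real setup would use a property-testing library or a fuzzing engine rather than these hand-rolled loops.

```python
import random


def rle_encode(text):
    """Run-length encode a string into a list of (char, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs):
    """Inverse of rle_encode."""
    return "".join(ch * count for ch, count in runs)


def parse_runs(raw):
    """Parse a toy wire format: repeated (count byte, char byte) pairs."""
    if len(raw) % 2 != 0:
        raise ValueError("truncated run")
    return [(chr(raw[i + 1]), raw[i]) for i in range(0, len(raw), 2)]


def check_roundtrip(rng, trials=200):
    """Property style: typed random values, checked against an input/output relation."""
    for _ in range(trials):
        length = rng.randrange(0, 12)
        text = "".join(rng.choice("ab") for _ in range(length))
        assert rle_decode(rle_encode(text)) == text
    return trials


def fuzz_parser(rng, seeds, trials=200):
    """Fuzz style: mutate known byte examples and only look for unexpected exceptions."""
    rejected = 0
    for _ in range(trials):
        raw = bytearray(rng.choice(seeds))
        for _step in range(rng.randrange(1, 4)):
            op = rng.randrange(3)
            if op == 0 and raw:
                raw[rng.randrange(len(raw))] = rng.randrange(256)  # overwrite a byte
            elif op == 1 and raw:
                del raw[rng.randrange(len(raw))]  # drop a byte
            else:
                raw.append(rng.randrange(256))  # append a byte
        try:
            rle_decode(parse_runs(bytes(raw)))
        except ValueError:
            rejected += 1  # rejecting malformed input is fine; anything else is a "crash"
    return rejected


if __name__ == "__main__":
    check_roundtrip(random.Random(0))
    fuzz_parser(random.Random(0), seeds=[b"\x03a\x01b"])
```

      The property loop never sees bytes and asserts a relation; the fuzz loop never sees typed values and only cares that nothing blows up unexpectedly.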
      1. 2

        Doesn’t usually offer shrinking (but I’d be very happy to be wrong about that)

        Assuming this is the same thing, I think libFuzzer does try to do that:

        -reduce_inputs Try to reduce the size of inputs while preserving their full feature sets; defaults to 1.

        https://www.llvm.org/docs/LibFuzzer.html

        1. 2

          The above matches my experience: that’s usually what the two setups look like.

          But I want to argue that these are accidental differences: property-based testing and fuzzing are the same technique, if implemented properly. Specifically, it is possible (and rather easy) to get all of:

          • coverage guided program exploration
          • by a state-of-the-art fuzzing engine (AFL or libFuzzer)
          • using highly structured data as an input (eg, the set of WASM programs which pass validation)
          • with support for shrinking

          The trick is to take the raw bytes from the fuzzer and use them as a finite PRNG that you feed into a property-style, well-typed generator. This automatically gives you shrinking: by minimizing the raw input bytes, you minimize the well-typed struct generated from them.
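
          A minimal sketch of that trick, in plain Python with no fuzzer attached. All names here (`ByteSource`, `gen_int_list`, `fails`, `shrink`) are hypothetical, the "bug" is a made-up property, and the greedy minimizer is only a stand-in for what a real engine or library does. In a real harness the bytes would come from AFL or libFuzzer instead of a hard-coded example.

```python
class ByteSource:
    """Finite 'PRNG' backed by fuzzer-provided bytes; yields 0 once exhausted."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def byte(self):
        if self.pos < len(self.data):
            value = self.data[self.pos]
            self.pos += 1
            return value
        return 0


def gen_int_list(src):
    """Well-typed generator: builds a list of ints using only draws from src."""
    out = []
    while src.byte() % 2 == 1:  # odd flag byte: one more element; even: stop
        out.append(src.byte())
    return out


def fails(data):
    """Made-up property under test: every element is below 200."""
    return any(x >= 200 for x in gen_int_list(ByteSource(data)))


def candidates(data):
    """Simpler byte strings: delete spans (longest first), then lower single bytes."""
    for width in (4, 2, 1):
        for i in range(0, len(data) - width + 1):
            yield data[:i] + data[i + width:]
    for i, b in enumerate(data):
        for smaller in (0, b // 2, b - 1):
            if 0 <= smaller < b:
                yield data[:i] + bytes([smaller]) + data[i + 1:]


def shrink(data, still_fails):
    """Greedy minimizer that works on raw bytes and knows nothing about lists."""
    while True:
        for candidate in candidates(data):
            if still_fails(candidate):
                data = candidate  # keep the simpler failing input and start over
                break
        else:
            return data  # no simpler candidate fails: local minimum reached


if __name__ == "__main__":
    found = bytes([3, 17, 5, 250, 1, 9, 0])  # pretend the fuzzer reported this
    print(gen_int_list(ByteSource(found)))
    print(gen_int_list(ByteSource(shrink(found, fails))))
```

          The shrinker never touches the structured value, yet the failing list `[17, 250, 9]` comes out as `[200]`, because shorter and smaller bytes decode to shorter and smaller lists. That only works if the generator is written so that "simpler bytes" means "simpler value".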

          I don’t know of a good article describing this perspective abstractly. https://fitzgeraldnick.com/2020/08/24/writing-a-test-case-generator.html is a great concrete implementation.

          And I think the core idea of formulating property-based testing as generating structured outputs from unstructured inputs, and getting universal shrinking for free, was popularized (and probably invented?) by Python’s Hypothesis: https://hypothesis.works/articles/compositional-shrinking/

          1. 1

            But I want to argue that these are accidental differences

            That’s fair. It’s pretty nebulous, really. I feel like the main difference is the perspective, or goals, rather than anything concrete.