From the abstract: “In this article, I establish an unexpected connection with the otherwise unrelated scientific field of ecology, and introduce a statistical framework that models Software Testing and Analysis as Discovery of Species (STADS). For instance, in order to study the species diversity of arthropods in a tropical rain forest, ecologists would first sample a large number of individuals from that forest, determine their species, and extrapolate from the properties observed in the sample to properties of the whole forest. The estimation (i) of the total number of species, (ii) of the additional sampling effort required to discover 10% more species, or (iii) of the probability to discover a new species are classical problems in ecology. The STADS framework draws from over three decades of research in ecological biostatistics to address the fundamental extrapolation challenge for automated test generation. Our preliminary empirical study demonstrates a good estimator performance even for a fuzzer with adaptive sampling bias—AFL, a state-of-the-art vulnerability detection tool. The STADS framework provides statistical correctness guarantees with quantifiable accuracy.”
In short: finding bugs in programs is like finding bugs in nature.
More seriously, this is not an easy read. It’s long (52 pages PDF) with lots of applied statistics throughout, but also a very exciting research topic with the potential to be quite useful to practitioners. We are lucky that a lot of the hard work has already been done in the context of biology.
Via the AFL mailing list.