ACM links tend to get paywalled, so here’s a copy from an author’s site: http://people.brunel.ac.uk/~csstmms/FucciEtAl_ESEM2016.pdf
(Also, I suggested the science tag because they’re actually doing science instead of proclaiming personal opinions.)
Maybe I missed something while skimming: did you see where they define exactly what their TESTS, QUALITY, and PROD variables are and how they were measured?
Section 3.4 defines them. TEST is a straight count, QLTY is the number of user stories for which at least one of the authors' tests passed, and PROD is the percentage of the authors' acceptance suite that passed overall.
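For anyone who finds it easier to read as code, here is a rough sketch of those three measures as I paraphrased them above. It is not the paper's actual tooling: the function name, the input shape (a dict of user story to pass/fail results) and the sample data are all made up for illustration.

```python
def summarize_metrics(test_count, acceptance_results):
    """Compute TEST, QLTY and PROD as paraphrased in the comment above.

    test_count: how many tests the participant wrote (hypothetical input).
    acceptance_results: dict mapping a user story name to a list of booleans,
        one per acceptance test of the authors' for that story (assumed shape).
    """
    # QLTY: user stories for which at least one acceptance test passed.
    qlty = sum(1 for results in acceptance_results.values() if any(results))

    # PROD: percentage of the whole acceptance suite that passed.
    total = sum(len(results) for results in acceptance_results.values())
    passed = sum(sum(results) for results in acceptance_results.values())
    prod = 100.0 * passed / total if total else 0.0

    return {"TEST": test_count, "QLTY": qlty, "PROD": prod}


# Invented example: three user stories, six acceptance tests in total.
example = {
    "US1": [True, False],
    "US2": [False, False],
    "US3": [True, True],
}
print(summarize_metrics(7, example))
```

Note that under this reading a participant can score well on QLTY while PROD stays low, since one passing test per story is enough for the former.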
Hmm. One way of reproducing this result would be to make the acceptance tests randomly return “true” 68% of the time. Another would be to make the specs vague enough that people only understood 68% of them.
Assuming that wasn’t the case…
The true “surprise” conclusion of the study is emphasized below…
We conclude that no difference between TDD and TLD could be observed in terms of testing effort, external code quality, and developers’ productivity.
Really!? I’m truly surprised.
Table 5 hints at one thing that may have reduced the sensitivity of this experiment so much as to make it worthless.
i.e. some participants achieved just plain nothing (the minimums are 0) and some did it all (the maximums are 100%).
i.e. the variance between participants was so high that any effect of TLD vs TDD is totally lost in the noise.
i.e. the only thing this paper reconfirms is http://alistair.cockburn.us/Characterizing+people+as+non-linear,+first-order+components+in+software+development
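To make the "lost in the noise" point concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the group size, the 10-point effect, the normal distribution and the crude "gap bigger than two standard errors" check are mine, not the paper's data or its statistical method.

```python
import random
import statistics


def detection_rate(effect, spread, n=20, trials=500, seed=0):
    """Fraction of simulated experiments that notice a real difference.

    Two groups of n participants are drawn from normal distributions whose
    means differ by `effect` and whose standard deviation is `spread`.
    A difference counts as "noticed" when the gap between the group means
    exceeds twice its estimated standard error (a rough threshold, not the
    test used in the paper).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(50, spread) for _ in range(n)]
        b = [rng.gauss(50 + effect, spread) for _ in range(n)]
        std_err = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
        gap = abs(statistics.mean(b) - statistics.mean(a))
        if std_err > 0 and gap > 2 * std_err:
            hits += 1
    return hits / trials


# Same true effect, different amounts of person-to-person variation.
print("tight group:", detection_rate(effect=10, spread=2))
print("wild group: ", detection_rate(effect=10, spread=40))
```

With the same underlying effect, the tight group should notice it almost every time and the wild group only occasionally, which is the worry when scores run all the way from 0 to 100%.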