I missed something reading the study, or maybe they didn’t mention it. Their organizational metric was actually a combination of metrics. Which of those metrics was the strongest predictor?
Those metrics are weighted according to a1, a2, … defined as “logistic regression predicted constants” (Section 5.3 Equation 1), but I can’t find where they defined those constants.
I believe they actually determined them by calculating the suite of numbers (NOE, NOEE, …) for a bunch of projects for which they already knew the “had lots of bugs” answer. Using this data, they fitted the constants a1, a2, … so that Equation (1) separated the classes they wanted to distinguish.
They then used the fitted equation to predict failures in other binaries that were not part of the training set.
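To make that guess concrete, here is a minimal sketch of fitting logistic regression constants on labeled examples and then scoring held-out ones. Everything in it is an assumption for illustration: the two scaled metric columns, the numbers, the learning rate, and the helper names `fit_logistic` and `predict_proba` are made up, and the paper does not say this is how its constants were produced.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.5, epochs=5000):
    # Batch gradient descent on the logistic loss.
    # Returns (bias, weights); the weights play the role of a1, a2, ...
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return bias, weights

def predict_proba(bias, weights, x):
    # Estimated probability that an item is failure-prone.
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical training data: two made-up metrics scaled to [0, 1],
# with a known "had lots of bugs" label (1 = yes, 0 = no).
train_x = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
           [0.7, 0.8], [0.8, 0.6], [0.9, 0.9], [0.6, 0.7]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

bias, weights = fit_logistic(train_x, train_y)

# Items that were NOT used for fitting.
p_low = predict_proba(bias, weights, [0.15, 0.25])
p_high = predict_proba(bias, weights, [0.85, 0.75])
print(p_low < 0.5, p_high > 0.5)
```

If the constants really were fitted this way, their relative sizes (on comparably scaled inputs) would also hint at which metric was the strongest predictor, which is the question I started with.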
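Precision is the fraction of predicted failures that were real, so it drops as false positives go up. Recall is the fraction of real failures that were predicted, so it drops as positives are missed (Table 3). The paper’s results are summarized in Table 4.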
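For anyone who wants the arithmetic spelled out, here is a small sketch of the standard precision and recall definitions. The labels below are invented for the example and are not taken from the paper’s tables.

```python
def precision_recall(actual, predicted):
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up labels: 1 = failure-prone, 0 = not failure-prone.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(actual, predicted)
print(precision, recall)  # 0.6 0.75
```

Here 3 of the 5 predicted failures were real (precision 0.6), and 3 of the 4 real failures were caught (recall 0.75).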
The biggest discovery for me is that code coverage (defined as 90-100% in the paper) is a really poor mechanism for catching bugs, which undermines the CI / 100% code coverage mentality that many new companies have.
Code churn predicts bugs relatively accurately, but in my opinion it’s an obvious correlation: the more a codebase is changed, the more likely it is that bugs get introduced.
Rather than define 8 metrics and mix them into a statistical blender called “organizational structure”, I would have liked to see each individual metric measured against the rest. Then higher-order metrics could be defined as such: