1. 10
  1.  

  2. 13

    There are two really nice things here:

    1. The “number of versions” is a fantastic metric, and Microsoft Research observed something similar; it’s worth spending a little more time on this point. If you’re changing a module many times, perhaps you (the programmer) don’t know what it is supposed to do?
    2. The “size” is another good metric, but I think the authors don’t go far enough: lines of code and number of statements are nowhere near as good as “source code bytes”. Arthur is supposed to have said only a short program has any chance of being correct, but really it’s scrolling that gets you into trouble: when data is produced and consumed out of view of each other, you literally cannot see the opportunity for the bug.

    But maybe something not so nice: testing has a negative correlation with defects, but only a weak one. This is consistent with a lot of other empirical studies of the subject that had far less data, but it still sounds bonkers. People who swear by test-driven development know it’s helping them write better code, but the numbers don’t lie, so what is it?

    My theory is that when you tell someone they need to write tests, they write crappy tests; but if they want to write tests even when they’re not required, it’s because they want a second way to look at their problem and understand it better. And that’s what we’re striving for.

    1. 4

      when you tell someone they need to write tests, they write crappy tests, but if they want to write tests even when they’re not required, then it’s because they want a second way to look at their problem and understand it better.

      This rings very true to me, for what it’s worth.

      1. 2

        Yes, I wrote something similar on Stack Exchange a while back:

        Testing follows a common pattern in software engineering: testing is claimed to make software better/more “agile”/less buggy/etc., but it’s not really the testing which does this. Rather, good developers make software better/more “agile”/less buggy/etc. and testing is something that good developers tend to do.

        In other words, performing some ritual like unit testing for its own sake will not make your code better. Yet understanding why many people do unit testing will make you a better developer, and being a better developer will make your code better, whether it has unit tests or not.

      2. 3

        If you’re changing a module many times, perhaps you (the programmer) don’t know what it is supposed to do?

        That or the people asking for the change don’t know what they’re doing and keep changing the requirements. >.<

        1. 1

          That can generate new modules rather than changes to existing ones.

      3. 4

        I’m unfamiliar with this literature, but a −0.25 correlation doesn’t seem as bad as they make it sound; what am I missing? I mean, clearly there’s more going on here, but I don’t understand why they refer to this as a “slight” tendency.

        Which brings me to my next question: what’s up with all the pairwise correlations? I would have expected a regression model of some sort that could account for several independent variables at once. Maybe I don’t entirely understand their data set?

        Finally, I would have expected to see p-values reported. I realize these can be controversial, but the authors appear to have reported significance (the bold values) without explaining how they measured it.

        Anyway, if anyone can explain any of this to me, I’d appreciate it :-)
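        For what it’s worth, “slight” is probably a statement about effect size rather than significance: r = −0.25 explains only about 6% of the variance (r²), yet it can still be highly significant given enough data. A minimal sketch of the standard significance test for a Pearson correlation (the sample size n = 100 here is made up for illustration; the paper’s data set is presumably much larger, which would only make the p-value smaller):

        ```python
        import math

        def corr_significance(r: float, n: int) -> tuple[float, float]:
            """t-statistic for a Pearson correlation r over n samples, plus a
            two-tailed p-value using the normal approximation to the t
            distribution (fine for the large n in studies like this one)."""
            t = r * math.sqrt((n - 2) / (1 - r * r))
            p = math.erfc(abs(t) / math.sqrt(2))  # two-tailed, normal approx.
            return t, p

        t, p = corr_significance(-0.25, 100)
        print(f"n=100: t = {t:.2f}, p ≈ {p:.4f}")   # p < 0.05: significant
        print(f"variance explained: r^2 = {0.25**2:.1%}")  # yet only ~6%
        ```

        So both things can be true at once: the bold values in the table can mark statistically significant correlations, while the tendency itself remains “slight” because the effect size is small.
        
        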