I like the Head Start example because it demonstrates how a qualitative, subjective, holistic analysis (“parents of Head Start children voiced strong approval of the program”) can provide better results than a data-driven approach if your data are not measuring the right thing. And in the real world, we are very, very often not measuring the right thing (something that will not come as a surprise to anyone who’s seen the third season of The Wire).

It seems to be a common belief in the software industry (particularly in those parts of it that involve building tools for large-scale data analysis) that data-driven decision-making is always better than decision-making based on intuition. After all, we’ve all seen cases where the data reveal surprising and unintuitive things. Who would have guessed that the blue button would generate twice as many clicks as the green button? (Or whatever.)
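
To make the button case concrete, here’s a minimal sketch (entirely hypothetical numbers, standard-library Python only) of the kind of analysis that makes this sort of result convincing: a two-proportion z-test on click counts. This is exactly the situation where data shines: the thing being measured (a click) is cheap, unambiguous, and hard to game.

```python
import math

def two_proportion_ztest(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for the difference between two click-through rates."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled click-through rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: the blue button gets roughly twice the clicks.
z, p = two_proportion_ztest(clicks_a=200, views_a=5000,   # blue: 4.0% CTR
                            clicks_b=100, views_b=5000)   # green: 2.0% CTR
print(f"z = {z:.2f}, p = {p:.2g}")
```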

But it’s a mistake to over-generalize from these cases. Data can drive high-quality decisions when (a) the variables and results are both easily measured, (b) the amount of data is greater than a human mind can easily grasp, and (c) the act of measurement doesn’t have a distorting impact on the activity being measured.

But too often we try to use data to make decisions about things we can’t really measure directly, with few data points and clear incentives to game the system. I’m thinking here of claims like “teams that use Agile are Y% more productive” or “code in strongly-typed languages has X% fewer defects[1].” What is the unit of measure of productivity? What is the unit of measure for a defect? You can’t measure these things directly; you can only measure intermediate targets (story points, reported bugs) and make the speculative, unfounded leap that these intermediate targets will respond to inputs in the same way as the thing you really care about.

Worse, when these measurements are used to make decisions about the allocation of work or to evaluate performance, there are incentives to manipulate the data (inflate story points, reclassify bug reports as feature requests). In the context of a small team of a dozen or fewer programmers, I have much more faith in the human brain’s pattern-matching abilities (do we feel more productive when we do X? do we feel that using Y has increased risk?) than in these very crude and fragile data analyses.
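
A toy simulation (made-up numbers, purely to illustrate the gaming problem) shows how a proxy like story points can track real output at first and then decouple from it once people are rewarded for the proxy itself:

```python
import random

random.seed(42)

weeks = 20
true_output = []    # the thing we actually care about (unmeasurable in practice)
story_points = []   # the proxy we can actually measure

for week in range(weeks):
    # Real productivity drifts around a stable baseline.
    real = 10 + random.gauss(0, 1)
    true_output.append(real)
    # Before week 10, estimates honestly track real output (plus noise).
    # From week 10 on, points feed into performance reviews, so estimates
    # get inflated a bit more each week while real output is unchanged.
    inflation = 1.0 if week < 10 else 1.0 + 0.15 * (week - 9)
    story_points.append(real * inflation + random.gauss(0, 1))

for week, (real, points) in enumerate(zip(true_output, story_points)):
    print(f"week {week:2d}: real output {real:5.1f}, story points {points:5.1f}")
```

The dashboard shows “productivity” climbing steadily through the second half of the run, even though nothing real has changed.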

1. Indeed, Dan Luu, the author of this link, has an interesting blog post looking at the literature on type systems and productivity; many of the studies purporting to find an effect suffer from exactly these measurement flaws.