The author has an interesting point but deliberately biases the whole discussion by defining type systems such as Hindley-Milner as “human friendly” and traditional ones as “compiler friendly”, then circularly arguing that the ones he defines as human friendly are more friendly to humans.
If you’re going to employ biased language to make an argument you could just as easily describe Hindley-Milner as “mathematician friendly” and C-style ones as “programmer friendly”. But that would get us nowhere.
What I’d love to see is an empirical study of the effect that type systems have on programmer productivity, software reliability and code maintenance. Does anyone know if there are any studies out there like that?
Yes, it turns out there are a lot of studies on this. After hearing some particularly strong claims about this sort of thing recently, I did a literature review and wrote up notes on each study, just for my own benefit.
The notes are way too long to post as a comment here (they run over 6000 words, i.e., longer than the average Steve Yegge blog post), and they’re actually pretty boring, so let me write a short summary instead. If anyone’s interested in the full results, I can clean the notes up and turn them into a blog post.
1. Most older controlled studies have problems with their experimental methodology. Also, they often cover questions most people probably don’t care much about anymore, e.g., is ANSI C safer than K&R C?
2. Some newer controlled studies are ok, but because of their setup (they compare exactly one thing to exactly one other thing) and the general lack of similar but not identical studies, it’s hard to tell how well the results generalize. Also, the effect size these studies find is almost always very small: much smaller than the productivity difference between individual programmers (and study participants are usually drawn from a population you’d expect to have substantially smaller variance in productivity than the population of all programmers).
3. People rarely study what PL enthusiasts would consider “modern” languages, like Scala or Haskell, let alone languages like Idris. I don’t like that, but it’s understandable, since it’s probably easier to get funding to study languages that more people use.
4. There are a number of studies that mine data from open source repositories. They come up with some correlations, but none of them do anything to try to determine causation (e.g., instrumental variable analysis). Many of the correlations are quite odd, and suggest there are interesting confounding variables to discover, but people haven’t really dug into those.
For example, one study found the following in terms of safety/reliability of languages: (Perl, Ruby) > (Erlang, Java) > (Python, PHP, C), where > means more reliable and grouping within parentheses means roughly equal reliability. Those weren’t the only languages studied; those particular results just jumped out at me. The authors of the study then proceeded to aggregate data across languages to determine the reliability of statically typed v. dynamically typed languages, weakly typed v. strongly typed, and so on and so forth.
That analysis would be interesting if the underlying data were sound, but it doesn’t pass the sniff test for me. Why do relatively similar languages (Perl/Ruby/Python) land on opposite ends of the scale, while some very different languages end up with similar results? The answer is probably sociological, and the authors of the study didn’t seem interested in it.
This seems to echo the results of the controlled studies, in that the effect a language has on reliability must be relatively small for it to get swamped by whatever confounding variables exist.
5. There are lots of case studies, but they’re naturally uncontrolled and tend to be qualitative. Considering how little the other studies say, the case studies make for some of the most interesting reading, but they’re certainly not going to settle the debate.
Anecdotally, SML’s type system is not particularly friendly to programmers (it’ll beat you over the head over and over again until you satisfy its exacting requirements) but extremely supportive of correctness; Java’s type system isn’t especially friendly to anyone due to its inflexibility, inexpressiveness, and extraordinary amount of required boilerplate; and Python’s type system (or lack thereof) is incredibly friendly to programmers (“sure, you can do that! Why not! Treat this int as a dict, I don’t care!”) but allows all sorts of bugs ranging from the blatantly obvious to the deviously subtle to make it into running code.
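Not from any study, just a minimal sketch of that last point: in Python, a type confusion like the “treat this int as a dict” case isn’t rejected up front the way SML would reject it. The call is accepted, and the failure only surfaces as a runtime TypeError when the offending line actually executes (the function name `lookup` here is purely illustrative):

```python
def lookup(record, key):
    # No annotations, no checks: nothing stops a caller from passing
    # an int where a dict is expected. Python discovers the mistake
    # only when this indexing operation actually executes.
    return record[key]

# The happy path works fine.
print(lookup({"user": "alice"}, "user"))  # prints: alice

# "Treat this int as a dict": the call itself is accepted; the
# failure shows up as a runtime TypeError, possibly far from the
# code that introduced the bad value.
try:
    lookup(42, "user")
except TypeError as exc:
    print("runtime failure:", exc)
```

An ML-family compiler would flag the second call before the program ever ran; here the bug survives until that code path is exercised, which is exactly the “deviously subtle” end of the spectrum when the path is rarely taken.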
I’ve seen a couple of studies of programmer productivity in various languages/environments (not, IIRC, looking specifically at type systems), but none well-founded or conclusive enough for me to remember or recommend them.
Here are some that I’ve found: