1. 16

  2. 38

    I don’t buy this analysis at all. The conclusion would be worth discussion, at the least, but the analysis itself seems weak.

    Also, it’s not a “shocking secret” that static types only suffice to catch some classes of errors, and that the sorts of powerful type systems that can check more than basic errors are found in non-mainstream languages. We know that.

    What is “bug density”? Is our denominator LoC or is it some measure of problem complexity? If we use bugs/LoC as our metric, then we’re punishing terse languages. How are we counting bugs? How do we evaluate the damage caused by bugs?
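To make the denominator question concrete, here is a minimal sketch with made-up numbers (the projects, bug counts, and feature counts are all hypothetical). It shows two implementations of the same feature set with the same number of bugs, scored once per thousand lines and once per feature:

```python
# Hypothetical numbers, purely illustrative: two implementations of the
# same 20 features, each with 10 known bugs.
def bugs_per_kloc(bugs: int, lines_of_code: int) -> float:
    """Bug density with lines of code as the denominator."""
    return bugs / (lines_of_code / 1000)


def bugs_per_feature(bugs: int, features: int) -> float:
    """Bug density with a crude measure of problem size as the denominator."""
    return bugs / features


terse = {"bugs": 10, "lines_of_code": 2_000, "features": 20}
verbose = {"bugs": 10, "lines_of_code": 10_000, "features": 20}

terse_kloc = bugs_per_kloc(terse["bugs"], terse["lines_of_code"])
verbose_kloc = bugs_per_kloc(verbose["bugs"], verbose["lines_of_code"])

terse_feat = bugs_per_feature(terse["bugs"], terse["features"])
verbose_feat = bugs_per_feature(verbose["bugs"], verbose["features"])

print(terse_kloc, verbose_kloc)  # the terse project looks 5x worse per KLoC
print(terse_feat, verbose_feat)  # both look identical per feature
```

Same bugs, same functionality, and the terse project still looks five times worse under bugs/LoC.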

    A few things come to mind.

    • TDD is not the same thing as testing. You need to test your programs, in any language. TDD is a specific development methodology. They’re not the same thing.

    • “Testing” is only as good as the programmers you have writing the tests (and the code). This is probably obvious. It remains true even if we allow that static typing provides a (limited, but time-saving and powerful) form of testing. Static typing is useful when you know how to use it. It doesn’t, on its own, guarantee much (and it can be subverted with, say, “stringly” typed interfaces.)

    • The quality of programmers matters a lot more than the language. This is something that has become clear to me over the past 15 years. C++ is ugly, but I’d trust good programmers using it before I’d trust bad programmers using any toolset that exists today.
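On the “stringly” typed point above, here is a small sketch of what I mean. The function and enum names are invented for illustration, and the static rejection assumes you run an external checker such as mypy, since Python itself won’t stop you:

```python
from enum import Enum


class Currency(Enum):
    USD = "USD"
    EUR = "EUR"


def format_amount_stringly(amount: float, currency: str) -> str:
    # Any string satisfies the annotation here, including typos like "UDS".
    return f"{amount:.2f} {currency}"


def format_amount_typed(amount: float, currency: Currency) -> str:
    # A static checker would flag a bare string argument at this call site.
    return f"{amount:.2f} {currency.value}"


# The typo sails through the stringly typed interface unnoticed.
print(format_amount_stringly(1.5, "UDS"))
print(format_amount_typed(1.5, Currency.USD))
```

Both functions are “statically typed” in the annotation sense, but only the second one gives the checker anything to work with.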

    1. 25

      One of the most horrible things that GitHub brought to the world is this kind of automatic comparative analysis of code quality between different languages. Using GitHub data for this is completely misguided:

      • In our industry we don’t have a standard description of what a bug is, nor do we have a standard taxonomy of bugs.
      • Different languages may have different kinds of bugs, since some bugs cannot happen in some languages.
      • Internal/proprietary repos are not published on GitHub, so this ignores non-internet languages (COBOL, assembler…) and certain kinds of software (games, microdevices, operating system drivers…).
      • We don’t know the development methodology used, so we cannot take it into account when looking for the root cause of a bug (is it a language fault or a software development fault?).

      Also, the logic of using some papers to disprove the usefulness of strong types:

      | “While these relationships are statistically significant, the effects are quite small.” [Emphasis added.]

      but then ignoring the same papers’ conclusions (no effect between strongly typed and untyped languages) when he is interested in promoting untyped languages is surprising.

      1. 13

        And let’s not forget that a lot of people on GitHub just use GitHub issues as a to-do list. Just because there’s an issue in GitHub doesn’t mean there’s an associated bug.

        1. 7

          I frequently get issues because people don’t read the README and want me to read it to them, so that’s another factor to consider.

      2. 3

        I would agree. I think, on the balance of things, the article’s conclusions might be correct. But the data shown wouldn’t meaningfully support those conclusions.

        An organization which is aggressive about quality will win the defect count vs an org that isn’t. Regardless of stack.

        I think that types are very important, but the culture beats the stack any day.

      3. 18

        Extrapolating “static typing is overrated” from a bolt-on JavaScript type system is ridiculous. So is quoting studies that haven’t been peer-reviewed, like the one the “bug density” image comes from. I mean, come on:

        | The following are some charts that compare the number of issues labelled “bug” versus the number of repositories in GitHub for different languages.

        This sounds like the worst methodology ever. How can someone not feel embarrassed using this as evidence of their claim?

        1. 17

          Whenever these kinds of debates come up, there is one thing that I just can’t wrap my head around: do people who do unit testing (possibly TDD) in dynamically-typed languages imagine that programmers who use OCaml, Rust, or Scala write no tests? I don’t know any programmer who is a fan of static typing who would argue that tests are unnecessary.

          Static typing can prevent bugs (e.g., messing up the order of arguments of a function, passing an int instead of an int option), but mostly it is amazing at restricting the solution space. I really understood this idea when Yaron Minsky showed how you could take a type as defined in a language like Java or Python, and use the OCaml type system to statically disallow certain invalid configurations. That’s the real strength, in my opinion, of static typing. This works quite nicely with unit testing too: it assures you that some tests need not be written. In fact, some tests cannot be written, because the compiler will reject them outright. This offers some nice benefits to a project:

          • Your code need not be littered with defensive coding patterns (e.g., if (x == null) { ... }) to ensure that someone doesn’t mess up too badly in the future.
          • There are fewer tests. Tests are code, and code needs to be maintained; if the best code is no code, then the best tests are no tests.
          • A number of redundant tests (e.g., testing for null) don’t exist and the tests that are there are usually more relevant to the problem.
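A rough sketch of the “disallow invalid configurations” idea, loosely in that spirit but not the example from the talk. The state names and fields are invented, and since this is Python the guarantee only holds if you assume an external checker like mypy; in OCaml the compiler itself would enforce it:

```python
from dataclasses import dataclass
from typing import Union


# Instead of one record with optional fields (attempt, session_id, reason)
# that can be combined in nonsensical ways, each state carries only the
# data that is meaningful for it.
@dataclass(frozen=True)
class Connecting:
    attempt: int


@dataclass(frozen=True)
class Connected:
    session_id: str


@dataclass(frozen=True)
class Disconnected:
    reason: str


ConnectionState = Union[Connecting, Connected, Disconnected]


def describe(state: ConnectionState) -> str:
    # No null checks: a Connected value always has a session_id, and a
    # "connected but with a disconnect reason" value cannot be constructed.
    if isinstance(state, Connecting):
        return f"connecting (attempt {state.attempt})"
    if isinstance(state, Connected):
        return f"connected as {state.session_id}"
    return f"disconnected: {state.reason}"


print(describe(Connected("abc123")))
```

The test for “connected without a session id” doesn’t need writing, because there is no way to build that value.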
          1. 7

            <disclaimer> I’m predisposed to take exception with the conclusion of this article.</disclaimer> I poked around for a few minutes and found I did not like the methodology of the “bug density” chart. To me it felt like “this is the only methodology we could think of, so let’s draw some conclusions from it”.

            1. 5

              There was discussion a couple weeks ago about the article “The broken promise of static typing” and the general consensus was that the methodology used in that article did not account for so many variables that the conclusions were dubious at best. For example, some projects consider mistakes in documentation to be bugs, but clearly this has nothing to do with static or dynamic typing.

            2. 7

              His first chart is almost certainly very biased by the fact that you’re much more likely to SEE bugs in C++ than in JavaScript, where they will silently hide and mess things up.

              1. 6

                Sigh, this argument again.

                Here are the problems:

                • Static types only catch the errors they are designed to catch. Type systems with an escape hatch catch even fewer (although hopefully use of the escape hatch is widely criticized and avoided, like unsafePerformIO in Haskell).
                • The ability of a type system to express constraints is great, but not all projects will use the type system to encode all constraints. Only those constraints which get encoded into the type system can be caught by the type system. If a constraint isn’t encoded, it’s not the fault of the type system for not catching violations of it, but the fault of the programmer for not allowing it to.
                  • As an addendum to that: the biggest rebuttal to this point is that many type systems are hard to use, or result in tedious fiddling to get things to compile. While this may be true, and there are certainly things that can be done to make some type systems friendlier, there’s also the question of whether it’s reasonable to ignore the type system when it’s telling you you’re doing something incorrect. In my experience, the type system is right more often than the programmer.

                Essentially, it is only reasonable to evaluate the success of a type system by:

                • Whether it actually catches the errors it claims to catch
                • Whether it’s easy enough that programmers will actually use it.
                • Whether it catches enough different kinds of errors to be worth the additional complexity.
                1. 6

                  There are other benefits to type systems, such as improving code readability and making it easier to find applicable stdlib functions. They also force you to think hard about how you are modelling data and to think through the constraints of certain design decisions ahead of time, things that I never did very often until I began programming in typed languages.

                  Plus, dependent types will add even more value to static typing, and I look forward to using them.

                  1. 1

                    Yes, rereading my post, I perhaps gave the wrong impression. I am very much a fan of static typing! My issue is that people often attack static typing with silly/unreasonable arguments about it not doing things it never claimed to be able to do, or by pulling out weak static type systems (like C) and then using their weakness as an argument against static typing in general.

                    Part of this comes from an issue of language. A lot of the terms we use to talk about types (outside of academia) are fluid and may mean different things to different people.

                    1. 1

                      | Such as improving code readability

                      I disagree with this. One of the first things I’ve seen languages with type inference do is add a mechanism for omitting the explicit statement of type (var in C#, auto in C++, etc.).

                      That hurts readability; alternately, if it improves it, we see an argument for using a language like JS or Ruby.

                  2. 4

                    I think there are a number of confounding variables that make drawing the conclusion this post does quite hard. I wouldn’t be surprised if people are more likely to submit a PR than to create an issue for Ruby/Python/etc. because they are a bit easier.

                    1. 3

                      How do you control for more bugs being found in projects of a particular programming language, independent of the actual rate of bugs?

                      For example, if it’s easier to find bugs in typed languages, you might see a higher apparent rate of bugs.
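A toy illustration of that confound, with invented numbers: two ecosystems with the same true number of bugs per project, where the only difference is the share of bugs that get noticed and reported.

```python
def observed_bugs(true_bugs: int, detection_percent: int) -> float:
    """Bugs that show up in the tracker, given how many are actually noticed."""
    return true_bugs * detection_percent / 100


# Hypothetical: same true bug count, different visibility.
loud_language = observed_bugs(true_bugs=10, detection_percent=80)
quiet_language = observed_bugs(true_bugs=10, detection_percent=40)

print(loud_language, quiet_language)
```

Counting issues would rank the “loud” language as twice as buggy, even though the underlying rate is identical.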

                      1. 3

                        I tried to assess the same thing with the same data, and came up with a completely different result. http://deliberate-software.com/safety-rank-part-2/

                        1. 2

                          There have been a few of these articles lately, that push some variant of “don’t worry so much about writing good code, focus on testing when it is branded as TDD.” Is there some reason for this trend?

                          1. 8

                            If I were to hazard a guess, it would probably be a lot of the Uncle Bob Martin viewpoint that TDD is the only way to build better software.

                            It is a dumb viewpoint that only TDD fixes things, but it has understandable roots from a certain perspective. Also, before I get downvoted for that last phrase, here is an explanation of why TDD doesn’t solve a lot on its own and can mostly be gamed to demonstrate competence (inspired by real-life code):

                            import unittest

                            def foo(x):
                                return x * x

                            class TestFoo(unittest.TestCase):
                                def test_foo(self):
                                    self.assertEqual(foo("x"), "xx")

                            This is the most useless test ever, and all it did was increase test coverage. But it doesn’t do a damn thing about boundaries, what should happen with different types being passed into the function, what should/should not be allowed as types (something statically typed languages like Haskell/Idris/OCaml get for free), etc.

                            1. 1

                              This is not a useless test! Based on the syntax, I’m assuming the language is Python. There are no compile-time or run-time checks that this code will not throw an error until you actually execute it.

                              What does * do when applied to two strings? Maybe you meant +?

                              >>> "x" * "x"
                              Traceback (most recent call last):
                                File "<stdin>", line 1, in <module>
                              TypeError: can't multiply sequence by non-int of type 'str'
                              >>> "x" + "x"
                              'xx'

                              These “useless tests” can be thought of as double-entry bookkeeping in accounting. Sure, you’re absolutely certain that the code you meant to write would work. But as demonstrated in your example, even a single line method can have unexpected bugs.

                              1. 0

                                Yeah, meant +.

                                The only issue with this test is that it really doesn’t do anything useful. My typo (which I’m too lazy to fix) aside, it doesn’t actually test anything of substance.

                                A somewhat better example in C: take a function that takes a uint8_t, divides it by 2, and then adds another uint8_t, so the result can exceed UINT8_MAX. Writing out all the unit tests for this in a TDD fashion is tedious at best, and as error-prone as the code you write at worst. How do you handle overflow? Do you use saturating addition? How do you validate that all values are right in this regime? TDD doesn’t help with the hard problems of logic here.

                                Property-based tests would be better here, but I guess in the end I see TDD as promoted by most people as being equivalent to ∃ or ∄ versus ∀. This isn’t to discount TDD, but it is at best a step towards better software. I’d argue it is a very small step compared to things like theorem provers and dependently typed languages.
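To sketch the ∃ versus ∀ difference for that uint8_t case: the function below is a hypothetical stand-in modelled in Python, and saturating on overflow is just one of the possible answers to the questions above, picked as an assumption. The input space is small enough (65,536 pairs) to check every case; for anything bigger, a property-based testing library would sample the space instead.

```python
UINT8_MAX = 255


def halve_then_add_saturating(a: int, b: int) -> int:
    """Hypothetical model of the C function: (a / 2) + b, clamped to uint8_t."""
    return min(a // 2 + b, UINT8_MAX)


def example_test() -> bool:
    # The "there exists" style: one hand-picked case that happens to pass.
    return halve_then_add_saturating(4, 1) == 3


def exhaustive_test() -> bool:
    # The "for all" style: every uint8_t pair, checked against properties
    # rather than against hand-computed expected values.
    for a in range(UINT8_MAX + 1):
        for b in range(UINT8_MAX + 1):
            result = halve_then_add_saturating(a, b)
            if not 0 <= result <= UINT8_MAX:
                return False  # result must still fit in a uint8_t
            if result < b:
                return False  # adding a non-negative half never shrinks b
            if a // 2 + b > UINT8_MAX and result != UINT8_MAX:
                return False  # overflow must saturate, not wrap around
    return True


print(example_test(), exhaustive_test())
```

The single example says nothing about overflow; the exhaustive version would catch a wrapping implementation on the first overflowing pair.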

                                These all need work as well to become mainstream, but I feel much better with things like property-based testing and dependent types, or even things like SMT solvers with Liquid Haskell, than I ever do with TDD. TDD is too much like ad hoc polymorphism in OO languages to me. I’d rather either use straight C structs or go all in with a more expressive type system.

                                1. -1

                                  I believe you are making their point for them. Writing tests for cases that can be systematically excluded by a type system is a waste of time, and no matter how hard you flail at it you’ll never do as good of a job covering those cases.

                                  1. 1

                                    Type systems only guarantee that the input and output types are correct. They do not guarantee that the function does what you expected it to do.
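A minimal illustration, with made-up function names: both versions below have the same annotations and would pass a checker like mypy, but only one computes an average.

```python
def average(a: int, b: int) -> float:
    # Type-correct but wrong: precedence means only b is divided by 2.
    return a + b / 2


def average_fixed(a: int, b: int) -> float:
    return (a + b) / 2


print(average(2, 4), average_fixed(2, 4))
```

Only a test (or a much richer type system) tells these two apart.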

                                    1. 0

                                      That’s obvious. What is your point?

                                      My reading of the example was that they 1) meant + instead of *, 2) were intending for foo to operate on integers, and 3) were showing an example of a useless test that checks the result of foo on strings, when that case would have been excluded by a type system.

                            2. 2

                              Note that the “famous study from Microsoft, IBM, and Springer” referred to (which in this article links to a paywall site) can be had for free from research.microsoft.com and was previously submitted to lobste.rs three months ago - https://lobste.rs/s/i3h7fj/realizing_quality_improvement_through

                              1. 1

                                People in this thread are missing the point:

                                • An experienced programmer who practices TDD will write better code than if they didn’t.
                                • A junior programmer who practices TDD will write better code than if they didn’t.
                                1. 2

                                  That might be the point, but the point is unproven if the metric is not valid.

                                2. 1

                                  I do a very simple thing in Ruby to give me “The Best of Both Worlds”.

                                  I have a few very very simple utility methods…

                                  def static_type_check( klass)
                                    raise StaticTypeException, "Static type check error, expected object to be exactly class '#{klass}', found '#{self.class}'\n\t\t#{self.inspect}" unless
                                        self.class == klass
                                    self # return the receiver so the check can be used inline in an assignment
                                  end

                                  def polymorphic_type_check( klass)
                                    raise PolymorphicTypeException, "Polymorphic type check error, expected object to be a kind of '#{klass}', found '#{self.class}'\n\t\t#{self.inspect}" unless
                                        self.kind_of? klass
                                    self # return the receiver so the check can be used inline in an assignment
                                  end

                                  I have a couple of other helper ones like those. eg.

                                  static_type_check_each (checks a collection) or 
                                  quacks_like (Does object .respond_to? method)

                                  Now TDD and any object oriented method has a nasty feature.

                                  The “set instance variable” in the constructor is lexically and timewise distanced from the use of an instance variable.

                                  i.e., if the wrong thing gets bound to an instance variable when you construct an object, the bug will not be on the backtrace when the shit hits the fan.

                                  Solution: In your constructor just assert you have the right thing.

                                  class Foo
                                     def initialize( _duck, _goose)
                                         @duck = _duck.quacks_like( :qvack)
                                         @goose = _goose.polymorphic_type_check Goose