1. 11

  2. 8

    High line coverage doesn’t necessarily imply covering enough of the state space. (Branch coverage is only slightly better in this regard.) As the post notes, testing with groups of inputs that correspond to different equivalence classes is more meaningful, because it exercises the code under test in several different ways.

    There’s a problem hiding in plain sight, though: how often do we know all the edge cases?

    Property-based testing, fuzzing, model checking, static analysis, and symbolic execution help here: property-based testing and fuzzing do guided random exploration of the state space and report the edge cases they find (known and unknown), while model checking, static analysis, and symbolic execution reason about possible behavior across the state space as a whole (possibly up to some size bound).
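
    The guided-exploration idea can be sketched with nothing but the standard library (real PBT tools like Hypothesis add input shrinking and much smarter generation; the `divide` function, ranges, and trial count here are illustrative assumptions):

    ```python
    import random

    def divide(a, b):
        return a // b

    def find_counterexample(trials=2000, seed=0):
        """Randomly explore the input space, checking a division property."""
        rng = random.Random(seed)
        for _ in range(trials):
            a = rng.randint(-20, 20)
            b = rng.randint(-20, 20)
            try:
                q = divide(a, b)
                assert a == q * b + a % b  # the property under test
            except ZeroDivisionError:
                return (a, b)  # an edge case nobody enumerated by hand
        return None

    # random exploration stumbles onto b == 0 without it ever being
    # written down as a known equivalence class
    counterexample = find_counterexample()
    ```

    Shrinking and coverage-guided generation aside, this is the core loop: generate inputs, check a property, report what breaks it.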

    1. 1

      If you know a line is run during a test suite, it tells you nothing about whether it’s well tested.

      However, if you know the line is not executed during a test suite, that does indeed tell you something about how well tested it is.

      I don’t work towards 100% coverage - I work away from having code I know isn’t tested. It looks like the same thing from a distance, but it’s an important distinction to me.

      1. 2

        Absence of coverage can definitely be a useful insight. Line coverage is absence of evidence, but absence of line coverage is evidence of absence. :) It seems to lose something in translation when that coverage becomes a metric that triggers failures in CI, though.

        (Also, if test inputs generated via property-based testing still aren’t covering parts of a program, it’s a sign the generator needs more variety.)

      2. 1

        I like to move this back to the root. Eliminate edge cases as much as possible.

      3. 4

        The author gives this as a bad example of 100% coverage (since it doesn’t check division by zero):

        @Test
        public void divide_with_valid_arguments() {
          assertThat(new Calculator().divide(10, 2)).isEqualTo(5);
        }
        

        Yet it’s even worse than that, since this also gives 100% coverage:

        @Test
        public void divide_with_valid_arguments() {
          new Calculator().divide(10, 2);
        }
        

        A test suite with 100% coverage doesn’t actually need to assert anything! It does check that there’s a way to run each branch without hitting a language error (undeclared variable, segfault, type/tag error, whatever), but many compiled languages will catch these things anyway, making code coverage even less useful there.
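
        The point is easy to demonstrate with a stdlib-only sketch (mutation testing in miniature, not any particular tool; `divide` here is a deliberately broken stand-in): an assertion-free test fully executes the buggy function and still passes.

        ```python
        def divide(a, b):
            return a * b  # deliberately broken: multiplies instead of dividing

        def test_without_assert():
            divide(10, 2)  # executes every line of divide, checks nothing

        def test_with_assert():
            assert divide(10, 2) == 5

        def run(test):
            try:
                test()
                return "pass"
            except AssertionError:
                return "fail"

        # the assertion-free test reports "pass" despite the bug;
        # only the asserting test reports "fail"
        results = (run(test_without_assert), run(test_with_assert))
        ```

        Both tests give identical line coverage of `divide`; only one of them can ever detect the mutation.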

        1. 3

          Yeah, this is an issue. What would be nice is a way for the test framework to say “some kind of test or assertion was (or was not) executed against this line/branch of code”, rather than “this code was executed at some point while tests were running”, and then, ideally, you could see exactly which assertions those were. I’m not aware of a way to do that in any language I know (Python and JS). It’s easy to forget that executed doesn’t necessarily mean tested.

          1. 3

            > It does check that there’s a way to run each branch without hitting a language error (undeclared variable, segfault, type/tag error, whatever), but many compiled languages will catch these things anyway, making code coverage even less useful there.

            Not necessarily! If you use code contracts, then it’s still worthwhile to write tests without assertions. That’s because you’re putting the assertions in the code itself, so all the test needs to do is explore the state space. Example:

            # assuming the dpcontracts and Hypothesis libraries
            from dpcontracts import require, ensure
            from hypothesis import given, assume
            from hypothesis.strategies import integers, lists

            @require("`x` must be in `l`", lambda a: a.x in a.l)
            @ensure("`x` must be removed from `l`", lambda a, r: a.x not in r)
            def remove(l, x):
              l.remove(x)
              return l

            @given(lists(integers()), integers())
            def test(l, x):
              assume(x in l)
              remove(l, x)
            

            Even though there are no assertions in the test, it will still fail, because remove([0, 0], 0) violates the postcondition.
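
            To see why that input is a counterexample (a quick stdlib check): `list.remove` deletes only the first matching element, so with a duplicate the “`x` must be removed from `l`” postcondition is violated.

            ```python
            # list.remove drops only the first occurrence
            l = [0, 0]
            l.remove(0)
            assert l == [0]
            assert 0 in l  # the duplicate survives, so "x not in result" fails
            ```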

          2. 3

            100% code coverage is a false metric, but I don’t think the given solution, parameterized testing, is much better. He said his function’s inputs have four partitions, but that misses a huge number: cases where only the first argument is negative, cases where only the second is, 1e200, 1e-200, -1e-200, Inf, NaN, -Inf, -0…

            This is why testing is trending towards exploring entire state spaces. @silentbicycle gave a list of such techniques above and develops a property-based testing (PBT) library for C.
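
            Those missed float partitions are easy to demonstrate directly with the standard library (`add` here is just a stand-in operation, since the article’s exact function signature isn’t reproduced in this thread):

            ```python
            import math

            def add(a, b):
                return a + b

            # values far outside a four-way sign partition
            assert math.isinf(add(1e308, 1e308))                 # overflow to inf
            assert math.isnan(add(float("inf"), float("-inf")))  # inf + -inf is NaN
            assert add(1e200, 1e-200) == 1e200                   # small term swallowed
            assert math.copysign(1.0, add(-0.0, -0.0)) == -1.0   # negative zero survives
            ```

            None of these cases show up in a partition scheme built only from argument signs, yet each exercises genuinely different floating-point behavior.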

            1. 2

              This article sounds to me more like excuses and personal frustration than actual reasoning not to aim for a high level of coverage.

              1. 2

                The problem is that bean counters like metrics, so programmers go along with line-coverage metrics, even when they know the only coverage that matters is coverage of the input domain.

                1. 1

                  Just the other day I was mentioning Inozemtseva and Holmes, “Coverage is Not Strongly Correlated with Test Suite Effectiveness”, whose methodology is perhaps not perfect but I think still makes a great point.

                  1. 1

                    Code coverage % as a gatekeeper is nearly always a bad thing. Managed to delete a bunch of old code that happened to have higher-than-average coverage? You’re making the coverage worse. Now you’ve got to go and add a bunch of almost-certainly-useless tests to code that’s not related to your ticket, isn’t in your head, and maybe was written by somebody else.

                    1. 1

                      Some years ago I wrote about this topic too.

                      I still think that you should not even measure coverage if you don’t know what TDD is for.