1. 12

  2. 9

    Let me start by saying that I prefer a developer who tests too much to one who does not test at all. So I appreciate posts like this. That being said, I disagree with the premise of the post, and I believe the main issue is that OP ignores one crucial aspect: cost.

    Writing any code has a cost. That cost is a compound of many things.

    • time spent designing and writing it
    • the effort spent on future maintenance of the code (which, frustratingly, you don’t know at the time of writing)
    • the inevitable growth in complexity of the code base, which leads to other problems down the road
    • additional time spent on onboarding new developers

    and possibly many more things. When we write new code, we weigh these costs against the potential value, e.g. new features, security, etc. Testing code is no different. Any testing code has a value that we have to weigh against the cost of writing it. I like to frame that value as confidence: how confident am I, after having written this test, that my code is working as intended from the perspective of my users? Using this mental model, you will find that a lot of tests are not worth writing.

    OP addresses common objections, amongst them:

    Enforcing 100% test coverage leads to bad tests

    In their answer, they kind of hit the mark in saying:

    Putting too much focus on one particular metric might lead to gaming behavior

    or put differently by Goodhart’s law:

    When a measure becomes a target, it ceases to be a good measure.

    OP further says:

    What’s sure is that it is straightforward to write bad tests.

    Unfortunately, they dismiss the notion by saying:

    I am not sure how that would lead to bad tests.

    In my experience, if you enforce any test coverage (you don’t even need 100%), developers will write bad tests. And many of those tests will have negative value. That is, they don’t provide enough confidence to offset their cost.

    I have experienced this first-hand. At work, we have a code base that had ~100% code coverage at one point in time. In my estimate, 95% of those tests are worthless. How do I know? You can change the code in a way that breaks neither the interface of functions nor user behaviour, and the tests will fail (they should not). At the same time, you can deliberately break the code in meaningful ways, and the tests will not fail (they should).
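To make those two failure modes concrete, here is a minimal sketch in Python. `Cart` and both tests are hypothetical, invented for illustration, and are not taken from the code base mentioned above.

```python
from unittest import mock


class Cart:
    """Hypothetical example class, not from any real code base."""

    def __init__(self, prices):
        self.prices = list(prices)

    def _subtotal(self):
        return sum(self.prices)

    def total(self, discount_percent=0):
        subtotal = self._subtotal()
        return subtotal - subtotal * discount_percent / 100


def brittle_test():
    # Coupled to the implementation: it only checks that a private helper
    # was called. Inlining _subtotal() into total() makes patch.object raise
    # AttributeError although behaviour is unchanged, while replacing "-"
    # with "+" in total() goes unnoticed.
    cart = Cart([60, 40])
    with mock.patch.object(Cart, "_subtotal", return_value=0) as helper:
        cart.total(discount_percent=10)
    helper.assert_called_once_with()


def behaviour_test():
    # Coupled to the behaviour a user sees: survives refactoring of the
    # internals, fails when the arithmetic is wrong.
    assert Cart([60, 40]).total(discount_percent=10) == 90
```

Both tests execute every line of `total`, so a line-coverage report cannot tell them apart.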

    Test coverage is one of the tools at our disposal; it is one metric, not the metric. Use it to discover meaningful places to test. Don’t kid yourself when you are reaching for 100%. When you do, you are playing a video game and going for a high score. You are not meaningfully improving your code.

    1. 1

      I totally agree; the cost of tests shouldn’t be ignored.

      I imagine the article author might respond, “just put # pragma: no cover comments in all the code that isn’t worth testing”. However, I think they would end up with quite a lot of those comments, all over their code. That would be bad—comments that are not directly relevant to the functionality of surrounding code harm the readability and editability of that code. Such comments need a strong benefit elsewhere to be worth it. And as you explained, enabling a coverage metric is not a strong benefit.
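For readers who have not seen them, this is roughly what such comments look like with coverage.py, whose default configuration excludes lines marked `# pragma: no cover` (and the whole block when the marked line introduces one). `load_port` is a hypothetical function used only to show how the markers accumulate.

```python
def load_port(env):
    """Hypothetical helper: read a TCP port from a mapping of settings."""
    raw = env.get("PORT")
    if raw is None:  # pragma: no cover
        return 8080
    port = int(raw)
    if not 0 < port < 65536:  # pragma: no cover
        raise ValueError(f"port out of range: {port}")
    return port
```

Two of the three branches in this small function already carry a marker that says nothing about what the code does.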

    2. 9

      Meh, I often drive my tests by code coverage. I often design the code (especially error paths) to be explicitly code-coverageable. For example, an open() error path in the code that is not covered? Easy. Add a --testonly-break-open command line option to the program to get there (fault injection). Fault injection can be achieved in other ways too; for example, “strace” allows me to fail some syscalls (mmap!).
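A minimal sketch of that kind of test-only fault injection, assuming a Python program. Only the flag name comes from the comment above; `read_config`, `main`, and the warning text are hypothetical.

```python
import argparse


def read_config(path, break_open=False):
    """Return the file contents, or "" with a warning if it cannot be read."""
    try:
        if break_open:
            # Injected fault: behave as if open() itself had failed.
            raise OSError("injected open() failure")
        with open(path) as handle:
            return handle.read()
    except OSError as error:
        print(f"warning: cannot read {path}: {error}")
        return ""


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("path")
    # Hidden from --help output because it exists only for tests.
    parser.add_argument("--testonly-break-open", action="store_true",
                        help=argparse.SUPPRESS)
    args = parser.parse_args(argv)
    return read_config(args.path, break_open=args.testonly_break_open)
```

Calling `main(["any-path", "--testonly-break-open"])` drives execution through the `except OSError` branch without needing an unreadable file on disk.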

      Having said that, I’m not orthodox. There is little point in testing non-hot-path things; for example, do I really need code coverage for the --help printing code? Or for malloc errors in all paths? Some things can fail and while I want to see a warning, I don’t really care if it’s code-coveraged (madvise for example).

      Perfect is the enemy of good.

      1. 7

        The author agrees that you don’t need to test literally everything, but argues that developers should be explicit about what is excluded from testing. Thus 100% coverage should be achieved by either testing or explicitly NOT testing each line.

        The author even has recommendations for which things should usually be excluded from testing:

        What can/should be excluded from coverage? It depends on your codebase and your quality expectations. Also, as for any rules, there are exceptions…

        • Simple error handling
        • Proxy functions
        • Code as configuration
        • Simple early return/continue
        • Code that is too difficult to test (rare)
        • Debugging/manual test code
        1. 3

          I often drive my tests by code coverage

          Careful about that, I was maintaining some tests written by a colleague who was doing that.

          Alas, it was a horror.

          Why? Because the reasons for the test cases and assertions were based on a branch in the implementation, not on a required behaviour.

          And if it failed? What was it testing? Why was it that value asserted?

          Not a clue.

          It didn’t help that it was a rambling test that violated the principle of testing a single required behaviour per test case.

          After that nightmare I became even more keen on “Given blah When foo Then …” format for test cases.
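As an illustration of that format, here is a small sketch in Python. `Account` and the test name are hypothetical; the point is that the test name and comments state the required behaviour, so a failure explains itself.

```python
class Account:
    """Hypothetical domain object used only for this example."""

    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


def test_withdrawal_larger_than_balance_is_rejected():
    # Given an account holding 50
    account = Account(balance=50)

    # When the owner tries to withdraw 80
    try:
        account.withdraw(80)
        rejected = False
    except ValueError:
        rejected = True

    # Then the withdrawal is rejected and the balance is unchanged
    assert rejected
    assert account.balance == 50
```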

        2. 6

          As somebody said, (approximately), bugs decrease the value delivered.

          If you have n% coverage, a crude estimate is that you are delivering (n + (100 - n)*a)/100 of the value, where ‘a’ is a fudge factor representing the fraction of value that still exists in uncovered code (somewhere between 0 and 100%).
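A quick worked version of that estimate in Python, taking ‘a’ as a fraction between 0 and 1. The function name and the sample numbers are made up for illustration.

```python
def estimated_value_delivered(coverage_percent, a):
    """Crude estimate (n + (100 - n) * a) / 100 from the comment above.

    coverage_percent is n (0 to 100); a is the fudge factor as a fraction
    (0.0 to 1.0) of the value that uncovered code still delivers.
    """
    if not 0 <= coverage_percent <= 100:
        raise ValueError("coverage_percent must be between 0 and 100")
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be between 0.0 and 1.0")
    return (coverage_percent + (100 - coverage_percent) * a) / 100


# With 80% coverage and uncovered code assumed to be half as valuable:
# (80 + 20 * 0.5) / 100 = 0.9
estimate = estimated_value_delivered(80, 0.5)
```

Note that the estimate only reaches 1.0 at n = 100 or a = 1, which is why the choice of ‘a’ dominates the result.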

          I agree branch coverage is more interesting /useful than line coverage.

          I would actually say state space coverage, in particular the corners and edges of the invariants in state space are the interesting points.

          Mutation testing is a very interesting and compelling notion.

          1. 8

            You also need to consider value delivered per unit effort.

            Writing tests is not cost-free. At some point there are diminishing returns (the fudge factor is not constant throughout the code). Towards the end, the value/effort ratio is depleted.

            Not to mention dev morale is crushed at 100%. Nobody likes the code police patrolling the streets, ready to punish anybody who doesn’t follow the rule and reach 100%. At 80%, devs have room to cover what they think is important.

            And saying “100% of some percentage” is basically moving the goalposts in a circle back to the same position. If I wasn’t going to cover it anyway, and by the rule I can go with 0% coverage, why can’t I just partly cover it, as that would be better than 0%? At which point you are back to square one, with some code at 0 < x < 100 coverage.

            Code coverage is a middle metric. What you really want is no high severity bugs in production. So just use that instead of a number that’s easy to push around but doesn’t. The way people brag about their code coverage, soon somebody will figure out how to get 120%.

            1. 2

              I feel like a lot of code testing regimens are a cargo cult; people don’t understand the tradeoffs of adding more testing or what the actual purpose of it is.

              1. 1

                Writing tests is not cost-free. At some point there are diminishing returns

                Mostly agree, except I will note that if the code is hard to test… it’s a code smell telling you a small refactoring will make it more testable and, as a nice side effect, more decoupled and hence more understandable / more reusable / less brittle.

                Often I will look at code and say, hmm, I can’t test that, because I can’t think of a way of stimulating that behaviour through the API…

                What if I ban that behaviour, and say, I will explicitly demand that that case should never happen…?

                Now how does that change my API? What do I need to make a precondition of the API?

                Often things crumble and become a lot simpler. I may need to handle a case on the client side, but the sum of the clients and the implementation often become simpler.

            2. 4

              The title worried me, but I found myself agreeing with the idea. Not having tests needs an explanation (even if it’s a bad explanation).

              1. 1

                100% of what? All possible execution paths through the application? Doesn’t sound feasible, most of the time.

                1. 2

                  100% of what?

                  Don’t just read the headline. If you click through to the actual article “Test coverage only matters if it’s at 100%”, you will find the answer to your question.

                  In short:

                  100% of the lines that should be covered (not the lines that are written) … You need to be explicit about excluding lines from coverage.

                  If your language supports it, make sure you test all branches

                  Now that you actually know what the article says, do you still think 100% coverage is infeasible? (My opinion matches that of @ts.)

                  1. 1

                    Does “all branches” mean each branch independently, or each possible execution path? One sounds feasible, but inadequate, the other sounds adequate, but infeasible.
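The gap between the two readings can be shown with a tiny sketch in Python. `shipping_cost` and its numbers are hypothetical. With two independent `if` statements, two inputs are enough to take every branch both ways, but there are four distinct execution paths, and in general k independent conditions give 2**k paths.

```python
from itertools import product


def shipping_cost(weight_kg, express):
    """Hypothetical function with two independent branches."""
    cost = 5
    if weight_kg > 10:
        cost += 10
    if express:
        cost *= 2
    return cost


# Each branch is taken once as true and once as false: full branch coverage.
branch_covering_inputs = [(20, True), (1, False)]

# Every combination of branch outcomes: full path coverage.
all_path_inputs = list(product([20, 1], [True, False]))

# Paths that the branch-covering inputs never exercise.
untested_paths = [case for case in all_path_inputs
                  if case not in branch_covering_inputs]
```

Here `untested_paths` contains the heavy non-express and light express cases, so a bug in how the two adjustments interact could pass a suite with full branch coverage.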

                2. 0

                  I explicitly NOT testing any line. 100% achieved.

                  1. 0

                    This is a rather boring response. Obviously, any voluntary, gradual opt-out method has the option of opting out completely.

                    Not doing coverage testing is a fine and okay decision, especially when done fully consciously.