1. 11
    1. 7

      The primary purpose of writing a test is to show correctness. There are lots of other secondary reasons, but they are definitely secondary.

      For example, it’s often said that the primary purpose of TDD / test-first development is to aid in design. This can’t be the primary purpose, because there are plenty of ways to get design feedback without tests: we could just write code without any assertions on any values. That provides design feedback, and there’s no need to test the results at all.

      Another secondary reason to test is to enable refactoring. The reason this is also secondary is that you can refactor anything you want without tests. Tests just help to show that the refactor preserves correctness. In order to show that changes preserve correctness, you first need to test for correctness.

      Although secondary, these are still beneficial.

      1. 8

        I’d swap those: for a typical test, maintaining correctness over time is a far bigger motivation than demonstrating correctness at a point in time.

        I gather that the majority of development happens like this: you write the code first, convince yourself that it is reasonably correct by running it, and then write tests on top to freeze the behavior.

        Sometimes you do use tests to demonstrate correctness, but this needs to involve randomized and generative testing in some form, so that the tests can actually discover cases which you haven’t thought about during implementation.
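
        Something like this, as a rough sketch (hypothetical merge_sorted, with plain random standing in for a proper property-based testing library):

          import random

          def merge_sorted(a, b):
              # hypothetical hand-rolled merge of two sorted lists
              out, i, j = [], 0, 0
              while i < len(a) and j < len(b):
                  if a[i] <= b[j]:
                      out.append(a[i])
                      i += 1
                  else:
                      out.append(b[j])
                      j += 1
              out.extend(a[i:])
              out.extend(b[j:])
              return out

          def test_merge_matches_oracle():
              # randomized inputs checked against a trusted oracle (sorted),
              # so the test can find cases nobody listed by hand: empty lists,
              # duplicates, one side exhausted early, and so on
              for _ in range(1000):
                  a = sorted(random.choices(range(10), k=random.randint(0, 8)))
                  b = sorted(random.choices(range(10), k=random.randint(0, 8)))
                  assert merge_sorted(a, b) == sorted(a + b)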

        1. 1

          I don’t agree with your last paragraph. In my view, tests aren’t (necessarily) about finding bugs, but verifying the software meets the requirements or spec. Through that lens, a “bug” is a deviation from the requirements, and when somebody finds one, the requirements/spec should be updated and the test added.

          In some cases, the tests are the spec or requirements.

          1. 2

            You are conflating a bug (deviation from spec) with misspecification (spec doesn’t describe desired behavior).

            If you’re changing the spec because of it, it’s not a bug.

          2. 1

            But my point is that to maintain correctness over time, you first need to demonstrate point-in-time correctness.

            I’ve never worked on a project that didn’t have dozens or hundreds of people modifying it every day. So of course correctness over time is extremely important. But to get it, you need tests whose primary purpose is to enforce correctness at any given snapshot.

          3. 4

            Hi, author here! I get what you mean, but in practice I find I’m not reaching for tests for correctness reasons per se. I think this is because it’s quite hard to prove correctness with tests. They can show existing behaviour very well, and certain kinds of tests can demonstrate certain kinds of correctness (property-based testing, for example), but for correctness per se I’d rather use types or API design to enforce it than describe correct behaviour in tests.
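
            As a very rough illustration of the “types or API design” side (hypothetical names, with Python type hints standing in for a stricter type system):

              from dataclasses import dataclass
              from enum import Enum

              # Instead of writing tests asserting that a payment state is never
              # some arbitrary string, constrain it in the type so an invalid
              # state cannot be constructed in the first place.
              class PaymentState(Enum):
                  PENDING = "pending"
                  CAPTURED = "captured"
                  REFUNDED = "refunded"

              @dataclass(frozen=True)
              class Payment:
                  amount_cents: int
                  state: PaymentState  # a type checker rejects plain strings here

                  def __post_init__(self):
                      if self.amount_cents < 0:
                          raise ValueError("amount_cents must be non-negative")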

            That’s not to say tests are unrelated to correctness: if I make a change and the tests still pass (and I’m confident in my tests), then I’m confident that my code is no less correct than before. I think Matklad’s description of “correctness over time” is a good description there. But that feels different from “if I test this, it will be correct”, which is what I’d expect if I was testing for correctness.

            And that idea is what brought me to write this post in the first place: I really like writing tests, I find them really useful, and correctness is related to that, but when I start up my test runner it’s usually not because I want to prove my code is correct. Instead, it’s usually because I want to see what the code I’m writing is actually doing, or to check that the change I’ve made isn’t changing how the code behaves. And I specifically wanted to describe that perspective (i.e. tests outside of just a correctness point-of-view).

            1. 3

              Hello. Great post. Thinking about why we write tests is very important, since that greatly affects the kinds of tests we write, and how many of them we write.

              I think what you’re getting at is that “correctness” is an overloaded term. On one hand, correctness is global and binary: the entire application is either correct or not. But on the other, a single test shows the correctness of a single behavior (a behavior being a sequence of states / a trace). There is a whole branch of verification devoted to this: runtime verification.

              Tests (and runtime verification) don’t show complete correctness. But the purpose of a test is to show correctness of the behavior being tested. I agree with you that complete correctness is rarely the goal, but when I say “correctness of a test” this means that the test is looking to verify some behavior, not all behaviors.
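
              As a toy sketch of “one test shows the correctness of one behavior” (hypothetical turnstile, the behavior being the observed sequence of states):

                class Turnstile:
                    # minimal state machine: locked -> coin -> unlocked -> push -> locked
                    def __init__(self):
                        self.state = "locked"

                    def coin(self):
                        self.state = "unlocked"

                    def push(self):
                        self.state = "locked"

                def test_coin_then_push_trace():
                    t = Turnstile()
                    trace = [t.state]
                    t.coin()
                    trace.append(t.state)
                    t.push()
                    trace.append(t.state)
                    # asserts the correctness of exactly this behavior (this one
                    # trace), not of all possible behaviors
                    assert trace == ["locked", "unlocked", "locked"]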

              1. 3

                Thanks! And thanks for the Wiki link, I look forward to reading up on that.

                I think you’re probably right that we’re using correctness in different ways. To me, the danger of seeing tests as a tool for correctness is that one sees it in the global, binary sense: “my code is now correct, because I wrote some tests and they’re all passing”. But in the local sense (this single behaviour behaves in the stated way), tests do provide that guarantee of correctness (albeit with the danger that the test doesn’t state the behaviour we expect, e.g. because we’re just testing that mocks exist, and not testing that the program works).

                But saying that tests provide correctness then becomes almost tautological. The definition of a test is a statement of a specific behaviour that we can verify. If we swap in that definition, the blog post becomes “Why write verifiable statements of correctness for specific behaviours?”, and has mostly the same content. Or in other words: yes, tests show a kind of correctness, and we use that correctness during development and while refactoring in the ways that I talked about.

            2. 3

              For example, it’s often said that the primary purpose of TDD / test-first development is to aid in design. This can’t be the primary purpose, because there are plenty of ways to get design feedback without tests.

              That does not follow. Just because there are other methods to achieve something doesn’t mean that a method you think is for something else can’t have that first thing as its primary purpose.

              1. 2

                The primary purpose of writing a test is to show correctness.

                There is a difference between correctness and verifying that the code does what you are expecting. Most application developers who test do it as a way to check that the code does what they intended. As ~matklad mentions in the other reply, correctness is a higher bar than that.

                1. 4

                  …correctness

                  …the code does what you are expecting.

                  Are the same thing

                  1. 2

                    Some tests verify that you correctly implemented an incorrect algorithm.

                    For example, maybe you want to implement some compiler optimization to simplify arithmetic, like x + 0 -> x, etc. One of your rules is x/x -> 1, but you didn’t think about what happens when x is zero at runtime; maybe correctness, in this language, requires throwing an exception.

                    If your tests only show example programs before/after the optimization, you will confirm that the optimization does what you’re expecting, but if you actually run the programs with different inputs you’d realize it’s incorrect.
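
                    Roughly, in toy form (hypothetical simplifier, just to make it concrete):

                      # Toy simplifier with the buggy rule x/x -> 1.
                      # Expressions are "x", an int, or a tuple (op, left, right).
                      def simplify(expr):
                          if expr == ("div", "x", "x"):
                              return 1
                          if expr == ("add", "x", 0):
                              return "x"
                          return expr

                      def evaluate(expr, x):
                          if expr == "x":
                              return x
                          if isinstance(expr, int):
                              return expr
                          op, a, b = expr
                          if op == "add":
                              return evaluate(a, x) + evaluate(b, x)
                          if op == "div":
                              # raises ZeroDivisionError for x/x at x = 0
                              return evaluate(a, x) // evaluate(b, x)

                      def outcome(expr, x):
                          try:
                              return ("value", evaluate(expr, x))
                          except ZeroDivisionError:
                              return ("raises",)

                      def test_rewrite_shape():
                          # "correctly implemented an incorrect algorithm": this passes
                          assert simplify(("div", "x", "x")) == 1

                      def test_rewrite_preserves_meaning():
                          # actually running the programs exposes the bug:
                          # at x = 0 the original raises but the simplified one returns 1
                          original = ("div", "x", "x")
                          simplified = simplify(original)
                          for x in (0, 1, 5):
                              assert outcome(original, x) == outcome(simplified, x)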

                    1. 2

                      Correctness requires being able to define correct behavior. You’re talking about incorrectly defining your desired correct behavior (I call this misspecification).

                      There is no solution to that problem. To be able to test for correctness, you need the definition of correctness.

                2. 1

                  When fixing a bug, I totally agree that a test is to show correctness. I even like to make 2 commits for my PRs (some examples):

                  • one with the test added (that must make the CI fail)
                  • the second with the actual fix (which should make the CI green again)
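
                  For example, the first commit might add only a regression test like this (entirely hypothetical bug and names; the current buggy apply_discount is shown inline so the sketch stands alone):

                    def apply_discount(price_cents, code):
                        # current (buggy) behaviour already in the repo:
                        # the same discount code stacks if applied repeatedly
                        if code == "WELCOME10":
                            return price_cents * 90 // 100
                        return price_cents

                    def test_discount_is_only_applied_once():
                        # commit 1 adds only this test, which makes CI fail;
                        # commit 2 fixes apply_discount so CI goes green again
                        price = apply_discount(10_000, "WELCOME10")
                        price = apply_discount(price, "WELCOME10")
                        assert price == 9_000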

                  However when building something new (like a new feature), I think the main aim is to prevent future regressions.

                  Maybe it boils down to:

                  • fixing something goes along with adding a test for this specific (unhappy) case
                  • whereas implementing a new feature usually means adding a test for the happy case.
                3. 1

                  I was thinking yesterday about the Gettier problem, and how testing (& CI) plays a large role in helping solve our G-problem. We want our justified true belief (the code works on “all” inputs), but we also want to reduce false premises (I’m testing the wrong branch, or I forgot to handle a wide range of inputs, or I’m accidentally reusing global state &c). Like checking for blind spots, it can seem a chore. You have to be careful to not fool yourself, which isn’t easy.

                  In that sense, testing/CI cut the fog of war (undefined behavior) to grant us knowledge (at a cost, of course). Code changes cause test changes, sure, and that’s an interesting problem. But I’m afraid I don’t quite understand what the author’s trying to say altogether.