  2.

    I have a couple of responses. The first is an issue with this:

    > Tests can’t prove that the program has no bugs, only that it has bugs (and those may be bugs in the tests themselves).

    This is true of every verification technique. Even if you’re formally proving the code correct, you’re still depending on your proof assistant being sound and on having specified all the properties you actually need.

    Example: I ran a theorem prover showdown a while back, and one of the challenges involved comparing sums of lists. All of the participants’ proofs (including mine) worked fine… then one of the AdaCore people pointed out that we’d all assumed unbounded integers, so our programs could potentially overflow.
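    The failure mode is easy to reproduce outside a prover. A minimal Python sketch (illustrative, not the actual challenge code) simulating 32-bit signed arithmetic:

    ```python
    def wrap32(x: int) -> int:
        """Reduce x to a signed 32-bit value, as fixed-width machine ints would."""
        return (x + 2**31) % 2**32 - 2**31

    def sum32(xs) -> int:
        """Sum a list using simulated 32-bit wraparound arithmetic."""
        total = 0
        for x in xs:
            total = wrap32(total + x)
        return total

    # With unbounded integers, the sum of nonnegative numbers stays nonnegative...
    assert sum([2**31 - 1, 1]) == 2**31
    # ...but under 32-bit arithmetic the same sum silently wraps negative.
    assert sum32([2**31 - 1, 1]) == -2**31
    ```

    A property proved over mathematical integers simply never mentions this case, which is why all the proofs “worked”.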

    The other response is that we have to be careful to recognize that unit tests aren’t the only form of automated testing.

    1.

      To me, testing (automated/manual, unit/integration, etc.), type systems, verification, assertions, linting, etc. all affect the confidence I have in a system/program. That’s why I cringe whenever I see claims like “types make testing obsolete”, or “I have enough tests so I don’t need types”, etc. In fact, I even take issue with phrasing like whether/how much testing is “needed”, since there’s usually no actual requirement to do any (or, if there is such a requirement for certain domains, then checking the relevant regulations will be more prudent and informative than going with the opinions of some random internet stranger ;) ).

      It all lives on a spectrum: there are no well-defined lines telling us how much testing is “enough”, and of course the qualities of a test suite matter (e.g. whether it covers the important/tricky parts). If I don’t feel confident enough about some code, I’ll happily use whichever method increases my confidence in the most efficient way. Usually that means adding some tests, since test suites are often too small rather than too large. Yet any method gives diminishing returns: we can’t gain much confidence by adding tests to a heavily tested program (especially if we’ve prioritised high-impact areas like end-to-end tests, regression tests, etc.). We might gain more by focusing on, say, types instead, e.g. to ensure user IDs are distinct from order IDs, that SQL is distinct from user input, that invalid states can’t be represented, etc.
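      The distinct-IDs trick doesn’t need anything fancy; plain wrapper types suffice. A hedged Python sketch (names invented for illustration):

      ```python
      from dataclasses import dataclass

      @dataclass(frozen=True)
      class UserId:
          value: int

      @dataclass(frozen=True)
      class OrderId:
          value: int

      def cancel_order(order_id: OrderId) -> str:
          # A type checker (e.g. mypy) rejects cancel_order(UserId(42)) at check
          # time; the wrappers also never compare equal across types at runtime.
          return f"cancelled order {order_id.value}"

      assert UserId(42) != OrderId(42)  # distinct types, even with the same value
      ```

      The same idea scales up to wrapping raw strings as `SqlQuery` vs `UserInput`, so mixing them is a type error rather than an injection bug.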

      The same can happen the other way too: whilst types tell us about all cases, rather than just testing individual ones, there’s only a certain amount of type-level computational trickery we can indulge in before it’s more informative to just run the darned thing and make sure our assumptions hold up.

      Other factors can affect what counts as “enough”, e.g. if something has little chance to cause damage then it might be fine to just run and see what happens; if we’re designing something for specialist/technical users to interact with, we might greatly increase our confidence by guarding certain actions with assertions, so that we crash with an informative message rather than doing the wrong thing; if something’s meant to last for a long time, we might code more defensively and test more of our assumptions; etc.
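      Guarding actions with assertions can be as simple as checking preconditions so misuse fails loudly; a minimal sketch (the function and message are made up):

      ```python
      def apply_discount(price: float, fraction: float) -> float:
          # Crash with an informative message rather than silently doing the wrong thing.
          assert 0.0 <= fraction <= 1.0, f"discount fraction out of range: {fraction}"
          return price * (1.0 - fraction)
      ```

      A technical user who passes `apply_discount(100.0, 20)` (a percentage where a fraction was expected) gets an immediate, explicit error instead of a nonsensical negative price.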

      Experience has taught me to always be pessimistic when estimating confidence, and to keep the path to testing/typing/proving clear of obstacles (e.g. separating calculations from I/O, having small clearly-defined functions/methods, having well-defined data structures rather than piles of key/value nestings, etc.), even if I don’t yet plan to use those things. Keeping that path clear makes them cheaper options if I need more confidence further down the line (e.g. pulling some piece of functionality out of a bash script and rewriting it in Haskell).
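      As a sketch of that calculation/I/O separation (a made-up example, not prescriptive):

      ```python
      import sys

      # Pure calculation: trivial to unit-test, type-check, or later port elsewhere.
      def total_with_tax(subtotals: list, rate: float) -> float:
          return sum(subtotals) * (1.0 + rate)

      # Thin I/O shell: the only part that touches stdin/stdout.
      def main() -> None:
          subtotals = [float(line) for line in sys.stdin if line.strip()]
          print(total_with_tax(subtotals, 0.2))
      ```

      Because `total_with_tax` never touches the outside world, adding tests (or types, or a proof) for it later costs almost nothing, while the untestable plumbing stays small.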