1. 14

I sometimes see the claim that with a sufficiently strong type system applied properly, there’s almost no need for unit tests and less need for automated testing in general: if the code compiles, there’s little opportunity for it to be incorrect.

That’s appealing and I can see how it’s true in some cases. But I struggle to see how it’d really apply to some of the projects I’ve worked on. I’m curious to hear stories from people who’ve become convinced of the truth of that claim. Were there any resources that helped you get there? Anything that finally made it click for you?

Here are some examples of styles of tests in my current code base’s test suite that I can’t see how I’d represent as compile-time type checks no matter how sophisticated the type system was. For context, this particular code base uses a CQRS-style, event-sourcing architecture, and it’s part of a payment service.

  • Can all the historical versions of all the events be deserialized correctly from the event store, getting converted to newer versions (or to completely different event types) as needed?
  • Does the XML we send to a third-party service validate successfully with their schema?
  • Is timeout logic executed when expected user actions don’t occur in the required amount of time?
  • Do our fee calculations produce the values listed in the examples in our product spec?
  • Do we correctly throttle our outgoing requests when a remote service starts responding slowly?

  2. 13

    I wouldn’t say that strong typing removes the need for unit tests. You can have a well-typed function, and a language that enforces that the function is only called with well-typed inputs, and still have that function perform the wrong business logic for your application unless a unit test checks for it.

    Can all the historical versions of all the events be deserialized correctly from the event store, getting converted to newer versions (or to completely different event types) as needed?

    Let’s assume that your event store stores values in some very generic format, like a list of bytes, and that your event type is some kind of complicated enum. Your deserialization function is then something like List byte -> Optional EventType (Optional, of course, because it doesn’t make sense that every possible list of bytes corresponds to a valid event value in your program). The bytes comprising the ASCII-encoded Declaration of Independence are just as well-typed an input to this function as the actual bytes in your event store are. So you still need some way to check that you’re doing the business logic of decoding your bytes the right way. A unit test seems perfectly appropriate here. You might even want to include the ASCII-encoded Declaration of Independence in your unit test, to be sure that your function returns None instead of trying to parse it as an event in your system for some reason.
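
    To make that concrete, here is a minimal Haskell-flavoured sketch of that shape; the event constructors and the one-byte wire format are invented purely for illustration:

    import qualified Data.ByteString as BS

    -- Stand-in for the real, much larger event enum.
    data Event = Deposited Int | Withdrawn Int
      deriving (Eq, Show)

    -- The type only promises "bytes in, maybe an event out". *Which* bytes decode
    -- to which events is pure business logic, and that is what the tests pin down.
    decodeEvent :: BS.ByteString -> Maybe Event
    decodeEvent bs = case BS.unpack bs of
      (0x01 : rest) -> Deposited <$> parseAmount rest
      (0x02 : rest) -> Withdrawn <$> parseAmount rest
      _             -> Nothing   -- arbitrary text, Declaration included, ends up here
      where
        parseAmount [b] = Just (fromIntegral b)
        parseAmount _   = Nothing

    A unit test asserting decodeEvent declarationBytes == Nothing, plus one per known historical payload, is then the natural complement to the signature.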

    1. 4

      So, I do agree that type systems can never fully replace unit/integration tests; they can catch type errors but not logic errors.

      However, a good type system can turn logic errors into type errors. A great example of this is returning a reference to a local variable in a language without automatic memory management: in C or C++ it’s completely legal (maybe with a warning); in Rust it’s a compile error. This isn’t unique to memory management: in C, char + float is a float; in Haskell (and most other functional languages, including Rust), adding a Char to a Double is a type error. One last example: I’m writing a Rust wrapper for a C library. Every C function I call can return PARAM_INVALID if the allocated buffer is null. The Rust function doesn’t even document that error, because it’s not possible to have a null reference in Rust’s type system (also not unique to Rust; C++ has references too).
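
      For example, here is the Char-plus-Double point at its most trivial, in GHC Haskell:

      -- The whole point is that this definition does not compile: Char is not a
      -- numeric type in Haskell, so GHC rejects the addition outright, whereas C
      -- would silently promote the char and carry on.
      oops = 'a' + 1.0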

      My long-winded point is that even though you always need tests, if you have a good type system, there are fewer things to test.

      1. 4

        Curry-Howard Correspondence says that types ARE logic, so they definitely DO catch logic errors.

        1. 3

          That requires a powerful enough type system to represent a proof. Theoretically this is possible, and there definitely is value in using dependent typing and formal verification tools. But at the moment, with typical programming languages, only limited parts of the business logic can be represented with types.

          Even today, with a bit of discipline, it is possible to make certain states impossible to represent in the program. This allows you to eliminate some unit tests, which is definitely a good thing, but we’re still far from the point of proving all of a program’s logic with the type system.
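
          For instance, a tiny Haskell sketch with invented, payment-flavoured names:

          -- Before: the fields can disagree, so "refunded but never captured" is
          -- representable, and you need a test to rule it out.
          data PaymentBad = PaymentBad { captured :: Bool, refundedCents :: Maybe Int }

          -- After: the states are enumerated, the bad combination simply cannot be
          -- constructed, and the corresponding test disappears.
          data Payment
            = Authorized
            | Captured Int       -- captured amount in cents
            | Refunded Int Int   -- captured amount, refunded amount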

          1. 1

            I understand they don’t catch all errors, because there are some you can’t encode in the types; I’m just pointing out that “type errors are not logic errors” is totally incorrect!!

          2. 1

            I’m not terribly familiar with Curry-Howard, but Wikipedia says it states that

            a proof is a program, and the formula it proves is the type for the program

            I don’t see how that means that types can catch logic errors: the type is a theorem, not a proof. Furthermore, just because you can formalize code as a proof doesn’t mean it’s correct; mathematicians find wrong proofs all the time.

            1. 3

              If you declare some argument is a String and your code compiles, then the type checker has proven that the argument can only possibly be a String. Passing a non-String argument, as you could do in a dynamic language, is a logic error. You violate a precondition of your function, which only works on strings.

              1. 1

                Type checkers are logic checkers, so you can’t really screw up the proof, only the theorems. Yes, that happens sometimes, but it IS a logic system.

                1. 3

                  I think a better way to phrase the skepticism is to ask: what is there that can check whether you proved the right things?

                  Whether it’s done with tests or with types, at some point you are relying on the programmer to provide a sufficiently correct formal specification of the problem to a machine. And if you declare that it should be done via types because the programmer is fallible and types catch things the programmer will mess up, you potentially trigger an infinite regress: types that check the code, meta-types that check the types, meta-meta-types that check the meta-types, and so on.

                  (of course, this is all very well-trod ground in some fields, and is ultimately just a fancier version of the old “Pray, Mr. Babbage…”, but still a question worth thinking about)

            2. 2

              See also https://www.destroyallsoftware.com/talks/ideology which goes much more into depth on this.

          3. 10

            Types help reduce the burden of unit tests. I believe this without a doubt.

            The reasoning isn’t so much that a type replaces a test but instead that the discipline of types helps minimize the number of possible states of the system and helps define a limited set of inputs and outputs. In other words, types do something a little orthogonal to testing: they minimize surface area.

            The upshot is that, over time, you discover that a lot of your unit-testing effort was being burned on handling edge states and semi-appropriate inputs. Those tests just vanish; there’s no need for them.

            That said, many other behaviors you’d want to validate with a unit test will remain. Types can help clear up interfaces to make property-based/fuzz testing more appropriate, though. YMMV.
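
            As a trivial but runnable QuickCheck illustration of that last point (the property here is deliberately generic; in a code base like yours the same shape would be something like “decode after encode round-trips”):

            import Test.QuickCheck

            -- Once an interface is plain data in, plain data out, properties are
            -- easy to state and cheap to fuzz.
            prop_reverseRoundTrip :: [Int] -> Bool
            prop_reverseRoundTrip xs = reverse (reverse xs) == xs

            main :: IO ()
            main = quickCheck prop_reverseRoundTrip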

            The approach of trying to encode more invariants into the types themselves is super interesting. It’s clear that you can use this method to outlaw even more obviously bad behaviors, but in most languages there’s a hefty cost to introducing these techniques. They’re much more complex and add mental load when working out a program that satisfies the types. On the other hand, there’s a lot of satisfaction in playing “type tetris”: the moment you assemble a program that checks, you have some confidence that it’s sane (if not totally correct).

            Another thing you may notice as you make the transition is that type-driven development is probably 2-3x faster than test-driven development while producing similar results. You can often fluently describe a whole interface, library, or even program just by considering the types, and end up with a very decent skeleton of the program. You can iterate through designs extremely quickly since (a) the type checker is very fast and (b) all you’re writing are specifications, not tests and implementations.
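
            As a sketch of what “types first, skeleton later” looks like (every name below is invented):

            data Card     = Card String
            data Amount   = Amount Int
            data AuthCode = AuthCode String
            data Receipt  = Receipt AuthCode Amount

            -- The bodies are stubs, but the compiler already checks that the pieces
            -- of the design fit together before anything is implemented.
            authorize :: Card -> Amount -> IO (Maybe AuthCode)
            authorize = undefined

            capture :: AuthCode -> IO Receipt
            capture = undefined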

            Finally, and most critically, we often rely on tests as a safety net to enable refactoring. Types don’t fully replace this use case either, but they make refactoring so much easier and more enjoyable that it boggles my mind to consider refactoring without types nowadays. With a good type system, you just make the breaking change, hit compile, and then work through the list of 5, 10, 60 places where things have broken. It’s nearly mindless, as it should be.

            1. 9

              I worked in Haskell on a system not too far from the one you describe (also CQRS-based). There is definitely a fine balance to find between encoding invariants into the type system and adding some tests. Sometimes it’s possible to have the compiler do the work for you, but it’s not always desirable. The two main drawbacks are the increase in compile time and the steeper learning curve for that bit of the codebase.

              For the events, we had many backward-compatibility issues; I believe tests may be faster and simpler to maintain there. Another approach would be to encode the version in the event type; that’s quite easy. But then you need some custom logic to convert each version into the type that’s used once you’re past the boundary of the DB. It’s definitely doable, but it’s quite a lot of work.
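
              In case it helps, here is a small Haskell sketch of that “version in the event type” approach (the field names are invented):

              data EventV1 = EventV1 { v1AmountCents :: Int }
              data EventV2 = EventV2 { v2AmountCents :: Int, v2Currency :: String }

              -- Everything read from the store lands in this sum...
              data StoredEvent = V1 EventV1 | V2 EventV2

              -- ...and is upgraded exactly once, at the DB boundary, so the rest of
              -- the code base only ever sees the latest version.
              toLatest :: StoredEvent -> EventV2
              toLatest (V1 (EventV1 cents)) = EventV2 cents "USD"  -- assumed default for old events
              toLatest (V2 e)               = e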

              For all the other specific questions you’re asking, I believe you’ll be better off with some testing instead of trying to put all of that in the type system. Also, I have no idea if it’s even possible when it comes to time (timeout, throttling).

              The article mentioned by u/iocompletion (https://lobste.rs/s/olecii/experiences_moving_from_tests_strong#c_5w8nap) is a very good read for a middle-ground approach, which I believe can get you very far.

              1. 3

                I really enjoyed this talk about how types and tests complement each other: https://lobste.rs/s/hiwykl/types_testing_haskell

              2. 7

                This article is very relevant to your questions: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

                1. 3

                  Thanks. I found that article’s worked example a bit too trivial, but it linked to a paper that has a more complex take on the topic, which I’m currently digesting: https://kataskeue.com/gdp.pdf

                2. 2

                  By definition a type system can only check properties of the code. The bullet points all refer to information which is not in the code. So the question is: How do you bring this information to your code?

                  Tests are one option. You essentially exercise the code with examples. The obvious downside is that examples never cover all cases.

                  Another option is to have a very precise machine-readable specification. There are languages which allow you to phrase specifications as types for your code: Coq, Isabelle, Lean, Agda, … You then prove your code correct.
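
                  For a taste of the idea without going all the way to those tools, here is a small GHC Haskell sketch (the textbook length-indexed vector; nothing like a full proof, but the same “specification as a type” flavour):

                  {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

                  data Nat = Z | S Nat

                  -- A list whose length is part of its type.
                  data Vec (n :: Nat) a where
                    VNil  :: Vec 'Z a
                    VCons :: a -> Vec n a -> Vec ('S n) a

                  -- Total by construction: the type forbids an empty argument, so there
                  -- is no "head of empty list" failure left to test for.
                  vhead :: Vec ('S n) a -> a
                  vhead (VCons x _) = x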

                  I don’t believe it is viable for “a CQRS-style, event-sourcing architecture, and it’s part of a payment service” yet. This is still a frontier of formal methods. There are easier and more pragmatic steps in that direction than type systems. Amazon does some stuff with model checking, for example.

                  1. 1

                    I don’t think it would help with backwards compat serialization. Unless your types somehow encode the history of your git repo.

                    It can help with the XML in the sense that you could codegen types from the schema; then, if it compiles and your XML serialization is correct, you’re a lot better off than with a model where you write tags and attributes directly. I feel like this is a good use of types.

                    You could encode timeouts into the type system to guarantee that an API client implements a timeout feature. I think types can help here, but they aren’t a substitute for integration tests.
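
                    Roughly, a hedged sketch of that (all names invented): the type can force a timeout to be supplied, but only a test can show that it actually fires.

                    newtype Timeout = TimeoutSeconds Int

                    data ApiClient = ApiClient { clientTimeout :: Timeout, clientBaseUrl :: String }

                    -- The only way to build a client makes the caller choose a timeout;
                    -- whether requests are actually cut off in time still needs an
                    -- integration test against a slow or silent server.
                    mkClient :: Timeout -> String -> ApiClient
                    mkClient = ApiClient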

                    1. 1

                      It’s certainly possible to reduce your dependency on unit tests through strong typing, but there’s a bunch of other stuff that has to happen at the same time. You’ll always need ATDD/BDD-style tests, though, so those don’t go anywhere.

                      I wish I could condense all of the lessons you have to learn into one pithy post. Alas, I cannot. You’ll probably need to start thinking differently about your code, and that will take a while. Mutability, and having types control program flow with no dead-end paths, are the two big things to grok. Then there’s error-free/no-crash programming, and a few other things.

                      Good luck! Tell us how it went!

                      1. 1

                        the claim that with a sufficiently strong type system applied properly, there’s almost no need for unit tests

                        You shouldn’t take this claim seriously 🙂 No static typing proponent is seriously claiming that.

                        if the code compiles, there’s little opportunity for it to be incorrect.

                        This greatly depends on the quality of the type-driven design of the code. Many pieces of code that make strong type-driven guarantees are able to do so by carefully designing their types in very specific ways and then also enforcing that the types be used in specific ways. It’s very much in the hands of the programmers who write the code. They could make a statically-typed spaghetti mess if they’re not careful.

                        Can all the historical versions of all the events be deserialized correctly from the event store, getting converted to newer versions (or to completely different event types) as needed?

                        Yes, provided you carefully design the events in the store and the types that handle them. Rough sketch of the types:

                        type EventV1
                        type EventV2
                        
                        function decodeEventV1(input: string): Result<EventV1, Error>
                        function decodeEventV2(input: string): Result<EventV2, Error>
                        function eventV1toV2(eventV1: EventV1): EventV2
                        

                        As you can see, with this design you can decode any V1 events and then convert them to V2 events. It’s not magical; it’s just a design that uses types to tell what can be done.

                        Does the XML we send to a third-party service validate successfully with their schema?

                        Yes, generating or writing codecs against a schema is one of the best applications of type-driven design nowadays. There are codegen tools like the Thrift compiler that actually generate code that can correctly handle the data given a schema.
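
                        As a hand-rolled stand-in for what such generated code buys you (names invented, escaping omitted): the element and attribute structure is fixed by the type, so a structurally invalid document cannot be produced in the first place. Whether it matches the third party’s actual schema is still worth one validation test.

                        data PaymentXml = PaymentXml { pmtId :: String, pmtAmountCents :: Int }

                        renderPayment :: PaymentXml -> String
                        renderPayment p =
                          "<payment id=\"" ++ pmtId p ++ "\" amountCents=\""
                            ++ show (pmtAmountCents p) ++ "\"/>"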

                        The others are more runtime behaviour, so type systems in mainstream languages can’t really help you much there.