1. 17

Using Design-by-Contract to identify the reason for a test failure with “laser focus.” Includes code examples.

  2. 4

    TDD and DbC work extremely well together.

    I always leave my pre-conditions and invariant checks in my code.

    Preconditions act like test assertions on my tests. (Tests have bugs too.)

    Postconditions are somewhat difficult beasts.

    A complete postcondition is usually about as complex as the implementation, and often harder to write.

    What he has there is a weakened postcondition that will detect a (hopefully useful) subset of errors.

    Again, leaving that in your production code while doing TDD is useful.

    Where TDD wins completely is where the postcondition is so hairy that your only handle is to use an oracle.

    i.e. for certain simplified, fixed inputs, you know exactly what the answer is… so you assert those in your unit test.
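
    As a concrete sketch (my example, not the article’s): a sort routine whose complete postcondition (“sorted AND a permutation of the input”) would cost as much as the implementation, so only the weakened “sorted” check stays in the code, while an oracle-style unit test pins a known answer.

      #include <algorithm>
      #include <cassert>
      #include <vector>

      std::vector<int> my_sort(std::vector<int> v) {
          std::sort(v.begin(), v.end());               // implementation under test
          // Weakened postcondition: catches ordering bugs,
          // but not lost or duplicated elements.
          assert(std::is_sorted(v.begin(), v.end()));
          return v;
      }

      // Oracle-style test: for this fixed input we know the exact answer,
      // which catches what the weakened postcondition cannot.
      void test_my_sort_oracle() {
          const std::vector<int> expected{1, 2, 3, 5};
          assert(my_sort({5, 3, 1, 2}) == expected);
      }

      int main() { test_my_sort_oracle(); }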

    But there is no reason why you can’t do both TDD and DbC. They make an extremely powerful combination, and the advent of fuzzers like afl allows you to gain even more value out of your DbC checks.

    I strongly recommend, from my own experience and practice, doing both (and then layering fuzzing on top)!
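
    For instance, a hedged sketch of an afl-style harness (parse_record and its check are hypothetical stand-ins): the harness reads stdin and calls the code under test; compiled with afl’s instrumenting compiler (e.g. afl-clang-fast++) and run under afl-fuzz, any DbC assert that fires aborts the process, which afl records as a crash.

      #include <cassert>
      #include <iostream>
      #include <iterator>
      #include <string>

      // Hypothetical function under test; its DbC checks double as fuzzing oracles.
      void parse_record(const std::string& raw) {
          // ... real parsing would go here ...
          assert(raw.size() < 4096 && "stand-in contract check");
      }

      int main() {
          std::string input{std::istreambuf_iterator<char>(std::cin),
                            std::istreambuf_iterator<char>()};
          parse_record(input);  // an assert failure becomes a crash afl can report and minimise
      }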

    1. 3

      Original author here. You are totally spot on that tests are superior where postconditions are hairy. However, I disagree that TDD is orthogonal to DbC. My argument is that you don’t need TDD with DbC; you should use higher-level system and regression tests with DbC. TDD would be a waste of time and money with DbC, because ultimately TDD is about design, and DbC is also about design. I would wager the latter makes for a superior design, because you end up with a much stronger model (the code) of what you are trying to do: it will execute on real data with real people. Unit tests decidedly don’t.

      1. 2

        I respectfully disagree.

        My experience has been that there is a very useful synergy between the two.

        Done well, DbC gets you more power from your unit tests for less effort, and unit tests give you instant feedback and value from your DbC effort.

        TDD is partly a human-level thing.

        • A way of getting yourself unstuck.

        Ask yourself: what is the next compelling test case that will force me to change my code?

        • It’s a way of maintaining mental flow via a stream of tiny monkey-level rewards…

        That thing I just did worked; I can, with my mind at ease, move on to the next thing.

        • It’s a pin on the required behaviour (NOT implementation) of a component, allowing safer refactoring and hence cleaner code.

        (Note: If your unit tests break every time you refactor your implementation, you’re doing it wrong.)

        • TDD is, by a full two orders of magnitude (if done right), the fastest compile/link/run/debug cycle available.

        • It’s a way of improving my design: TDD forces you to create a testable design.

        A testable design is a very lightly coupled design with small dependency fan-out, with larger dependencies injected and a small amount of internal state.

        A testable design, I would argue, is a Good Design.

        Where my definition of Good is “How little do I need to read and understand before I can make a correct change to this system? How hard is this component to reuse?”

        A lot of that definition of Good Design comes down to coupling and cohesion, where cohesion is Good, and Coupling is Bad. And connascent coupling is the hardest to see and the most noxious.

        Reusable designs expose and force the removal of connascence.

        You are quite correct there is a very strong overlap with DbC, but the drivers are not exactly the same. Both TDD and DbC are forces driving your code (in subtly different ways) to a better design.

        Certainly a properly DbC’d design is easier to reuse than a non-DbC design… but there is a very important difference.

        Your Unit Test is the first Actual Reuse of a component.

        Until it has actually been reused at least once, a claim to be writing re-usable code is vapourware. (I would argue you have to re-use it at least three times before I would have much confidence in the claim.)

        Now add in fuzzing to the mix, which has massive synergies with both Unit Testing AND DbC… and you have a great design and test process.

        1. 3

          I would like to address your point about coupling and cohesion. I completely agree that high cohesion is essential for good code. However, it is a common lie told in our industry that coupling is bad. Coupling is not inherently bad. If you take the idea of low coupling to the extreme, you end up with software made of pieces that don’t talk to each other. In reality, some software components require high coupling and some can have low coupling. It really depends on what relationship those components have at a system level. In fact, I’m on Alan Kay’s side here when he says that computer science is a science of systems. Not algorithms, not data structures, or functional programming, or whatever. We are building systems, and you simply cannot have a system if all components have low coupling. The real answer to whether there should be low or high coupling between components is “it depends”. In other words, what is the relationship between the pieces? TDD does a very good job of pushing you towards low coupling, and I think this is a huge mistake: you end up with a terrible design at the system level.

          1. 2

            I find the best way to think about “Appropriate Coupling” is knowledge.

            Ask the question “Should this thing over here know about that tiny detail of that thing over there?”

            If you shrug and say, yup, it obviously needs to know that to do its job, then I probably won’t have any problems with your coupling.

            If your gut feel says, “Errrr, this really has nothing to do with that, but that was a kludge we made so we could meet the deadline that somebody who knows nothing about the code imposed on us.”

            Then you have inappropriate coupling and need all the help you can get to stop it.

            The notion of connascence is a very useful way of formalizing that gut feel.

            If things are the way they are “because they grew up together”, that’s Bad Coupling.

            It’s like a Rudyard Kipling “Just So” story.

            Why is it coupled? Ah… some long, complex history, none of which applies today. It’s “Just So”.

            If things are coupled because that is required for them to function, that’s Appropriate Coupling.
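
            A tiny illustration of that distinction (my code, connascence of meaning): two far-apart modules that both “just know” that status 3 means shipped are coupled because they grew up together; a shared, named definition is coupling they obviously need to do their job.

              // “Just So” coupling: duplicated, unexplained knowledge.
              bool billing_can_invoice(int status) { return status == 3; }  // why 3? history.
              bool ui_show_tracking(int status)    { return status == 3; }  // same magic number

              // Appropriate coupling: the shared knowledge is named and lives in one place.
              enum class OrderStatus { Placed, Packed, Shipped };
              bool billing_can_invoice(OrderStatus s) { return s == OrderStatus::Shipped; }
              bool ui_show_tracking(OrderStatus s)    { return s == OrderStatus::Shipped; }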

            A hint that your TDD is going the wrong way is if you’re reaching for dependency injection frameworks all the time. Dependency Injection frameworks are a powerful tool of last resort.

            Good Design has the least amount of coupling that is appropriate.

            Why? Because industrial scale systems grow far far far larger than humans can cope with.

            If you’re dealing with Mickey Mouse systems that fit in your head (and/or you and all your colleagues have very big and powerful heads), ignore everything I say.

            Everything I say is about coping with systems that grow too large to fit inside my little noddle.

            I suspect you might like DHH’s “Test-Induced Design Damage” rant. But I caution you to listen to the panel discussion he had afterwards with Kent Beck and Martin Fowler. https://www.youtube.com/watch?v=z9quxZsLcfo

            The TL;DR from that is: if your code is a thin veneer on top of a large framework like Rails, and your business model is spinning up Rails instances with little functionality and little maintenance work… yup, you and DHH are right.

            However, the more functionality you add on top of your framework, the more interfaces you have into that functionality that are not via that framework, and the more you are required to maintain and enhance that functionality… the more value there is in the decoupling TDD forces on you.

    2. 2

      This blog post defines the difference between TDD and Design by Contract as a matter of whether errors are returned or thrown/asserted uncaught.

      Where does this end though? At some point you accept user input that may not meet invariants you’ve specified, and your software probably shouldn’t just die every time.

      There is a lot more nuance, but I’m not sure how to make it simple.

      1. 6

        I’d simplify it as “contracts are a property of your code, while TDD is a testing technique.” They’d be orthogonal.

        TDD just boils down to “Write your tests before your code”. While agile gurus think it’s only unit tests, there’s nothing stopping you from TDDing with property tests, fuzzers, etc. Using contracts would actually make TDD more useful, since you can use the contracts to generate your tests!
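
        As a hand-rolled sketch of that idea (isqrt is my example): generate random inputs that satisfy the precondition and let the function’s own postcondition assert do the judging.

          #include <cassert>
          #include <cmath>
          #include <cstdint>
          #include <random>

          uint32_t isqrt(uint32_t x) {
              uint64_t r = static_cast<uint64_t>(std::sqrt(static_cast<double>(x)));
              while (r * r > x) --r;                          // correct floating-point drift
              while ((r + 1) * (r + 1) <= x) ++r;
              assert(r * r <= x && (r + 1) * (r + 1) > x);    // the contract is the oracle
              return static_cast<uint32_t>(r);
          }

          int main() {
              std::mt19937 gen(42);
              std::uniform_int_distribution<uint32_t> any;    // precondition: any 32-bit value
              for (int i = 0; i < 100000; ++i) isqrt(any(gen));
          }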

        At some point you accept user input that may not meet invariants you’ve specified, and your software probably shouldn’t just die every time.

        I’d imagine you’d only pass parsed/validated input to your functions with contracts. You should be isolating user input anyway; contracts just give you an extra sanity check.

        1. 4

          I’d imagine you’d only pass parsed/validated input to your functions with contracts. You should be isolating user input anyway; contracts just give you an extra sanity check.

          Yes, that’s the idea with DbC. Each model class provides adequate queries (non-side-effecting methods that return a value) for checking the validity of the data it needs to work with. Those queries are used in the preconditions of the class’s commands (side-effecting methods that don’t return anything but mutate local state). So, it would be the responsibility of the caller to make sure the precondition is satisfied before calling a command; and in turn, it’s the responsibility of the callee to ensure its postcondition holds given valid input.

          With that in mind, you do DbC in your core model classes, and in your outward facing interfaces you’d do defensive programming (i.e. using if statements or switch statements) to validate user input and only call the models when the input is valid.
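
          A minimal sketch of that split (the class and its names are mine, not the blog’s):

            #include <cassert>
            #include <iostream>

            class Account {
            public:
                // Query: side-effect free; callers use it to establish the precondition.
                bool can_withdraw(long cents) const { return cents > 0 && cents <= balance_; }

                // Command: mutates state; the precondition is the caller’s obligation.
                void withdraw(long cents) {
                    assert(can_withdraw(cents));   // contract check, not input validation
                    balance_ -= cents;
                    assert(balance_ >= 0);         // (weakened) postcondition / invariant
                }

            private:
                long balance_ = 10000;
            };

            // Outward-facing boundary: defensive programming, so bad input never kills us.
            void handle_user_request(Account& acc, long cents) {
                if (!acc.can_withdraw(cents)) { std::cerr << "rejected\n"; return; }
                acc.withdraw(cents);               // precondition established; safe to command
            }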

        2. 4

          At some point you accept user input that may not meet invariants you’ve specified, and your software probably shouldn’t just die every time.

          For that point (for discussion purposes, let’s call that point the function “sanitize”), your precondition assert is the loosest possible: “assert(true)”.

          i.e. you must accept all inputs the world can throw at you.

          The postcondition for “sanitize” must be strict.

          Now anything downstream of “sanitize” can and should have much stricter preconditions… but those preconditions must be supersets of “sanitize”’s postconditions.

          i.e. sanitize’s postconditions imply that the downstream preconditions will hold.
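
          A hedged sketch of that layering (the function names are mine): one shared predicate keeps the postcondition and the downstream preconditions from drifting apart.

            #include <cassert>
            #include <cctype>
            #include <string>

            bool is_clean(const std::string& s) {           // the shared predicate
                if (s.empty() || s.size() > 64) return false;
                for (unsigned char c : s) if (!std::isalnum(c)) return false;
                return true;
            }

            std::string sanitize(const std::string& raw) {  // precondition: assert(true)
                std::string out;
                for (unsigned char c : raw)
                    if (std::isalnum(c)) out += static_cast<char>(c);
                if (out.empty()) out = "anonymous";
                if (out.size() > 64) out.resize(64);
                assert(is_clean(out));                      // strict postcondition
                return out;
            }

            void store_username(const std::string& name) {
                assert(is_clean(name));   // stricter precondition, implied by sanitize’s postcondition
                // ... persist ...
            }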

          You then hit sanitize with targeted unit tests and fuzzers and make damn sure it does its job.

          Because if it doesn’t, your only real remedy is to deliver a fixed version of the software. So the sooner you discover that, the better.

          Think about it…

          If “sanitize” allows unclean inputs through… all bets are off.

          The result may be a crash; or worse, subtle, sporadic, unreproducible errors; or worse, subtle data corruption; or worse, a security breach.

          No matter which of those horrid choices eventuates… the symptoms will only be apparent way downstream of the cause (sanitize failing to do its job), and usually out in the field at the customer’s site.

          If your precondition assert kills the system displaying a stack trace… you can find and fix the bug before the software leaves the test rack.

          1. 3

            Author of the original post here: you hit the nail on the head. I have a much older post describing this, though you did an excellent job. I call this writing software in the shape of an egg. I feel this is a simple and highly effective technique that has simply been lost in the commercial development space. It works for any kind of software, from GUI to server.

            1. 3

              PS: While I disagree with you that DbC replaces TDD… please don’t let that tiny disagreement dissuade anybody.

              /u/mempko is entirely correct that DbC is extremely important, and his posts are adding great value to the conversation.

              PPS: You’re right about the bugs being “between the units”, but the solution isn’t to throw the TDD baby out with that bathwater.

              Use DbC to create what JB Rainsberger calls Collaboration Tests.

              Here is a snippet from a talk I give my colleagues… Pay particular attention to the part on services.

              They may all be functions… but Pure, Stateful, Service, and I/O functions are very different sorts of functions, requiring very different sorts of tests.

              Pure Functions

              • A function that modifies nothing and always returns the same answer given the same parameters is called a pure function.

              Stateful Functions

              • A stateful function’s results depend on some hidden internal state, or, as a side effect, it modifies some hidden internal state.

              However, unlike a service, if you can set up exactly the same starting state, the stateful function will have exactly the same behaviour every time.

              Services

              • A service function is one whose full effect, and precise result, vary with things like timing, inputs, threads, and load, in too complex a manner to be specified in a simple test.

              Testing PURE functions

              Pure functions have many nice mathematical properties and are the easiest to test, to analyse, to reuse and to optimize.

              Actively move as much code as possible into pure functions!

              “const” is a keyword that always makes me relax and feel less stressed. “const” is a remarkably powerful statement. Use it wherever possible.

              Tests of pure functions are all about the results, never about the functions they may invoke. i.e. never mock a pure subfunction that a pure function may use for its implementation. Use the real one.
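
              A small sketch of that rule (the functions are my examples): the pure helper round_to_cents is used for real inside the test, never mocked, and the assertions are on results alone.

                #include <cassert>
                #include <cmath>

                double round_to_cents(double x) { return std::round(x * 100.0) / 100.0; }

                // Pure: same parameters, same answer, nothing modified.
                double price_with_tax(double net, double rate) {
                    return round_to_cents(net * (1.0 + rate));  // real subfunction, no mock
                }

                void test_price_with_tax() {
                    assert(price_with_tax(10.00, 0.15) == 11.50);  // results only
                    assert(price_with_tax(0.00, 0.15) == 0.00);
                }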

              Testing Stateful functions

              Often a stateful function can be refactored into a pure function (see the sketch below) where…

              • the state is passed in as a const parameter, and…
              • the result can be assigned to the state.
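
              A hedged before/after sketch of that refactoring (the names are mine):

                // Before: the result depends on hidden state that the call also mutates.
                struct TickerStateful {
                    int count = 0;
                    int next() { return ++count; }
                };

                // After: the state comes in as a const parameter, and the caller
                // assigns the result back, e.g.  state = next(state);
                struct Counter { int count = 0; };
                Counter next(const Counter& c) { return Counter{c.count + 1}; }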

              The best use for stateful functions is to encapsulate a bundle of related state (into a class). These functions (or methods) should guarantee that the required relationships (class invariants) between those items always hold.

              Where you have a collection of functions (or methods) encapsulating state (i.e. a class), the best unit testing strategy (sketched in code below) is…

              • Construct the object (possibly via a common test fixture).
              • Propagate the object to the required state via a public method (which has been tested in some other test)
              • Invoke the stateful function under test.
              • Verify the result.
              • Discard the object (possibly via a common tear down function).
              • Keep your tests independent; DO NOT succumb to the temptation to reuse this object in a subsequent test. Fragility and complexity lie that way.
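
              In framework-free form that shape looks something like this (the Account stand-in is mine, kept inline so the sketch is self-contained):

                #include <cassert>

                class Account {
                public:
                    void deposit(long c)  { balance_ += c; }
                    void withdraw(long c) { balance_ -= c; }
                    long balance() const  { return balance_; }
                private:
                    long balance_ = 0;
                };

                void test_withdraw_reduces_balance() {
                    Account acc;                   // construct (possibly via a common fixture)
                    acc.deposit(500);              // reach the required state via an already-tested public method
                    acc.withdraw(200);             // invoke the stateful function under test
                    assert(acc.balance() == 300);  // verify through public behaviour, not private state
                }                                  // acc discarded here; never reused by the next test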

              The best measure of the “Goodness” of the unit test suite is how well does it do at Defect Localization.

              i.e. very few tests should fail if you have introduced a bug, and just by looking at the test case name and stack trace you should be able to tell exactly what is broken and where.

              DO NOT assert on private, hidden state of the implementation; otherwise you couple your test to that particular implementation and representation, rather than to the desired behaviour of the class.

              Testing Services

              Testing services is all about testing interface specifications. The service’s dependencies (unless PURE) must be explicitly cut and controlled by the test harness.

              We have had a strong natural inclination to test whether “client” calls “service(…)” correctly by letting “client” call “service(…)” and seeing if the right thing happened.

              However, this mostly tests whether the compiler can correctly invoke functions (yup, it can) rather than whether “client” and “service(…)” agree on the interface.

              Code grown and tested in this manner is fragile and unreusable as it “grew up together” (Connascent coupling). All kinds of implicit, hidden, undocumented coupling and preconditions may exist.

              We need to explicitly test our conformance to interfaces, and rely on the compiler to be correct.

              When testing the Client….

              • Does the client make valid requests to the service? (Use exactly the service’s preconditions to check this! Pull them out as a separate function, “service_name_pre()”; see the sketch after this list.)
              • Can the client handle every response from the service permitted by the interface?
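
              A hedged sketch of that extraction (send_email_pre and the client code are my stand-ins): the service’s precondition becomes a named predicate, and the client-side collaboration test checks the request the client would make, without invoking the real service.

                #include <cassert>
                #include <string>

                // The one shared definition of the contract both sides test against.
                bool send_email_pre(const std::string& to, const std::string& body) {
                    return to.find('@') != std::string::npos && !body.empty();
                }

                struct Request { std::string to, body; };

                // Client code under test (stubbed so the sketch is self-contained).
                Request build_password_reset(const std::string& user_email) {
                    return {user_email, "Click the link below to reset your password."};
                }

                void test_client_sends_valid_request() {
                    Request r = build_password_reset("alice@example.com");
                    assert(send_email_pre(r.to, r.body));  // exactly the service's precondition
                }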

              When testing the Service….

              • Can the service handle every request permitted by the interface? (Fuzz the inputs, you should either get a precondition failure or a pass, nothing else!)
              • Can the service be induced to make every response listed in the interface specification?

              Think of the service’s postconditions as a region of the output space. Can you induce the service to generate points scattered over a substantial portion of that space (especially boundaries and corners)? Can your client handle all those points?