1. 64

  2. 22

    What a great writeup. Here’s why it hits the nail on the head. TDD on its own is great. It gets people talking about the design of their code. It gets people caring about correctness. But the “TDD maximalist” position is unhelpful, arrogant, annoying, and flat out incorrect.

    Uncle Bob’s “TDD is double entry bookkeeping” doesn’t make any sense. The reason double entry bookkeeping works is because accounting is zero-sum. Money must move between accounts, because it’s a finite resource. If you write a test, code can be written to pass it in an infinite number of ways. Computation is not a finite resource. The analogy is not good.
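To make the "infinite number of ways" point concrete, here is a minimal sketch. The `add` functions and the `check_add` helper are hypothetical names invented for illustration; nothing here comes from the article.

```python
# Hypothetical example: one test, several implementations that satisfy it.


def check_add(add):
    """The single example-based test a test-first workflow might start from."""
    assert add(2, 3) == 5


def add_general(a, b):
    # The implementation most readers would expect.
    return a + b


def add_hardcoded(a, b):
    # Passes the test, but ignores its inputs entirely.
    return 5


def add_special_cased(a, b):
    # Also passes; correct only for the tested input, by construction.
    if (a, b) == (2, 3):
        return 5
    return 0


# All three satisfy the one test, so the test alone cannot tell them apart.
for impl in (add_general, add_hardcoded, add_special_cased):
    check_add(impl)

# A second example separates the general version from the other two.
assert add_general(1, 1) == 2
assert add_hardcoded(1, 1) != 2
assert add_special_cased(1, 1) != 2
```

Unlike a ledger, where a debit constrains the matching credit, the test constrains only the inputs it names.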

    The TDD maximalist position that “TDD improves design because the design ends up more testable that way” is the definition of circular reasoning. I loved the point about long functions sometimes being the better abstraction - this is absolutely true in my experience. Artificially separating a concept that’s just inherently complex doesn’t improve anything objectively, it just gives someone with an obsession over small functions an endorphin hit. The same is true of dependency injection and other TDD-inspired design patterns.

    I know tons of people who have done TDD for years and years and years. As mentioned in the article, it has a lot of great benefits, and I’ll always use it to some degree. But in those years working with these other people, we have still always had bug reports. The TDD maximalist position says we were just doing it wrong.

    Well, if a methodology requires walking a tightrope for 5 years to get the return on investment, maybe it’s not a perfect methodology?

    1. 20

      Uncle Bob is to software what Sigmund Freud is to psychology

      1. 25

        I don’t think this is right: while Freud has largely been discarded in the details, his analytical approach was utterly new and created much of modernity. Uncle Bob … well, the less said, the better.

        1. 2

          Just read Otto Rank.

      2. 9

        While I generally agree with your take, I’m going to quibble with part of it:

        The TDD maximalist position that “TDD improves design because the design ends up more testable that way” is the definition of circular reasoning.

        This is not circular reasoning, though the inexact phrasing probably contributes to that perception. A more exact phrasing may help make it clear that this isn’t circular:

        1. All else equal (and I’m not sure what that might mean when discussing design), a more testable design can be considered a better design.
        2. A way to ensure that the design is more testable is to require comprehensive tests, and reject designs where comprehensive tests are hard to achieve.
        3. TDD is a practice which generates comprehensive tests, and which rejects designs that cannot be comprehensively tested.
        4. TDD therefore improves this dimension of software design.

        Put this way, this may be less controversial: it allows room for both those who subscribe to TDD, and for those who object to it. (Possible objections include “all else can’t be equal” – I think DHH’s comments on Test-induced Design Damage fall in this category – or that other practices can be as good or better at generating comprehensive tests, like TFA’s discussion of property-based testing.)

        1. 10
          1. All else equal (and I’m not sure what that might mean when discussing design), a more testable design can be considered a better design.

          This is why the reasoning is circular. This point right here can be debated, but it’s presented as an axiom.

          3. TDD is a practice which generates comprehensive tests, and which rejects designs that cannot be comprehensively tested.

          TDD does nothing inherently, this is another false claim. It does not reject any design. In practice, programmers just continuously copy the same test setup code over and over and do not care about the signal that the tests are giving them. Because TDD does not force them to. It happily admits terrible designs.

          1. 4

            It does reject designs: designs that cannot be tested cannot be generated when following strict TDD, and designs which are difficult to test are discouraged by the practice. (Edit to add: In practice, this requires developers to listen to their tests, to make sure they’re “natural” to work with. When this doesn’t happen, sure, blindly following TDD probably does more harm than good.)

            1. 1

              Sure, there’s some design out there that won’t even support an end to end test. The “naturalness” of a test though can’t be measured, so the path of least resistance is to just make your tests closer and closer to end to end tests. TDD itself will not prevent that natural slippage, it’s only prevented by developer intervention.

              That’s the TDD maximalist point of view in a nutshell - if you don’t get the alleged benefits of it, you did it wrong.

              1. 3

                For what it’s worth, I’m not a TDD maximalist, I try to be more pragmatic. I’m trying to have this come across in some of the language I’m using: non-absolute terms, like “encourage” or “discourage”, instead of absolutes like “allow” or “deny”. If you’re working with a code base that purports to follow TDD, and you’re letting your unit tests drift into end-to-ends by another name, then I’d argue that you aren’t listening to your tests (this, by the way, is the sense in which I think “naturalness” can be measured – informally, as a feeling about the code), and that (I’m sorry to say, but this does sometimes happen) you’re doing it wrong.

                Doing it right doesn’t necessarily mean using more TDD (though it might!), it’s just as easy to imagine that you’re working in a domain where property-based tests are a better fit (“more natural”) than the style of tests generated via TDD, or that there are contextual constraints (poorly-fitting application framework, or project management that won’t allow time to refactor) that prevent TDD from working well.

                1. 2

                  I’m mostly with you. I’m definitely what Hillel refers to in this article as a “test-first” person, meaning I use tests to drive my work, but I don’t do strict TDD. The way that people shut their brains off when talking about TDD kills me though.

              2. 1

                It does reject designs: designs that cannot be tested cannot be generated when following strict TDD, and designs which are difficult to test are discouraged by the practice.

                Well, yeah, that’s why “maximalist TDD” has such a mixed reputation…

                First, there are plenty of cases where you’re stuck with various aspects of a design. For example, if you’re writing a driver for a device connected over a shared bus, you can’t really wish the bus out of existence, so you’re stuck with timing bugs, race conditions and whatever. In these cases, as is often the case with metrics, the circular thing happens in reverse: you don’t get a substantially more testable design (you literally can’t – the design is fixed), developers just write tests for the parts that are easily tested.

                Second, there are many other things that determine whether a design is appropriate for a problem or not, besides how easy it is to implement it by incrementally passing automatic unit tests written before the code itself. Many of them outweigh this metric, too – a program’s usefulness is rarely correlated in any way to how easy it is to test it. If you stick to things that The Practice considers appropriate, you miss out on writing a lot of quality software that’s neither bad nor useless, just philosophically unfit.

                1. 5

                  For example, if you’re writing a driver for a device connected over a shared bus, you can’t really wish the bus out of existence, so you’re stuck with timing bugs, race conditions and whatever. In these cases, as is often the case with metrics, the circular thing happens in reverse: you don’t get a substantially more testable design (you literally can’t – the design is fixed), developers just write tests for the parts that are easily tested.

                  I’ve written device drivers and firmware code that involved shared buses, and part of the approach involved creating a HAL for which I could sub out a simulated implementation, allowing our team to validate a great deal of the logic using tests. I used an actor-model approach for as much of the code as possible, to make race conditions easier to characterize and test explicitly (“what happens if we get a shutdown request while a transaction is in progress?”). Some things can’t be tested perfectly – for example we had hardware bugs that we couldn’t easily mimic in the simulated environment – but the proportion of what we could test was kept high, and the result was one of the most reliable and maintainable systems I’ve ever worked on.
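A rough sketch of the HAL-plus-simulated-implementation idea, in Python rather than firmware code. Everything here is an assumption made for illustration: the `Bus` interface, the `SimulatedBus` stand-in, the `SensorDriver` and its register map. It uses plain state flags rather than an actor model, and only shows how the "shutdown request during a transaction" case can be tested without hardware.

```python
from typing import Protocol


class Bus(Protocol):
    """Hypothetical HAL interface: all the driver knows about the bus."""

    def write(self, address: int, data: bytes) -> None: ...
    def read(self, address: int, length: int) -> bytes: ...


class SimulatedBus:
    """In-memory stand-in for the real bus, used only in tests."""

    def __init__(self) -> None:
        self.registers = {}
        self.log = []

    def write(self, address: int, data: bytes) -> None:
        self.log.append(("write", address))
        self.registers[address] = data

    def read(self, address: int, length: int) -> bytes:
        self.log.append(("read", address))
        return self.registers.get(address, bytes(length))[:length]


class SensorDriver:
    """Driver logic written against the HAL, not against the hardware."""

    CONTROL = 0x00  # hypothetical register map
    DATA = 0x01

    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.busy = False
        self.shutdown_pending = False

    def begin_transaction(self) -> None:
        self.busy = True
        self.bus.write(self.CONTROL, b"\x01")

    def request_shutdown(self) -> None:
        # Defer the shutdown while a transaction is in flight.
        if self.busy:
            self.shutdown_pending = True
        else:
            self.bus.write(self.CONTROL, b"\x00")

    def end_transaction(self) -> int:
        value = self.bus.read(self.DATA, 1)[0]
        self.busy = False
        if self.shutdown_pending:
            self.shutdown_pending = False
            self.bus.write(self.CONTROL, b"\x00")
        return value


# "What happens if we get a shutdown request while a transaction is in progress?"
bus = SimulatedBus()
bus.registers[SensorDriver.DATA] = b"\x2a"
driver = SensorDriver(bus)
driver.begin_transaction()
driver.request_shutdown()
assert bus.registers[SensorDriver.CONTROL] == b"\x01"  # still powered mid-transaction
assert driver.end_transaction() == 42
assert bus.registers[SensorDriver.CONTROL] == b"\x00"  # shutdown applied afterwards
```

A simulation like this covers the driver's logic only; timing behaviour and hardware bugs of the kind mentioned above remain outside what it can show.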

                  1. 2

                    I’m not trying to advocate against DFT, that’d be ridiculous. What I’m pointing out is that, unless DFT is specifically a goal at every level of the design process – in these cases, hardware and software – mandating TDD in software without any deliberation higher up the design chain tends to skew the testability metric. Instead of coming up with a testable design from the ground up, developers come up with one that satisfies other constraints, and then just write the tests that are straightforward to write.

                    Avoiding that is one of the reasons why there’s so much money and research being poured into (hardware) simulators. Basic hiccups can be reproduced with pretty generic RTL models (oftentimes you don’t even need device-specific logic at the other end), or even just in software. But the kind of scenarios that you can’t even contemplate during reviews – the ones that depend on timing constraints, or specific behavior from other devices on the bus – need a more detailed model. Without one, you wind up not writing tests for the parts that would benefit the most from testing. And it’s what you often end up with, because it’s hard to come up with reliable test models for COTS parts (some manufacturers do publish these, but most don’t). That’s not to say the tests you still get to write are useless, but writing them doesn’t have much of an impact on how testable the design is, and there may well be cheaper and more efficient ways to attain the same kind of correctness.

                    My experience with TDD development in this field has been mixed. Design teams that emphasize testability throughout their process, in all departments, not just software, tend to benefit from it, especially as it fits right in with how some of the hardware is made. But I’ve also seen TDD-ed designs with supposedly comprehensive tests that crashed if you so much as stared at the test board menacingly.

              3. 2

                This is why the reasoning is circular. This point right here can be debated, but it’s presented as an axiom.

                The way they phrased that does make it kind of circular, but we can phrase it differently to avoid being circular:

                1. Correctly behaving software is better than incorrectly behaving software.
                2. More test coverage correlates with correctness. (Debatable)
                3. Therefore, designs that lend to better test coverage are more likely to be correct than designs that don’t.
                1. 2

                  Therefore, designs that lend to better test coverage are more likely to be correct than designs that don’t.

                  End to end testing has more coverage per test case though. So, by this logic, you would just write only end to end tests, which is in contrast to the isolated unit testing philosophy.

                  1. 2

                    So, by this logic, you would just write only end to end tests, which is in contrast to the isolated unit testing philosophy.

                    I’m not sure that actually is in opposition to TDD philosophy, though. My understanding of TDD is that it’s a pretty “top-down” approach, so I imagine you would start with a test that is akin to e2e/integration style before starting work on a feature. A “fundamentalist” would leave just about all of the implementation code “private” and not unit test those individual private functions unless they became part of the exposed public API of the software. A more pragmatic practitioner might still write unit tests for functions that are non-trivial even if they are not public, but I still think that TDD would encourage us to shy away from lots of unit tests and the “traditional” test pyramid.

                    I could be totally off base with my understanding of TDD, but that’s what I’ve come away with from reading blogs and essays from TDD advocates.

                    1. 1

                      End to end tests can have difficulty with “testability”, IMO:

                      1. When an end-to-end test fails, it can be very difficult to reason backwards from the failure to the fault. This is a byproduct of the fact that end-to-ends cover more code per test case: The coverage may be high, but the link between the code being tested and the test can be tenuous.
                      2. Similarly, it’s very difficult to create an end-to-end test that targets a specific part of the code.
                      3. Finally, one can’t write an end-to-end test until a new facility is complete enough to be integrated. I’d rather establish confidence about the code I’m writing as early as possible.

                      Edit to add: This is probably giving the wrong impression. I like E2E tests, whole system behavior is important to preserve, or to understand when it changes, and E2E’s are a good way of getting to that outcome. But more narrowly targeted tests have their own benefits, which follow from the fact that they’re narrowly targeted.

                2. 1

                  I was going to object along similar lines - but I agree with the sibling - your 1) does not make a good argument.

                  I’d say something more like: A testable design is better, because it allows for writing tests for bugs, helping document fixes and prevent regressions. Tests can also help document the business logic, and can improve the overall system that way too.

                  Just saying “tests are good; let’s have more tests!” doesn’t say why tests are good, and we do indeed get circular reasoning.

                  I’ve inherited a number of legacy systems with no tests, and in the few where I’ve shoe-horned in tests alongside new features and bug fixes, those tests have always caught some regressions later. And even with the “wasted” hours wrangling tests to work at all in an old code base, I’d say the tests have “paid their way”.

                  If those code bases had had tests to begin with, I think a) the overall designs would probably have been better, but perhaps more importantly b) writing tests while fixing bugs would have been easier.

                  I also think that for many systems, just having unit tests will help a great deal with adding integration tests - because some of the tooling and requirements for passing in state/environment (as function arguments, mocks, stubs etc) are similar.

                  1. 2

                    I do consider point 1 nearly axiomatic. What does “testable” mean? Trivially, it means something like “easy to test”, which suggests properties like:

                    • Easy to set up preconditions and provide inputs;
                    • Easy to verify postconditions and validate outputs;
                    • Easy to address interesting points in the module’s semantic domain.

                    Achieving these requires that the interface under test be pretty easy to understand. All else equal (whatever that might mean when talking about design), having these qualities is better than not having them. (Note, also, that I talk more about “testability”, not “tests”: I value testability more than I value tests, though TDD does strongly encourage testability.)

                    Still, I don’t believe this is circular. Point 1 is presented axiomatically, but the remaining points that attempt to characterize the meaning only depend on earlier points, without cycles.

              4. 7

                Programmers have a strong tendency to think rigidly and to apply rules rigidly, and I think this does more harm than good. It is a natural consequence of working with rigidly-defined programming code, but we must always remember the meat in our heads is not as rigid as silicon, and therefore the interface between them must have more fluidity.

                Testing your code is good. This is axiomatic because code is meant to do things, and you can’t know it’s doing the right things without observing its behavior.

                Testing is observation of the behavior of the system under a continuum of purely abstract (unit), partially abstract (integration), or concrete (production) circumstances.

                A rigid adherence to a hierarchy of TDD maximalism is a handicap to performing these observations in the time, manner, and place where it is most convenient, least expensive, and creates the fastest feedback loop.

                I believe it is far more useful to think of testing and code design as a dialectical relationship in which both feed into one another and neither is primary. You should move swiftly between them at all times. The direction you first approach one or the other is unimportant.

                1. 6

                  Hillel touches on this a little, but I think it’s striking how different “modern strong TDD” is from the initial description in Kent Beck’s 2003 book.

                  In the book, Kent initially describes the “rhythm of Test-Driven Development” as “1. Quickly write a test. 2. Run all tests and see the new one fail. 3. Make a little change. 4. Run all tests and see them succeed. 5. Refactor to remove duplication.” which is how most folks think of and talk about and teach TDD (red-green-refactor), but in the text of the book, he actually demonstrates a much subtler and more reasoned approach (along with being pretty funny for a technical book).

                  As an example, in Chapter 3, he introduces a Value Object. He notices that adding a Value Object brings with it a bunch of additional questions so he writes “We’ll put that on the to-do list, too, and get to it when it’s a problem. You aren’t thinking about the implementation of equals() yet, are you? Good. Me neither. After snapping the back of my hand with a ruler, I’m thinking about how to test equality.” I think that’s funny, at least. Levity amidst technical discussion is great and should be employed far more often.

                  He writes a failing test, writes the fake equals (object) { return true; } implementation, writes another failing test, and then writes the “actual” implementation of equals before finally running the test suite. At no point does he “refactor”, nor does he rigidly hew to the “run tests after every step” framework either.
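For anyone who hasn't seen the sequence, a rough Python paraphrase of those steps looks something like this. It is a sketch, not the book's code: the `Dollar` and `FakeDollar` names and the use of Python's `__eq__` in place of Java's `equals()` are assumptions made for illustration.

```python
class FakeDollar:
    """Step 1: the 'fake it' implementation, just enough for the first test."""

    def __init__(self, amount: int) -> None:
        self.amount = amount

    def __eq__(self, other: object) -> bool:
        return True  # deliberately fake: every comparison succeeds


class Dollar:
    """Step 2: the 'actual' implementation, forced by the second test."""

    def __init__(self, amount: int) -> None:
        self.amount = amount

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Dollar):
            return NotImplemented
        return self.amount == other.amount

    def __hash__(self) -> int:
        # Keep hashing consistent with value equality.
        return hash(self.amount)


def equal_amounts_are_equal(money_type) -> bool:
    return money_type(5) == money_type(5)


def different_amounts_differ(money_type) -> bool:
    return money_type(5) != money_type(6)


# The fake passes the first test and fails the second one...
assert equal_amounts_are_equal(FakeDollar)
assert not different_amounts_differ(FakeDollar)

# ...which is what pushes the implementation toward a real comparison.
assert equal_amounts_are_equal(Dollar)
assert different_amounts_differ(Dollar)
```

The second test exists only to make the fake untenable, so the real comparison arrives when a test demands it and not before.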

                  And then in the next chapter, all he does is refactor the tests. “Isn’t that the refactor step from the previous cycle?” Yeah maybe, but by pulling it into a separate chapter he’s emphasizing that the goal isn’t dogmatic application of the steps, the goal is, “Work in small iterated chunks with constant feedback.”

                  In chapter 13, he disparages a test because it is “deeply concerned with the implementation of our operation, rather than its externally visible behavior.” That is certainly not the way people talk about unit testing or TDD nowadays.

                  The whole book is like this. I feel a little ridiculous and like a shill writing all of this out, lol, but ever since I read the book it feels like I’m watching people argue over reviews of movies they haven’t seen. At this point, my only contribution to these discussions is, “Go read the damn book!”

                  1. 5

                    My recent experience with TDD is for a Phoenix LiveView project I’m working on. I had hopes for doing full TDD since LiveView has such good support for tests. But, I couldn’t reason about the UI and the inputs from the blank screen of the test file and the documentation. I’m sure I could figure it out, but I couldn’t get a feel for what I wanted the application to do from the tests. Now I’m getting the interface working and then writing tests for all cases so I know they don’t break in the future. So it’s more Development Driven Tests. It feels like I’m not “doing it right”, but I also just want to work on the application and not worry about getting the process right. But I do think TDD is a good thing.

                    1. 10

                      This is my experience as well - once you get to trying to build a “real” application with a user interface, TDD breaks down. In fact, this is pretty much everyone’s experience. TDD works well for testing simple interfaces of single-process code. The whole TDD By Example book is about building a simple Money type!

                      That’s why I’ve been playing around with model-based testing web applications. A lot better test coverage for way less effort so far.
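A minimal sketch of the general model-based idea, using only the standard library. This is not the setup described above (which targets web applications): the `Cart` class, the dict used as the model, and the `run_model_test` helper are all hypothetical, chosen to keep the example small.

```python
import random


class Cart:
    """Hypothetical system under test: stores line items as (sku, qty) pairs."""

    def __init__(self) -> None:
        self._lines = []

    def add(self, sku: str, qty: int) -> None:
        for i, (s, q) in enumerate(self._lines):
            if s == sku:
                self._lines[i] = (s, q + qty)
                return
        self._lines.append((sku, qty))

    def remove(self, sku: str) -> None:
        self._lines = [(s, q) for s, q in self._lines if s != sku]

    def total_items(self) -> int:
        return sum(q for _, q in self._lines)


def run_model_test(seed: int, steps: int = 200) -> None:
    """Apply random operations to the cart and to a simple model, comparing as we go."""
    rng = random.Random(seed)
    cart = Cart()
    model = {}  # the model: an obviously-correct dict of sku -> quantity

    for _ in range(steps):
        sku = rng.choice(["apple", "pear", "fig"])
        if rng.random() < 0.7:
            qty = rng.randint(1, 5)
            cart.add(sku, qty)
            model[sku] = model.get(sku, 0) + qty
        else:
            cart.remove(sku)
            model.pop(sku, None)
        # After every step the implementation must agree with the model.
        assert cart.total_items() == sum(model.values()), (seed, sku)


# Many generated operation sequences from one short test body.
for seed in range(20):
    run_model_test(seed)
```

Each seed exercises a different sequence of operations, which is where the coverage-per-effort gain comes from; a library with shrinking support would also reduce a failing sequence to a minimal one.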

                      1. 2

                        Do you have any references for those looking to get into model-based testing?

                        1. 1

                          I happen to have written a post about it recently. Amazon published a paper relatively recently about it as well, though I can’t tell if what they’re testing is communicating over HTTP under the hood. The example application I built is a legitimate “regular” web application.

                        2. 2

                          I have used the technique here in stateful desktop apps with success: https://gist.github.com/andymatuschak/d5f0a8730ad601bcccae97e8398e25b2

                          Once you disentangle effects from computations, the tests write themselves.

                          1. 1

                            This is often referred to as the “humble object pattern.” Instead of testing a UI, you move all of the logic out of the UI, test that, and just don’t test the actual UI framework. This is also how I test UIs.
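A small sketch of what that split can look like. The form, the validation rule, and the `widgets` dictionary standing in for a UI toolkit are all hypothetical; the point is only where the test boundary sits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormState:
    """Plain data describing what the view should show."""

    email: str
    submit_enabled: bool
    error: str


def present(email: str) -> FormState:
    """Pure logic: decides what the form looks like for a given input."""
    email = email.strip()
    if not email:
        return FormState(email, False, "")
    if "@" not in email:
        return FormState(email, False, "Enter a valid email address")
    return FormState(email, True, "")


class HumbleView:
    """Thin shell around a (hypothetical) UI toolkit; deliberately left untested."""

    def __init__(self, widgets) -> None:
        self.widgets = widgets  # stand-in for real toolkit objects

    def render(self, state: FormState) -> None:
        # No decisions here: just copy the state onto the widgets.
        self.widgets["submit"]["enabled"] = state.submit_enabled
        self.widgets["error"]["text"] = state.error


# Tests target the logic only; no UI framework is needed to run them.
assert present("") == FormState("", False, "")
assert present("nobody") == FormState("nobody", False, "Enter a valid email address")
assert present(" a@example.com ") == FormState("a@example.com", True, "")
```

The view has no branches left to get wrong, which is the argument for leaving it out of the test suite.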

                        3. 5

                          I would not worry about “doing it wrong”. I think UI tests are kind of special - in the web world you send some text via TCP, and a super complex system renders pixels on screen and handles input.

                          So what does it mean to “assert submit button is visible, enabled and clickable”?

                          You’re not (generally) testing the graphics driver or the SSL library, or checking for dead pixels. But how do you avoid your tests being so trivial that they have no value? You can use a web driver like Selenium and headless Chrome/Firefox, as a sensible middle ground.

                          But if you were in Smalltalk, you could probably scan the frame buffer, or better, hook into the GUI toolkit to see if there are calls to render a submit-button.

                          In other words, some of the issue is that the server-network-client-driver-render-display super-system isn’t terribly easy to test.

                          I generally don’t like tests of the form “assert there’s a css selector ‘button’ with text ‘submit’” - because it tests a “symptom” of the framework doing what it should when I called “” (or “make_button..”) etc.

                          I just care that certain state was set and a view was rendered. The framework/library is responsible for testing “”.

                          That said, in e.g. Rails, I think “navigate to x, assert redirect to y, assert notice contain ‘ok’” are fair enough. Maybe both too detailed and prone to repetition (my fixture says “ok”, make sure my method renders output containing “ok” - tests break when I trivially change my fixture, or template..).

                          But they can document and illustrate intent pretty well - which is good.

                        4. 4

                          You could say “I have complicated feelings about <dogmatism>”. The problem is when people take some useful concepts and treat them like a religion. I kinda understand it though - if you’re used to terrible untested code that breaks all the time (i.e., most code), discovering a principled way to make it work more reliably is like a godsend. But still, use some common sense.

                          1. 4
                            1. 7

                              He expresses some pretty strong opinions and I feel like TDD is really not suited for his environment. Games typically use C++, which is slow to compile. Games are highly experimental and often have lots of code added or removed very quickly for arbitrary deadlines set by publishers. They aren’t typically having their code used by others (like a library author might). It isn’t necessarily a long lived code base. Stability is somewhat minimized. Eventually the codebase is done and people don’t need to work on it.

                              TDD seems to have come up with business apps where the working loop is faster (i.e., Java or Ruby write/run cycles), the code complexity is lower, and the code needs to live for a long time. The people that wrote the code are going to leave and there are going to be new people managing the codebase. In that context you need safety nets to keep people from killing themselves. The complexity comes from long project timelines and the need to change the code base over time.

                              Overall, I don’t like the tone of this video. It feels needlessly dismissive and reductionistic. I’m not into TDD, but this isn’t the most helpful take.