TIL about the original meaning of the “unit” in Unit Test. Thanks for sharing this!
I like that the article hints that the problem people have with tests starts with the wrong idea of what these tests are for. Maybe whenever we change code, we need to ask ourselves what observable behavior changes at each of these levels and create/extend the appropriate test suites.
“Tests checking product behavior from the user’s perspective”. This is what normally falls into the end-to-end category of tests, including rendered UIs and simulated mouse clicks. If end-users are the priority of the product, then most tests should be of this kind. Each new product feature should ideally come with tests that validate that the user actually experiences what the design had in mind. In reality, it’s just too expensive to write these tests (developer time) and run them (CI time).
“Tests checking subsystem behavior”. This usually falls into the “integration tests” bucket. If all possible use-cases were covered by the e2e tests, we wouldn’t need this type of test at all. In the real world, however, integration tests provide higher-level system guarantees and test the behaviors that one subsystem expects from another. Subsystems usually correspond to different teams working on them, so effectively these tests save those teams time when something goes wrong.
“Tests checking component behavior”. Here, “component” is just a generic word I decided to use, but effectively these are Unit Tests. These tests should check behaviors at a much smaller scale than integration tests. Their goal is to notify you or your team that a code change breaks the behavior other components in the subsystem expect. This is how you should approach writing them as well: they are saving you time, or at least letting you catch a bug much earlier in the development process.
In none of these categories do we actually want to test implementation details. And I love the mental model noted in the article: good tests are actually rarely affected by refactorings.
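To make that last point concrete, here is a minimal sketch (Python/pytest, with a made-up ShoppingCart component) of a component-level test pinned to observable behavior rather than to how the component stores its data:

```python
# Hypothetical component used only for illustration.
class ShoppingCart:
    def __init__(self):
        self._items = []  # internal detail; free to become a dict later

    def add(self, name, price, quantity=1):
        self._items.append((name, price, quantity))

    def total(self):
        return sum(price * qty for _, price, qty in self._items)


def test_total_reflects_added_items():
    # Behavior-level expectation: adding items changes the observable total.
    cart = ShoppingCart()
    cart.add("book", 12.50)
    cart.add("pen", 2.00, quantity=3)
    assert cart.total() == 18.50
```

Nothing in the test touches cart._items, so refactoring the internal representation (a dict keyed by name, lazy totals, whatever) leaves it green, which is exactly the “rarely affected by refactorings” property.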
I love this alternative definition of “unit” - that the tests themselves should be able to run independently of each other.
This solves a problem I’ve been having where I try to avoid the term “unit test” entirely, because I so dislike the pattern of testing individual classes and mocking everything else, as opposed to more integration-style testing. I’ve been talking about “automated tests” instead, but that feels clumsy to me.
I’d rather talk about unit tests with this better definition of what that means.
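As a rough illustration of that definition (pytest-style, all names made up): each test builds its own state and shares nothing mutable, so the tests can run in any order, in parallel, or one at a time.

```python
import pytest

# Hypothetical in-memory repository, just to show per-test isolation.
class InMemoryUserRepo:
    def __init__(self):
        self.users = {}

    def add(self, user_id, name):
        self.users[user_id] = name


@pytest.fixture
def repo():
    # A fresh repository for every test: no test depends on another having run.
    return InMemoryUserRepo()


def test_add_user(repo):
    repo.add(1, "alice")
    assert repo.users[1] == "alice"


def test_starts_empty(repo):
    # Passes whether or not test_add_user ran first.
    assert repo.users == {}
```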
Fowler uses the terms “compile tests” and “commit tests”, grouping tests by whether they are fast enough to run on save or need to wait until commit time.
IMO “unit test means the test, not the SUT” adds even more confusion, because almost nobody uses that definition. History notwithstanding, we all learned the term “unit test” from Kent Beck or from someone who learned it from him, and at least as I’ve understood him, he’s clear that “unit” refers to the SUT.
Yeah, you’re right: the term “unit test” is likely unrecoverable at this point.
This article definitely has some good advice on how to do testing, especially when it talks about working outside in.
That said, I think that a lot of the time when folks talk about TDD, it’s a bit like folks talking about how they couldn’t find their keys, and when you probe further, they ask why they’d look somewhere there wasn’t already a light. For me, the main value is in terms of a feedback loop, not so much tests as an artefact in themselves (although those can be handy, too).
And I think this article kinda falls into that trap, especially tip #4. To my understanding, it’s fairly common to build a walking skeleton that does virtually nothing, within your chosen architectural structure, with integration tests that demonstrate that it works end to end. And then as you add more slices of functionality, the trick is to notice the friction, and then refactor/reorganise to reduce that friction.
That said, I think this only really works well if you’re organising your work around vertical slices of functionality – i.e. something that solves a problem (or at least provides some observable fragment of an outcome). When planning, I’ll often see folks dividing up work by layer, so you’ll have “create/read/update a widget in the backend/api/web UI layer”, which can carry a lot of implied baggage around how you architect a solution, and can result in a lot of wasted time and effort.
Articles like these get confusing because “TDD” means a lot of things to a lot of people, and so you end up debating what “TDD” means rather than whatever testing strategy is being proposed. (see also: Single Responsibility Principle)
Setting that aside, as well as the unreferenced definitions of “unit test”, the advice in here isn’t bad:
Test outside-in - agreed. The purpose of testing is to reduce the risk of something not working, so you should test what you think should be working from the user’s perspective
Don’t over-isolate your code (I believe “over isolate” is what the author means) - generally, I agree, but I don’t think Docker is the unit of isolation. Mocking a third party API client is fine. I think the point being made here is a good one, in that you cannot confidently refactor across the boundary of tests.
e.g. you can refactor something such that your browser/e2e tests all pass, but the same refactor would wreak havoc on your unit tests. Thus, each test creates a boundary over which refactoring becomes difficult. The fewer isolated unit tests you have, the fewer the boundaries. But testing is about managing risk, so the complete elimination of isolated unit tests is likely not the way to go. (There’s a small sketch of the boundary idea at the end of this comment.)
Don’t make changes without a failing test. Generally agree here, too, for the reasons stated. You need to know that your test is testing the right thing (see a post I wrote on this for more)
Tip #4 isn’t a tip and is confusing - I think the author is conflating “design of the software” with “infrastructure in AWS”. I would agree driving the design of your software entirely by tests is not realistic and could result in severe over-testing if you need to drive decisions like caching or performance entirely by writing tests. But it’s a nice idea, I guess.
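To illustrate the “mocking a third party API client is fine” point above (a sketch only; the checkout function and the payments client names are made up): the test double sits at the external boundary, so refactoring everything on our side of that boundary doesn’t invalidate the test.

```python
from unittest.mock import Mock

# Made-up application code: it takes an injected third-party payments client
# rather than constructing one itself.
def checkout(payments_client, order_total_cents):
    charge = payments_client.create_charge(amount=order_total_cents, currency="usd")
    return {"paid": charge["status"] == "succeeded", "charge_id": charge["id"]}


def test_checkout_marks_order_paid_on_successful_charge():
    fake_client = Mock()
    fake_client.create_charge.return_value = {"id": "ch_123", "status": "succeeded"}

    result = checkout(fake_client, 2500)

    assert result == {"paid": True, "charge_id": "ch_123"}
    # Only the external boundary is mocked; how checkout() is structured
    # internally can change freely without touching this test.
    fake_client.create_charge.assert_called_once_with(amount=2500, currency="usd")
```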
For me, it will always mean Transport Tycoon Deluxe (and, in practice, the open source version).
That is a powerful association given TDD and TTD are different acronyms. Is that a ringing endorsement for TTD?
Yes, TDD just reads as a typo for TTD to me. OpenTTD has consumed a lot of my time over the years and the outcome has always been happiness. I cannot say the same about TDD.
I wonder if TDD is implicitly a symptom of a need to detect surprising state side effects when not using pure functional code.
To me TDD is as much about “can this interface be tested” as it is about testing the interface.
“Just write functional code” is one answer, but how do you verify that? Rigorous inspection is one way, and testing is another.
Even in the case of “rigorous inspection” on a PR, I find it much easier to review tests that say what the expected inputs and outputs are than the code they’re testing. It gives me an entry point into what the author intended the code to do. There is a lot of code that looks correct but is subtly wrong. When reviewing, if you’re not pulling the code down and executing it, you’re not really seeing it for what it does; doing that is still testing, just manual testing. Having automated tests makes manual testing easier.
Testing that edge cases are handled, and ensuring there are no regressions, is a hugely valuable side effect.
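As a concrete (if contrived) example of what makes that kind of review easy: a parametrized test reads almost like a table of intended inputs and outputs (pytest; slugify here is a made-up function standing in for whatever the PR changes).

```python
import pytest

# Hypothetical function under review; the test below is the first thing I'd read.
def slugify(title):
    return "-".join(title.lower().split())


@pytest.mark.parametrize(
    "title, expected",
    [
        ("Hello World", "hello-world"),                        # happy path
        ("  Leading and trailing  ", "leading-and-trailing"),  # whitespace edge case
        ("ALREADY-LOWERED", "already-lowered"),                 # casing
    ],
)
def test_slugify(title, expected):
    assert slugify(title) == expected
```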
Actually I think the absolute biggest misunderstanding is that TDD was never intended for entire applications. Read the original book. It tests simple code that models currency calculations. It’s extremely amenable to TDD.
Do you know what isn’t amenable to TDD in that same way? Web applications. There are no mocks, stubs, or any other test doubles in the original book. There’s no need, because an application is not being tested. Trying to apply the same ideas to an application has always been completely awkward.
I’m not sure I’d make that inference; it seems more likely that the money examples in “Test-Driven Development by Example” are there for didactic purposes, and Kent Beck includes a whole host of patterns (including Mock objects) in the book, too.
Kent also handles the topic of “Can you drive development with application-level tests?”, which mostly talks about the technical problem of writing entire application tests up-front (I’d do them more incrementally) and getting your customers to write tests (which can suffer from long feedback loops).
Do you have something more specific in mind when you say it wasn’t intended for whole applications?