1. 24
  1.  

  2. 28

    I guess the author hasn’t worked on large code bases. Tests avoid regressions without needing to analyse the impact on the whole codebase.

    One of the points of TDD is in fact that it encourages small interfaces. Even in cases where the interface is not stable, having a suite of tests that captures the expected scenarios and behaviors allows developers to make changes (including during initial development) and know the impact on existing functionality.

    There are cases where manual testing makes more sense, but I’ve found those to be the exception rather than the rule. Generally these are scripts of no more than 1000 lines with a single well-defined purpose. Of course those can then be integration tested, and also manually tested as a single unit.

    1. 14

      The post isn’t “don’t test,” it’s “mostly avoid unit tests.”

      I’m kinda inclined to agree with the author here, though I think it really depends on what sort of software you’re working on. There are projects that inevitably have lots and lots of easily unit-testable interfaces. And there are projects that are inherently very stateful, making it difficult to unit test without lots of mocking. You can still do integration (or black box, or whatever) testing.

      I’m kinda in this boat at work, with a highly stateful system. There are a few things you could break out into a rather easily testable API that takes an input and outputs a result that you can check, but these are generally trivial, and if these parts don’t work, then the integration tests would reveal it anyway. So why maintain redundant unit tests? That’s not where the hard parts are.

      You could try highly stateful, highly mocked unit tests for the stateful parts, but (admittedly without having tried it) I’ll say that it’s probably going to be high maintenance effort for poor yield: you keep changing the mocks as internals change. I’m concerned that they still wouldn’t catch the hard bugs.

      The hard bugs relate to transient behavior of the system. Threads being an obvious case. Flow control. Changes in system state that affect other components. IME unit tests are really bad at catching these types of bugs, and I’ve watched people struggle to write test cases for them.

      I wish the system I was working on were as easy to test as SQLite, but no..

      1. 6

        The biggest advantage of unit tests for stateful systems, in my experience, is that for the most part they point directly to the place in the code that is busted. Integration tests tend to cover a lot of ground, but it can be hard to pinpoint what went wrong.

        My policy on this is always evolving, but usually I will use integration tests as my main line of defense for a brand new feature. This will cover the most ground but let me get something out there. Then, once I find issues outside of the happy path, I tend to target them with more unit-y tests.

        In practice, often this means that a bug in production turns into “factor out this problematic code to make it more testable”, then writing a test against it.

        This leaves the integration tests out of most edge case testing, but means that when other people hit corner cases they have a well documented example to work off of.

        If I write a specific test on an edge case, it’s more likely to be seen as intentional than if it’s just a part of an integration test.
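
        To make that concrete, the “factor out, then test” move tends to look something like this in Python (a made-up example, not code from any real system):

        # Before: this parsing logic was buried in a request handler and only
        # exercised indirectly by integration tests. After a production bug,
        # factor it out into a pure function...
        def normalize_quantity(raw: str) -> int:
            """Parse a user-supplied quantity; blank input means zero."""
            raw = raw.strip()
            if not raw:
                return 0  # the edge case that bit us in production
            return int(raw)

        # ...and pin the corner case down with a targeted, unit-y test.
        def test_blank_quantity_is_zero():
            assert normalize_quantity("   ") == 0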

        1. 4

          As an SRE and distributed database engineer, I despise traditional unit tests for large stateful systems. I touched on this a lot in this talk on testing complex systems. Your biases while writing the implementation are the same as your biases while writing the tests, so don’t write the tests yourself at all. Use tools that generate tests for you, because that pushes the bias space into the realm of broad classes, which our puny minds can stumble into much more easily than enumerating interleaving spaces of multiple API calls or realistic hardware phenomena over time. You can apply this technique to anything, not just simple pure functions. This paper by John Hughes goes into plenty of specifics on how to do this, too.

          We can build our systems so that these randomized tests can shit out deterministic regression tests that facilitate rapid debugging, without pinning ourselves to a single point in state space the way unit / example / whatever tests do.

          Unit tests and integration tests that explore a single script just create code that is immune to the test, but not necessarily reliable.
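
          In Python, for instance, the same workflow looks roughly like this with the hypothesis library (a QuickCheck descendant; the function under test here is invented):

          from hypothesis import given, strategies as st

          # Hypothetical function under test: insert an element, keeping entries unique.
          def insert_unique(xs: list, x: int) -> list:
              return xs if x in xs else xs + [x]

          @given(st.lists(st.integers()), st.integers())
          def test_insert_unique(xs, x):
              out = insert_unique(xs, x)
              assert x in out                   # the element ends up present
              assert set(out) == set(xs) | {x}  # and nothing else changes

          When a property fails, hypothesis shrinks the input to a minimal counterexample and replays it from a local example database on later runs, which is exactly that deterministic-regression workflow.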

          1. 3

            Let me start off by saying I like things like generative testing and am always looking for ways to integrate that kind of tooling into projects.

            I have found that for enterprise software, where you have a pretty heterogeneous system with a lot of edge cases around existing data in the system, it’s hard (in a holistic sense, not in a tooling sense) to really make cross-cutting declarations about the behaviour of the system. X will be true, but only when settings Y and Z are toggled in this way, and only during this time of day. Often you end up probing the database for certain sorts of global conditions that affect things in a cross-cutting manner.

            You can decouple systems to make this flow better, but often there’s intrinsic difficulties, where your best bet is to isolate X truth-ness. But when you have calculateXTruthiness: Bool -> Bool -> Bool -> Bool, the value of generative testing goes down a decent amount because it’s just a predicate! Meanwhile you do get at least a bit of value from some unit tests at least to document known correct behaviours (from a business rules perspective).

            It’s all a spectrum, but it can be slim pickings in enterprise software for generative testing. Your best bet is to refactor to pull out “systemic” parts of your code to make it easier to test, even if your top layer remains messy as a consequence of reality being tricky.

            A lot of the time there simply aren’t many overall properties to eke out of your system beyond “whatever the system is doing already” (because backwards compatibility is so important nowadays in the kinds of systems we build).
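
            When the extracted piece really is just a predicate, a plain table of examples can document the business rule better than a generated test would (hypothetical rule and names):

            import pytest

            def x_is_true(setting_y: bool, setting_z: bool, in_window: bool) -> bool:
                # Hypothetical business rule: X holds only when Y and Z are
                # both toggled on, and only during the nightly window.
                return setting_y and setting_z and in_window

            @pytest.mark.parametrize("y, z, window, expected", [
                (True,  True,  True,  True),   # the one blessed combination
                (True,  True,  False, False),  # outside the window
                (True,  False, True,  False),  # Z off
                (False, True,  True,  False),  # Y off
            ])
            def test_x_truthiness(y, z, window, expected):
                assert x_is_true(y, z, window) is expected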

            1. 2

              Whatever your expectations of a thing are, you will almost always have success in violating them through a sequence of generated interactions if you built the thing with scripted unit and integration tests. If you have no expectations, then your job is done and you can look busy in other ways :P

      2. 10

        Integration tests tend to be more useful for broad regression detection. Often, the failures in a system come from mistaken assumptions about the behavior of other modules’ interfaces, and not from within the module itself. If I had a choice, I would prefer a handful of end to end tests over the same amount of time invested in unit tests. Or even better, a mix of integration tests to cover end to end issues, with unit tests on subtle or hairy core algorithms.

        It’s not a choice between unit testing and manual testing – there are other types of automated test.

      3. 12

        I agree with the sentiment of not writing unit tests for badly unit testable things. Requiring mocks to test things is one smell of these. We need to be careful when passing this instruction to juniors, or we will be overseeing codebases with no unit tests in no time.

        Requiring everything to be automatically testable somehow still seems like a good guideline, though.

        1. 6

          Requiring everything to be automatically testable somehow still seems like a good guideline

          Yeah, what’s wrong with the principle of preferring to push state management to the edges of the system, and providing unit tests for the resulting functional core? The fact that such tests can’t catch interaction bugs doesn’t make them useless.
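
          As a minimal Python sketch of that split (all names invented): the shell owns the side effects, the core is a pure function you can unit test exhaustively.

          # Functional core: pure, no I/O, trivially unit testable.
          def next_retry_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
              """Exponential backoff, capped."""
              return min(cap, base * (2 ** attempt))

          # Imperative shell: owns the state and the side effects.
          def fetch_with_retries(url, fetch, sleep, max_attempts=5):
              for attempt in range(max_attempts):
                  try:
                      return fetch(url)
                  except IOError:
                      if attempt == max_attempts - 1:
                          raise
                      sleep(next_retry_delay(attempt))

          def test_backoff_is_capped():
              assert next_retry_delay(10) == 60.0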

        2. 9

          Writing tests can prevent problems before they happen.

          1. 4

            I enjoyed reading this - though I read it because your one-sentence summary gave me the “but waaait cleanroom” reaction.

            Another summary is “reflecting on your code, mechanically or not, improves quality”.

          2. 9

            My personal preference is to model as much of the requirements as feasible with types and write tests for the rest. In a language like Haskell, with a very powerful type system, this leaves little room for unit tests and somewhat more room for integration tests. You still need your traditional end to end system tests as well, of course, which fall outside the reach of your type system.

            1. 4

              You would like Idris, which takes this to an extreme. You can encode state machines, concurrent protocols, and much more in types, which looks like a whole new type of “metaprogramming”, and the choices it gives you are amazing.

              1. 6

                Idris is a great language, but it’s clearly not production ready. I can’t say I used any dependently typed language seriously, and I’m sure my opinion would change a lot if I did, but currently, I favor the “ghosts of departed proofs” kind of type-level modeling, where you don’t prove your implementation internally, but you expose proof witnesses in the interface, so the users of your library can enjoy a very strongly typed interface.

                This aligns very well with how I perceive types should be used; i.e. organize code such that entangled pieces of code relevant to a property that is hard to prove live next to each other, and you can informally (i.e. without relying on types) prove to yourself that they satisfy the property. Then expose those pieces with an interface that doesn’t allow (relying on types) the property to be violated by consumers.

                1. 3

                  you don’t prove your implementation internally, but you expose proof witnesses in the interface

                  Can you point to some examples, please? I don’t really follow.

                  1. 3

                    Take a look at the justified-containers library. When you check whether a key is in a map, if the key actually is there, it gives you a type-level witness of that fact. Then when you lookup that key with that witness, you receive the value without a Maybe wrapping, because it’s proven already. However, the library uses fromJust internally (i.e doesn’t prove that fact to the compiler), because you can prove outside the type system that it’s impossible to receive a Nothing.

                    1. 1

                      Thanks

                  2. 1

                    but it’s clearly not production ready

                    This sort of requires a qualifier. It’s probably not “introduce this to a company”-level production ready, but it certainly feels like it’s “start an open-source project”-level production-ready, which seems to be at least relevant in online discussions. It’s such a great language because it brings enormously powerful concepts from other languages like Agda and Coq, into an environment that basically looks like Haskell. I think any advanced Haskell programmer will be pleasantly surprised how these higher-level features that feel clunky and require extensions become trivially easy in Idris (although that’s just an intuition, I’ve never really dabbled in Haskell beyond trivial stuff and started using Idris directly).

                    I can’t say I used any dependently typed language seriously, and I’m sure my opinion would change a lot if I did, but currently, I favor the “ghosts of departed proofs” kind of type-level modeling

                    It’s not just about writing explicit proofs in code. I mean, in more advanced code it will pop up, but being able to use expressions in types and types in expressions is extremely flexible. Look at this example of a concurrent interface for operations on a list:

                    ListType : ListAction -> Type
                    ListType (Length xs) = Nat
                    ListType (Append {elem} xs ys) = List elem
                    

                    How many languages allow you to express things on this level? :)

                    Or look at this merge sort definition:

                    mergeSort : Ord a => List a -> List a
                    mergeSort input with (splitRec input)
                      mergeSort [] | SplitRecNil = []
                      mergeSort [x] | SplitRecOne = [x]
                      mergeSort (lefts ++ rights) | (SplitRecPair lrec rrec) -- here
                                = merge (mergeSort lefts | lrec)
                                        (mergeSort rights | rrec)
                    

                    There you’ve used a view of the data structure that is independent of its representation: specifically, you viewed the list as a concatenation of two lists of roughly equal length. A whole other axis to split your implementation over when it makes sense.

              2. 7

                This line of reasoning forms a reasonable argument against TDD and ‘early testing’ in general, but

                For other types of code, your time is better spent carefully re-reading your code or having it reviewed by a peer.

                is a false dichotomy. You should reread your code and have it reviewed and write tests.

                In my experience, the time spent writing and executing tests that verify that even small functions do what I think they do is worth it. When I am being lazy and think I can do without a test, and the reviewer catches neither the missing test nor the bug, it happens all too often that I am quickly proven wrong when the code hits production.

                I have started to enjoy writing tests, exactly because they provide confidence my code does what it is supposed to do and will not bother everyone else by containing a stupid bug that could have been prevented by spending an additional 30 minutes writing some tests.

                1. 8

                  is a false dichotomy

                  Kinda, but. Everything you do consumes time. Review and tests included. If time were of no concern and you wanted maximum quality, maybe you’d do formal proof, verification, review, and then, if you don’t trust your proof, maybe you’d do all the tests you want. When time and money are limited, you opt for whatever gives the most bang for the tick-tock. I’m more and more convinced that testing has become such a religion that some places exercise it without regard for cost versus benefit. Of course, it’s damn hard to measure. But I’ve seen too many bad tests (that confuse the developer when they break and someone needs to figure out WTF is wrong and what to do about it), tests that don’t actually test anything useful, tests that are redundant because the functionality is tested multiple times at multiple levels, test batteries that have grown so big people don’t actually run them because it takes forever…

                  1. 2

                    All true: it all depends on what your process is currently like and how it can improve.

                    I mainly want to warn against the implicit assumption that the trade-off in time has to be between QA activities such as rereading code, reviewing, and testing. The time for testing can also come from implementing fewer features or hiring an additional engineer. Or from starting to write unit tests and discovering that it actually saves time because of the reduction in debugging and rework.

                    Damn hard to measure, but either you start measuring it, or you find some other way to honestly appraise the value of certain activities (root-cause analysis of bugs, and of how they could have been prevented, is one way), or you don’t improve, because you are only shifting time between activities that add value instead of adding time and evaluating the results.

                2. 6

                  I agree with the core idea that some behavior is better tested at the integration test level rather than the unit test level.

                  However, I feel the article misses one very important use case for unit tests, namely ensuring that code in languages without strong compile-time checks works at all. E.g. it’s quite valuable to have a basic “unit” test for any Python function that just executes it with some data, to ensure a lack of syntax errors and basic type errors.
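
                  Something as small as this already catches misspelled names and obvious type errors (function invented for illustration):

                  def apply_discount(price: float, percent: float) -> float:
                      return price * (1 - percent / 100)

                  def test_apply_discount_executes_at_all():
                      # A “does it even run” test: any NameError, TypeError or
                      # syntax error in the module fails it immediately.
                      assert apply_discount(100.0, 50.0) == 50.0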

                  1. 5

                    Often people focus too much on unit tests, trying to achieve a high coverage ratio, at the expense of integration tests and running code on pre-production stages.

                    1. 5

                      Write unit tests for library code and algorithms, integration tests for everything else.

                      1. 4

                        I switched my work language from a compiled language (C#) to an interpreted one (Ruby) a couple of years ago. Before that switch, I tended to agree with the author more. After working in a non-compiled language for a while, I’ve started to lean more towards always writing tests for everything. That happened after I kept making the kind of brain-dead mistakes that would be caught by the compiler in a compiled language - wrong or misspelled variable names, syntax errors, basic logic mistakes, etc.

                        You could find these bugs if you reviewed your code really carefully. But I find it very time-consuming and unappealing to review my own code carefully enough to find this stuff. Writing a few quick tests proves that it actually works, and keeps proving it for the lifetime of the code.

                        I thought I was smart and careful enough to not have to worry about that kind of thing, but the number of times I’ve been bitten much later than I would like tells me that I’m not. Maybe you’re better and you actually don’t make those kinds of mistakes, but do try counting how many times it’s happened to you.

                        This also points back towards sticking with compiled languages after all, but that’s a whole different discussion.

                        1. 1

                          Indeed, rigorous unit tests can serve as a crutch for languages that lack static type-checking.

                        2. 4

                          One of the great things about tests is they can show you unexpected changes you made to the program. You edit one function, then run the tests and see that some far-away component has broken because it was using the thing you just changed and you didn’t even know. Updating the tests then amounts to confirming that the changes you just made are the ones you intended.

                          1. 3

                            I do not agree at all. You should always have unit tests whenever possible.

                            If a bug gets through, you can add a test for it and prevent it from happening next time. Get a false positive? Well then you can adjust your tests and make them more robust.

                            If you have external integration points (two services you control) then you need to have integration tests. Don’t mock your services, but set up your pipeline so that when their unit tests pass, they get deployed somewhere where integration tests immediately run. With a well thought out design, you can even promote your build to another environment at this step.

                            We also have so many more tools today to ensure our unit tests don’t use a bunch of mocks. There are tons of test-container frameworks for all kinds of build tools that let you spin up MySQL, Postgres and/or Redis containers. You can start a container, migrate your db, load your fixture data, fire off a bunch of tests and then get rid of that container. This is even more amazing because it lets you move up to newer database versions really easily.
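
                            In Python, for example, that flow looks roughly like this with the testcontainers package (a sketch from memory; check the docs for exact APIs):

                            import sqlalchemy
                            from testcontainers.postgres import PostgresContainer

                            def test_orders_round_trip():
                                # Spin up a throwaway Postgres, run real SQL against it,
                                # and throw the container away afterwards.
                                with PostgresContainer("postgres:16") as pg:
                                    engine = sqlalchemy.create_engine(pg.get_connection_url())
                                    with engine.begin() as conn:
                                        conn.execute(sqlalchemy.text(
                                            "CREATE TABLE orders (id int PRIMARY KEY, total numeric)"))
                                        conn.execute(sqlalchemy.text(
                                            "INSERT INTO orders VALUES (1, 9.99)"))
                                        total = conn.execute(sqlalchemy.text(
                                            "SELECT total FROM orders WHERE id = 1")).scalar_one()
                                assert float(total) == 9.99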

                            When you work on a project with a really good test suite, it’s amazing. You can make big refactors without having to worry about breaking everything and manually re-test everything. You cannot work on a large code base without test automation. It’s just going to be hell.

                            Maybe some of these things aren’t technically unit tests? Possibly. With container+fixture testing, we’re kinda pushing the limits of what we can actually do in tests compared to a few years ago. Maybe I’m just having a definition problem.

                            But in any case: test. Always. You don’t have to write the tests first if it’s a new project, but once you get your framework going, it’s just going to be easier to start with your tests and then add your implementation.

                            1. 3

                              This is just plainly bad advice. Unit tests help you identify problems more quickly. Avoiding them only harms your team.

                              1. 3

                                It’s not all that black and white.

                                IMO, there are times when unit tests help, there are times when integration tests help and there are times when it’s best not to write any tests.

                                Based on my experience working with huge codebases as well as starting codebases from scratch and working with teams as well as working on a project on my own, I have found that the optimal test/testing requirement varies a lot. However, in times of doubt, it’s probably a good idea to err on the side of writing tests. (Disclaimer: I hate writing tests.)

                                These are my rules of thumb at the moment (they are still evolving):

                                • Do not write tests if the interfaces are evolving/changing fast. There will be a lot more hesitation to make needed interface changes if you write tests very early on in that process.
                                • You can choose not to write tests if you’re the only one responsible for maintaining it and there isn’t any big consequence if it breaks.
                                • You should write tests if multiple people with different experiences are involved.
                                • You have to write tests if the cost of breaking something downstream is very high.
                                • Do not overdo unit/integration testing - recognize why a specific test is needed before writing one. In other words, understand the potential cost of not writing a test.
                                • Integration vs unit: If you have to choose one, choose the integration tests. Because ultimately, it’s the final system that matters the most.

                                It also matters how a test is written. I have seen some really good tests (readable, concise, and serving their purpose) as well as some very bad ones. Good tests can also serve as an unofficial guide to using the interface. Bad tests, on the other hand, confuse people about the purpose of the interface and unnecessarily create an obstacle to further changes.

                                1. 3

                                  I think this varies by language. IMO the sweet spot for productivity on projects past a certain size is:

                                  • Statically-typed, null-safe languages,
                                  • with integration tests covering the major flows / use cases,
                                  • and a small number of unit tests for core library code.

                                  This is pretty much the inverse of the “testing pyramid” I’ve seen discussed elsewhere — where the base of the pyramid is a large number of unit tests, and on top of that sit far fewer integration tests — but assuming the above are true, I agree with the author that in my experience additional unit tests rarely provide additional value and incur high maintenance costs.

                                  That being said, for dynamically-typed languages like Ruby (or to some extent even statically-typed languages with inexpressive, unsound type systems like Go), you really do need high unit test coverage on large codebases to ensure correctness. You can’t rely on the compiler to tell you if you did something wrong, humans inevitably make mistakes, and on big enough projects the original authors of a piece of code will sometimes (often?) miss catching those mistakes, or may not even be around anymore. I think this is one of the downsides of using those kinds of languages for large projects: the extra time you save by not writing out the types, you end up paying back many times over in maintaining extensive unit test suites.

                                  For small teams and codebases you can pretty much do whatever, though. If enough of the whole thing can fit in everyone’s heads, humans can be a sufficiently smart compiler and test framework.

                                  1. 3

                                    In some cases, you can think of things like type signatures in a statically typed language as a replacement for a certain class of unit test. I.e. you get the type system to verify certain properties instead of just using unit tests to bounce data off it. At some point, you reach the limits of your type system (just how much you can express in the type system will vary from language to language), so you fill the gaps with unit tests.

                                    If you look at it this way, the “testing pyramid” is still the right way up: you have a large base of very localised type signatures, with unit tests filling the gaps, and then a smaller number of integration tests forming the top of your pyramid.
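
                                    A tiny illustration in Python with mypy (function invented): the signature takes care of one class of checks, and a unit test covers what the types cannot say.

                                    from typing import Sequence

                                    def mean(xs: Sequence[float]) -> float:
                                        # mypy already rejects mean(None) or mean("oops"),
                                        # so no unit test is needed for those misuses.
                                        return sum(xs) / len(xs)

                                    def test_mean():
                                        # The types cannot say the arithmetic is right;
                                        # a test fills that gap.
                                        assert mean([1.0, 2.0, 3.0]) == 2.0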

                                    1. 1

                                      I can see why you might think of Go’s type system as inexpressive but I’m curious as to why you think it’s unsound.

                                      1. 2

                                        Since Go allows nil to be passed in the place of any interface, it’s unsound (in the sense that soundness is used in type theory): when the Go type-checker says that an expression is of some interface type, we may at runtime get a value that isn’t actually of that interface type — instead, we can get nil.

                                        Found this article from Brown’s CS department that gives a decent explanation of soundness:

                                        The central result we wish to have for a given type-system is called soundness. It says this. Suppose we are given an expression (or program) e. We type-check it and conclude that its type is t. When we run e, let us say we obtain the value v. Then v will also have type t.

                                        In practice a fair amount of Go code is even less reliable from a type theory perspective than just not-nil-safe: to compensate for not having generics it seems somewhat common practice to cast to and from the empty interface{} type and hope you didn’t mess up somewhere, much like void pointers in C.

                                    2. 3

                                      Oh my goodness agreed. I’ve felt like this for a long time but haven’t been able to express it so nicely. Bookmarking for future discussions.

                                      1. 2

                                        I agree with the author. I used to take pride in writing lots of test and achieving high coverage, but software development doesn’t occur in a vacuum.

                                        Development time is money. A business can’t charge for test coverage (please don’t introduce whataboutery here).

                                        Unit tests are a tool for maintaining stability in a given part of a system. They aren’t the only tool, and for most of the SaaS work that I primarily do, they certainly aren’t the best tool.

                                        Types vs Tests has become a trope, but the fact is you can — and should — have both.

                                        If you have decided to use a dynamic language and then try to enforce constraints by writing all those tests yourself manually, I believe you have made a poor business decision.

                                        1. 1

                                          How likely is your stack implementation to break after it’s initially written?

                                          Very. If you - for whatever reason - find yourself implementing a stack class instead of using an existing one, you’re probably going to come back and optimize or improve it to better fit the specific needs that required it in the first place.