“Run fewer tests!”, as the article mentions, is a great idea, but the existing ways of doing this are suboptimal.
Failing fast usually stops at the first failed test and provides a quick binary pass/fail signal (especially if tests are ordered by historical failure rates). The downside is losing information when there would have been multiple failures: a few cycles of failure -> fix bug -> push PR -> different failure -> fix new bug -> push PR can wipe out any time savings.
Running specific suites that are subsets of all tests cuts down on the number of tests run, at the cost of both under- and over-testing, plus the work of setting up and maintaining the suites. Under-testing is when a relevant test wasn’t in the suite you ran; over-testing is when you run a test that cannot be affected by your recent code changes.
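In pytest terms, the two approaches look roughly like this (the "smoke" marker is hypothetical and would need to be registered in your config):

    # Fail fast: stop the run at the first failing test
    pytest -x

    # Suite subset: run only tests tagged with a "smoke" marker
    pytest -m smoke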
Much better is to use “Test Impact Analysis” (TIA). In short: statically analyze the code (using file dependencies, the program dependency graph, coverage, etc.) and skip all tests that you can prove are irrelevant. The savings depend on the dependency layout of your code and the size and nature of your PR, but they can be really high: as much as 90% for a small PR on a well-structured code base.
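To make that concrete, here’s a minimal sketch of the file-dependency flavour of TIA. The graph format and file names are made up for illustration; a real tool would also need coverage or dynamic data to be safe:

    # Rough TIA sketch: pick tests to run from the files changed in a PR.
    # Assumes you can produce (1) the set of changed files and (2) a reverse
    # import graph mapping each file to the files that import it.
    from collections import deque

    def impacted_tests(changed_files, imported_by, all_tests):
        """Return the tests that can (transitively) import a changed file."""
        seen = set(changed_files)
        queue = deque(changed_files)
        while queue:  # breadth-first walk of the reverse import graph
            f = queue.popleft()
            for dependent in imported_by.get(f, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        # A test is impacted iff it was reached from some changed file.
        return sorted(t for t in all_tests if t in seen)

    # Hypothetical example: changing api.py selects only test_api.py,
    # while changing db.py would select both tests.
    imported_by = {
        "db.py": ["api.py", "tests/test_db.py"],
        "api.py": ["tests/test_api.py"],
    }
    all_tests = ["tests/test_db.py", "tests/test_api.py"]
    print(impacted_tests(["api.py"], imported_by, all_tests))
    # -> ['tests/test_api.py']

Everything not selected is provably unaffected, at least to the extent the import graph tells the truth; that caveat is why coverage-based variants exist.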
Microsoft pioneered TIA a couple of decades ago, but outside MS/Azure it’s surprisingly uncommon, despite some major advantages. I know of the occasional test plugin doing it, and one dead startup (YourBase), but that’s it.
If anyone’s interested in such a thing for Python, I’m launching a tool soon for it: https://getspdr.dev.
In JavaScript-land, I believe Jest does this automatically for local test runs, and other tools like Vitest can be run in this way. They look at the list of changed files from git and then only run tests that import (possibly transitively) one of those files. This works even better in watch mode: there, the tool looks at which file just changed and only runs tests that import that specific file.
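If I remember the flags right (worth checking against the current docs), that looks something like this; the file path is just an example:

    # Jest: run only tests related to files changed in the working tree
    jest --onlyChanged

    # Vitest: run only tests that (transitively) import the given source file
    vitest related ./src/money.ts --run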
I’ve mostly seen this done locally, with the assumption that CI jobs for a PR can run more slowly and cover the whole test suite, to catch cases where the static analyser made a mistake. But I suspect it would also work in CI if you set it up right.
I wonder if this is more common in Javascript because of how easy it is to trace dependencies statically through the code, particularly with the new static import syntax.
Oh, nice. It’s good to see it be the default somewhere.
Though I had to dig around in the code and blog posts before I found Jest talking about it. It’s not very prominent. It looks like Jest defaults to watching for changes locally, but iiuc, it should be able to work for CI too, if given the right VCS options. I guess people care about that less?
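If I’m reading the docs right, the VCS option in question is --changedSince, which diffs against a base branch instead of watching the working tree:

    # In CI: run only tests affected by changes relative to the main branch
    jest --changedSince=origin/main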
Vitest looks similar. They default to running only relevant files locally, but all tests in CI, which is a shame.
I’m surprised people complain about slow CI, but then don’t do anything to fix it.
In my experience, slow CI typically comes down to either a complicated build, or too many slower end-to-end tests that can’t easily be parallelised. If your tool can solve the latter, that would be very cool and impressive, but I suspect it’s difficult because the whole point of an end-to-end test is that you test the system as a whole. Therefore, it’s likely that a change in any file could have an impact on the final result, even if in practice it won’t.
At least in my experience, the sorts of tests one runs with Jest/Vitest tend to be fast enough that they don’t cause many problems in CI. For example, one project I work on has around 1700 tests and takes about two minutes in total to install dependencies, run linting/formatting/typechecking, and then run the tests. I could speed this up (the tests are currently artificially slow because there’s some property testing going on and we want to give it a bit of extra time to find edge cases), but saving 1-2 minutes isn’t really worth it, especially weighed against the risk of misconfiguring something and ending up with important tests not running properly.
On the other hand, it’s much more important to cut out unnecessary tests when running the tests locally, because the quicker you can go from “code changed” to “tests green, carry on” (or “tests red, something went wrong”), the quicker your development cycle can be. So I think it makes sense for these sorts of tools to default to running everything in CI, but concentrate harder on only running relevant tests locally.
Yeah, that all makes sense.
I decided to write my tool because I worked on a large government project that had CI runs taking hours. In some cases, it took so long it could only be done overnight. I don’t expect anyone to care enough to bother if their tests finish in under 10 minutes.
Auto-parallelization is unfortunately tricky because of inter-test dependencies, either known or hidden. Not to mention any effects from adding contention on a shared test resource (remote API, database). I’ve seen CI accelerator services like Knapsack and BuildKite that help you optimize builds or improve your manual parallelization, but nothing that automatically parallelizes. (Though it looks like Vitest auto-parallelizes by running each test file in its own process, which seems risky to me.)
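For what it’s worth, pytest-xdist is the closest thing I know of in Python-land; grouping by file reduces (but doesn’t eliminate) the ordering hazards:

    # Spread tests across one worker process per CPU core
    pytest -n auto

    # Same, but keep each file's tests on one worker, so same-file ordering survives
    pytest -n auto --dist loadfile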
The business question is: how do I find the legacy code bases with tech debt and super-long test runs that could benefit most?
I like the idea of the Beck time as a psychological watershed for slow test suites. I’ve certainly observed it in action.
The 10min limit is definitely real. I’d never thought about “run slow tests nightly”. I’ll keep that in mind (though hopefully I’ll never need it).
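For the record, here’s a minimal sketch of the nightly idea, assuming GitHub Actions and a registered "slow" pytest marker (both hypothetical for your setup):

    # .github/workflows/nightly.yml (hypothetical)
    on:
      schedule:
        - cron: "0 3 * * *"  # every night at 03:00 UTC
    jobs:
      slow-tests:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: pip install -r requirements.txt
          - run: pytest -m slow  # only the tests tagged as slow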