So my personal regex engine project in Rust has a pretty comprehensive test suite (1192 tests at the moment). I actually had to cut the number of tests down from around 3500 because a test build would take something like 20 minutes to compile. Now it takes around 14 seconds (once compiled, the run takes another 10 seconds to go through all the tests).
This doesn’t sound reasonable at all. rust-analyzer has roughly the same number of tests, and a clean compile takes a bit over a minute with 4 threads.
$ rm -rf target
$ t cargo t --no-run -q -j 4
real 68.74s
cpu 262.32s (241.41s user + 20.91s sys)
rss 1015.65mb
$ t cargo t -q -- --test-threads 4 --format pretty | rg ok | wc -l
real 8.93s
cpu 32.82s (31.89s user + 930.98ms sys)
rss 140.71mb
3655
Are you perhaps hammering the linker too much via integration tests/doc tests, each of which creates a separate binary (https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html)?
I haven’t dug too deep into the why, though I can tell you it only generates a single executable.
I think the issue comes from the fact that, for one of the tests, we build a several-thousand-line function that tests several thousand regular expressions. Basically, ~2500 repetitions of the same kind of block.
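A plausible sketch of one such repetition, with invented identifiers standing in for the author’s actual API (a guess at the shape, not the real code):

    // All names here (`Engine`, `Regex`, `Match`, `compile`) are invented
    // stand-ins for whatever the author's engine actually exposes.
    struct Match { start: usize, end: usize }

    fn build_cases(engine: &Engine) -> (Vec<Regex>, Vec<Vec<Match>>) {
        let mut patterns = Vec::new();
        let mut expected = Vec::new();
        // ~2500 repetitions of a block like this one:
        patterns.push(engine.compile(r"ab(c|d)+e?").unwrap());
        expected.push(vec![
            Match { start: 0, end: 4 },
            Match { start: 10, end: 15 },
            // ...some of these vec!s reportedly have ~100 entries
        ]);
        // ...2499 more blocks...
        (patterns, expected)
    }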
I don’t want to split it into ~2500 different functions because the goal is to test how well the regex engine copes with matching ~2500 expressions simultaneously over the same input. :)
At the end, we fire the input through the engine and then go check the reported matches against the expected matches.
I’m sure there are better ways to do it. For now, reducing it to ~400 expressions in that test works.
This might be worth reporting as a bug against the compiler. It looks like something somewhere is accidentally quadratic.
What’s fun too is that, if the full 2500-expression block is compiled in, running cargo test alone causes a segfault at runtime. You have to run cargo test -- --test-threads 1, which makes me think there’s excessive stack usage on the test threads’ stacks (there are plenty of potential culprits: some of the calls to vec! have ~100 members, there are a lot of vec! calls, etc.). In my copious free time I will debug that part. :)
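If it really is per-thread stack exhaustion, one stock libstd knob worth trying (my suggestion, not something from the thread) is RUST_MIN_STACK, which raises the default stack size of spawned threads and should also apply to the test harness’s worker threads:

$ RUST_MIN_STACK=16777216 cargo test    # 16 MiB per thread instead of the usual 2 MiB default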
I think there should be a way to find the right tests to compile and run given a set of changes, but it’s not trivial. I’ve seen projects that used machine learning (statistics + random mutation) to get a good approximation; another, more deterministic option is to use coverage information. Either way it takes time to gather that information, but it could potentially be shared between colleagues.
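For the coverage-based route, the selection step itself is cheap once per-test coverage has been recorded; a toy sketch (all names invented, and gathering the coverage map is the expensive part):

    use std::collections::{HashMap, HashSet};

    /// Test name -> source files that test touched on the last coverage run.
    type CoverageMap = HashMap<String, HashSet<String>>;

    /// Pick only the tests whose recorded coverage intersects the changed files.
    fn select_tests<'a>(coverage: &'a CoverageMap, changed: &HashSet<String>) -> Vec<&'a str> {
        coverage
            .iter()
            .filter(|(_, files)| !files.is_disjoint(changed))
            .map(|(name, _)| name.as_str())
            .collect()
    }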
Another thing that could help is some kind of dependency graph between the tests, something that says “if test A failed, then tests B, C, etc. will fail for sure, no need to run them”. But that only helps at runtime, not at compile time.
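That runtime part is just a reachability walk over the dependency graph; a minimal sketch, assuming the “A fails implies B fails” edges are already known:

    use std::collections::{HashMap, HashSet};

    /// dependents["a"] lists the tests guaranteed to fail whenever "a" fails.
    fn tests_to_skip<'a>(
        dependents: &HashMap<&'a str, Vec<&'a str>>,
        failed: &'a str,
    ) -> HashSet<&'a str> {
        let mut skip = HashSet::new();
        let mut stack = vec![failed];
        // Depth-first walk marking every transitive dependent as skippable.
        while let Some(test) = stack.pop() {
            for &dep in dependents.get(test).into_iter().flatten() {
                if skip.insert(dep) {
                    stack.push(dep);
                }
            }
        }
        skip
    }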
I’m writing this while waiting for my 20-minute build, on a 96-core, 192 GB RAM monster of a build machine, excluding tests.

Take a look at NCrunch. It’s been a while since I did any .NET, and I miss this tool.
[15 seconds] is the sweet spot amount of time where you become tempted to “do something else” while you wait. I may have googled something random, attempted to make another change, or checked my instant messages. Inevitably, I would be distracted and it could easily be a full minute before I checked back in on my compile status.
I know exactly how the author feels. At the same time, I wonder how reasonable this is. Our work involves interacting with a device that facilitates and encourages a short attention span, but I regularly wonder how much more productive I could be if I could wait, intently, for a full minute without getting distracted.
I heard this described as “the rule of eights”, though I can’t remember who said it. (It wasn’t me!) Each “8” refers to how long the async thing takes after you initiate it.
If something takes 8 seconds you can stay on task and remain in flow.
If it takes 8 minutes, you’re going to switch onto a different task and lose context. Maybe you don’t even come back when the 8 minutes is done.
If it takes 8 hours, you’re going to desk-check, design up front, and plan when the task starts. Most likely you start it before leaving for the day and check its result the next morning. (Often discovering that it errored out a few minutes into the 8 hours and you have to start it over.)