1. 8

  2. 6

    Note that it relies heavily on snapshot testing, so most of these lines of code are multiline string literals.

    (dons a hat made of parentheses) Data is code, so:

    • make sure that every test is just a single string literal; this breaks down into two parts:
      • using snapshot testing for the output (you are already doing that)
      • making an “interpreter” for the input. Your input already looks almost like a series of shell commands. Make it literally a series of commands in a string literal, then write a small driver that splits the input into lines, splits each line on whitespace, and executes the line.
    • move these string literals out of .rs files into .txt files, so that each test looks like
    #[test]
    fn test_test_config_strategy() -> eyre::Result<()> {
        check(include_str!("config_strategy.txt"));
        Ok(())
    }
    
    • replace a bunch of test functions with a single test function which loops over all files with a .txt extension in a dir.

    The result should be much faster: you moved computation from compile time to runtime, but the work is much more efficient at runtime, since you don’t materialize any compiler-internal data structures to represent the strings, and you can trivially make the computation parallel.
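    The driver and directory-loop ideas above could be sketched like this (check, the echo-only command set, and the tests/cases path are all hypothetical, not git-branchless’s real API):

    ```rust
    // Interpret one test case: one command per line, each line split on
    // whitespace. The command set here is a placeholder for illustration.
    fn check(input: &str) {
        for line in input.lines() {
            let line = line.trim();
            if line.is_empty() || line.starts_with('#') {
                continue; // skip blank lines and comments
            }
            let mut words = line.split_whitespace();
            let cmd = words.next().unwrap();
            let args: Vec<&str> = words.collect();
            match cmd {
                "echo" => println!("{}", args.join(" ")),
                other => panic!("unknown command: {other}"),
            }
        }
    }

    // One test that loops over every .txt case in a directory ("tests/cases"
    // is a made-up path), replacing a bunch of per-case test functions.
    #[test]
    fn all_dsl_cases() {
        for entry in std::fs::read_dir("tests/cases").unwrap() {
            let path = entry.unwrap().path();
            if path.extension().is_some_and(|e| e == "txt") {
                check(&std::fs::read_to_string(&path).unwrap());
            }
        }
    }

    fn main() {
        check("# comment\necho hello driver\n"); // prints: hello driver
    }
    ```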

    1. 2

      You’re no doubt aware, but you lose a lot of developer productivity when you stop using the host language:

      • No more syntax highlighting or checking.
      • No more autocomplete (especially difficult for onboarding new users). Also, tools like GitHub Copilot might stop offering useful suggestions?
      • Attribution to specific lines in the test cases becomes difficult (not impossible).
      • Some of the tests require non-trivial logic: filtering values from the output, checking that two strings are the same, making a copy of the repo and operating on that, or providing various options (like environment variables or an expected exit code). You then have to build a bunch of ad-hoc DSL operators to support those kinds of tests. Or, if you choose to write those tests in the host language, you notice an urge to skip testing certain behaviors there and just use the DSL because it’s easier, which reduces the reliability of the code.

      At that point, it might be better to switch to e.g. Python for the test runner, since it has a lot of already-developed tooling, but I would prefer that Rust matched the compilation speed of Python. It would be nice if at least the no-op compilation time weren’t 340ms.

      1. 2

        Most importantly, you lose “run specific test from an IDE” functionality, since in the limit this approach is equivalent to implementing your own test runner, which the IDE knows nothing about. Still, for some projects (mostly compiler-shaped things with a lot of tests) such an approach makes sense.

        One thing that does not add up for me is that using a different linker doesn’t help. At least for the last benchmark, with a single crate, the time should be mostly linking. Locally (on Linux) I see the runtime improve from 1.5s to 0.5s when I switch to lld. Did you double-check that you are actually using mold? The config is super-fiddly; I personally end up using the default linker most of the time, even when I thought I had configured the right one.
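        For reference, the linker is typically selected in .cargo/config.toml; a sketch (the target triple and the clang/lld choice are assumptions to adapt, not the poster’s actual config):

        ```toml
        # .cargo/config.toml (sketch): route linking through clang and ask it
        # to use lld; for mold, `mold -run` or -fuse-ld=mold are common setups.
        [target.x86_64-unknown-linux-gnu]
        linker = "clang"
        rustflags = ["-C", "link-arg=-fuse-ld=lld"]
        ```

        It is easy for this to be silently ignored (wrong file location, wrong target triple), which is why verifying with a wrapper script or the linker’s own logs is worthwhile.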

        Some further tools for profiling here:

        > This is surprising. I would expect the overhead for a no-op build to be similar to git status, maybe 15ms:

        This is not entirely surprising: cargo does a bit more than git status. In particular, it needs to verify that Cargo.lock is up to date with Cargo.toml. Surprisingly, for me a no-op build of git-branchless is a bit longer (120ms) than a no-op build of rust-analyzer (90ms). git-branchless has more deps, so this might explain things, although CARGO_PROFILE=0 does not entirely explain the difference between the two.

        1. 1

          Thanks for the additional resources. I tried the fasterthanli.me approach of creating a wrapper script for mold. It does get invoked during a fresh build, which confirms that the linker is indeed being used. It doesn’t get invoked during the no-op build (cargo test -p git-branchless-test --no-run), which I guess makes sense, because you shouldn’t need to re-compile or re-link anything in this case.

          I tried a comment-only change again and got this output:

          $ echo "// @nocommit test" >>git-branchless-test/tests/test_test.rs && time cargo test -p git-branchless-test --no-run
          ...
              Finished test [unoptimized] target(s) in 1.27s
            Executable unittests src/lib.rs (target/debug/deps/git_branchless_test-04c88566abb66bee)
            Executable unittests src/main.rs (target/debug/deps/git_branchless_test-26f4d2be39b18778)
            Executable tests/test_test.rs (target/debug/deps/test_test-ea49e36b51e14448)
          cargo test --timings -p git-branchless-test --no-run  1.12s user 0.40s system 111% cpu 1.361 total
          

          Interesting, cargo is missing 91ms of timing from its output.

          The build timings say that 780ms is spent on compiling git-branchless-test. My assumption is that it includes linking implicitly.

          From the same invocation, the mold logs indicate that it uses 272ms of real time overall:

               User   System     Real  Name
              0.489    0.078    0.272  all
              0.309    0.056    0.155    read_input_files
              0.020    0.001    0.012    resolve_symbols
          ...
          

          309ms of user time is more than I would hope for just reading input files, especially considering that I’m using an SSD, but I don’t know how I would reduce that. (I guess loading things into a ramdisk might help?) But anyway, 155ms of real time for reading input files is probably acceptable.

          So compilation takes 780ms - 272ms = 508ms.

          A subsequent no-op build uses 250ms according to cargo, with 92ms unattributed according to time, so I’ll assume that the dependency resolution time takes ~250ms:

          $ time cargo test -p git-branchless-test --no-run
          ...
              Finished test [unoptimized] target(s) in 0.25s
            Executable unittests src/lib.rs (target/debug/deps/git_branchless_test-04c88566abb66bee)
            Executable unittests src/main.rs (target/debug/deps/git_branchless_test-26f4d2be39b18778)
            Executable tests/test_test.rs (target/debug/deps/test_test-ea49e36b51e14448)
          cargo test -p git-branchless-test --no-run  0.24s user 0.09s system 97% cpu 0.342 total
          

          For the comment-only change: 1270ms (total) - 91ms (startup/shutdown according to time) - 508ms (compilation) - 272ms (linking) - ~250ms (dependency resolution) = 149ms remaining, which might be explainable by variations in the dependency resolution timing calculation.

          So overall, it does seem like compilation itself is taking the most time for some reason, and bootstrapping and profiling the compiler seems very involved.


          For completeness, I checked cargo llvm-lines again, and the top functions are monomorphic and seem reasonable in terms of code size, considering how complex they are:

            Lines                 Copies              Function name
            -----                 ------              -------------
            111342                3991                (TOTAL)
              2909 (2.6%,  2.6%)     1 (0.0%,  0.0%)  git_branchless_test::apply_fixes
              2474 (2.2%,  4.8%)     1 (0.0%,  0.1%)  git_branchless_test::print_summary
              2454 (2.2%,  7.0%)     1 (0.0%,  0.1%)  git_branchless_test::run_tests
              2202 (2.0%,  9.0%)     1 (0.0%,  0.1%)  git_branchless_test::test_commit
              1655 (1.5%, 10.5%)     1 (0.0%,  0.1%)  git_branchless_test::ResolvedTestOptions::resolve
          

          I guess that if I wanted to optimize code generation time, I would want to make those functions easier for the compiler to handle somehow? apply_fixes is 250 lines long and probably stresses the compiler (lots of iterator chaining?).
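          If the iterator chains really are the issue, one generic trick (shown with a hypothetical stand-in, not the real apply_fixes) is to rewrite hot adapter chains as explicit loops, since each chain expands into a stack of closure/adapter instantiations for rustc and LLVM to process:

          ```rust
          // Hypothetical stand-in for a codegen-heavy function: the chain
          // below instantiates several nested adapter types (Filter, Map, …).
          fn count_fixable_chained(lines: &[&str]) -> usize {
              lines
                  .iter()
                  .filter(|l| !l.is_empty())
                  .map(|l| l.trim())
                  .filter(|l| l.starts_with("fix:"))
                  .count()
          }

          // The same logic as one explicit loop: identical behavior, but
          // fewer generic instantiations for the compiler to churn through.
          fn count_fixable_loop(lines: &[&str]) -> usize {
              let mut n = 0;
              for l in lines {
                  if l.is_empty() {
                      continue;
                  }
                  if l.trim().starts_with("fix:") {
                      n += 1;
                  }
              }
              n
          }

          fn main() {
              let lines = ["fix: a", "", "  fix: b", "note"];
              assert_eq!(count_fixable_chained(&lines), 2);
              assert_eq!(count_fixable_loop(&lines), 2);
          }
          ```

          Whether this actually moves the needle would need to be confirmed with cargo llvm-lines or the self-profile trace.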

          1. 1

            > and bootstrapping and profiling the compiler seems very involved.

            It’s not that involved; you don’t need to bootstrap.

            I’m not at my laptop, but you need something like:

            $ cargo install --git https://github.com/rust-lang/measureme --branch stable crox
            $ export RUSTFLAGS="-Z self-profile"
            $ cargo +nightly test -p git-branchless-test --no-run
            $ crox file_name.mm_profdata
            # then open the resulting JSON in Chrome’s dev tools
            
            1. 1

              Thanks, I tried it out. A no-op change doesn’t write a trace at all (as expected). The following is for a comment-only change:

              $ rm -f /tmp/mold-log.txt && rm *.mm_profdata && echo "// @nocommit test" >>git-branchless-test/tests/test_test.rs && RUSTFLAGS='-Z self-profile -C link-arg=-fuse-ld=/Users/waleed/Workspace/git-branchless/mold-wrapper.sh' cargo +nightly test -p git-branchless-test --no-run && crox *.mm_profdata
              ...
                  Finished test [unoptimized] target(s) in 1.01s
              warning: the following packages contain code that will be rejected by a future version of Rust: lalrpop v0.19.8
              note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 2`
                Executable unittests src/lib.rs (target/debug/deps/git_branchless_test-fd4aa65de9090ce8)
                Executable unittests src/main.rs (target/debug/deps/git_branchless_test-57f06a0b51e0f287)
                Executable tests/test_test.rs (target/debug/deps/test_test-cab71978e7273aa3)
              

              The result is a trace with the following top-level entries (excluding small ones):

              • incr_comp_prepare_session_directory: 8ms
              • configure_and_expand: 72ms
              • analysis: 29ms
              • codegen_crate: 147ms
              • serialize_dep_graph: 18ms
              • link: 292ms

              According to the trace, compilation seems to have taken only ~300ms, which is overall pretty reasonable, but cargo itself reports 1.01s for some reason. From this perspective, it seems as if the slowdown is primarily in cargo rather than rustc?

              1. 2

                 It might; you can try running

                $ CARGO_PROFILE=0 cargo test -p git-branchless-test --no-run
                

                 to get some info from cargo’s side.