-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Reduce cargo overhead for incremental recompilation #8833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the report! It's definitely the intention that Cargo is a speedy build system and doesn't have a lot of overhead. We've done a fair bit of performance work in the past, but unfortunately we don't have a great benchmark suite and don't track performance over time. That being said, unfortunately "make Cargo faster" probably isn't the most actionable here. It's probably best to dig into profiles and knock out slow functions one-by-one. |
What do you think about this idea? |
That sounds plausible, but I think it'd be good to dig in and see where the performance is going on a deeper level. It's not entirely obvious to me given the graphs where the time is going and whether that would help a whole lot. |
You can inspect the flamegraph live at https://www.speedscope.app/#profileURL=https://gist.githubusercontent.com/bjorn3/562a8772b08ce58d0a7746a0f7c12b72/raw/e5646219a679b8fc3082c12578814e4b583ea3dd/bevy_breakout_fresh.collapsed A non-trivial portion of the time is spent in the toml parsing code. |
I'd encourage local testing and strategies to try to measure the impact of possible optimizations, and PRs to make things faster are of course always welcome! |
bjorn3@c7f58d3 saves about 7% by avoiding unnecessarily capturing a backtrace at the start of each job. |
bjorn3@602a1bd saves another big chunk by avoiding spawning a new thread for each fresh job. |
Nice! |
Improve performance of almost fresh builds This currently saves about 15ms out of the 170ms Cargo overhead when compiling Bevy. part of #8833 <details><summary>Benchmark results as of <a href="https://github.com/rust-lang/cargo/commit/602a1bd7f5a0da9878952f492508aca5eddd1c53"><code>602a1bd</code></a></summary> Completely fresh: ``` $ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "../../cargo/cargo_master build --release --example breakout" "../../cargo/cargo_improve_perf build --release --example breakout" Benchmark #1: ../../cargo/cargo_master build --release --example breakout Time (mean ± σ): 613.8 ms ± 509.1 ms [User: 147.6 ms, System: 38.4 ms] Range (min … max): 170.8 ms … 1199.2 ms 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark #2: ../../cargo/cargo_improve_perf build --release --example breakout Time (mean ± σ): 164.2 ms ± 0.8 ms [User: 137.0 ms, System: 27.1 ms] Range (min … max): 162.8 ms … 169.6 ms 100 runs Summary '../../cargo/cargo_improve_perf build --release --example breakout' ran 3.74 ± 3.10 times faster than '../../cargo/cargo_master build --release --example breakout' ``` (statistical outliers are consistently reproducible and don't happen for any other benchmarks) ``` $ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_master build --release --example breakout Finished release [optimized] target(s) in 0.16s [...] Finished release [optimized] target(s) in 0.16s Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs): 192,95 msec task-clock:u # 0,242 CPUs utilized ( +- 0,69% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 4926 page-faults:u # 0,026 M/sec ( +- 0,11% ) 387516813 cycles:u # 2,008 GHz ( +- 0,69% ) 685141858 instructions:u # 1,77 insn per cycle ( +- 0,04% ) 124443483 branches:u # 644,958 M/sec ( +- 0,05% ) 2726944 branch-misses:u # 2,19% of all branches ( +- 0,10% ) 0,796 +- 0,168 seconds time elapsed ( +- 21,06% ) $ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_improve_perf build --release --example breakout Finished release [optimized] target(s) in 0.14s [...] Finished release [optimized] target(s) in 0.15s Performance counter stats for '../../cargo/cargo_improve_perf build --release --example breakout' (10 runs): 168,78 msec task-clock:u # 0,997 CPUs utilized ( +- 0,56% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 4075 page-faults:u # 0,024 M/sec ( +- 0,16% ) 372565970 cycles:u # 2,207 GHz ( +- 0,61% ) 667356921 instructions:u # 1,79 insn per cycle ( +- 0,03% ) 120170432 branches:u # 712,010 M/sec ( +- 0,04% ) 2642670 branch-misses:u # 2,20% of all branches ( +- 0,12% ) 0,169294 +- 0,000933 seconds time elapsed ( +- 0,55% ) ``` Need to recompile single executable: ``` $ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout" "touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf build --release --example breakout" Benchmark #1: touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout Time (mean ± σ): 658.9 ms ± 1.8 ms [User: 538.6 ms, System: 181.1 ms] Range (min … max): 655.5 ms … 668.8 ms 100 runs Benchmark #2: touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf build --release --example breakout Time (mean ± σ): 648.6 ms ± 2.1 ms [User: 534.7 ms, System: 162.6 ms] Range (min … max): 645.2 ms … 659.5 ms 100 runs Summary 'touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf build --release --example breakout' ran 1.02 ± 0.00 times faster than 'touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout' ``` ``` $ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_master build --release --example breakout Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.67s [...] Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.66s Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs): 183,65 msec task-clock:u # 0,265 CPUs utilized ( +- 1,08% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 6603 page-faults:u # 0,036 M/sec ( +- 0,11% ) 382629371 cycles:u # 2,083 GHz ( +- 0,79% ) 673192095 instructions:u # 1,76 insn per cycle ( +- 0,03% ) 121518254 branches:u # 661,688 M/sec ( +- 0,04% ) 2698503 branch-misses:u # 2,22% of all branches ( +- 0,18% ) 0,69376 +- 0,00274 seconds time elapsed ( +- 0,39% ) $ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_improve_perf build --release --example breakout Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.66s [...] Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.76s Performance counter stats for '../../cargo/cargo_improve_perf build --release --example breakout' (10 runs): 177,03 msec task-clock:u # 0,256 CPUs utilized ( +- 1,70% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 5774 page-faults:u # 0,033 M/sec ( +- 0,14% ) 381121369 cycles:u # 2,153 GHz ( +- 1,29% ) 672129390 instructions:u # 1,76 insn per cycle ( +- 0,03% ) 121248111 branches:u # 684,900 M/sec ( +- 0,04% ) 2672832 branch-misses:u # 2,20% of all branches ( +- 0,34% ) 0,6924 +- 0,0148 seconds time elapsed ( +- 2,13% ) ``` </details> <details><summary>Benchmark results as of <a href="https://github.com/rust-lang/cargo/commit/ba49b13e65fd487b8fc1761456c34e8d9feefb0e"><code>ba49b13</code></a></summary> Completely fresh: ``` $ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "../../cargo/cargo_master build --release --example breakout" "../../cargo/cargo_improve_perf2 build --release --example breakout" Benchmark #1: ../../cargo/cargo_master build --release --example breakout Time (mean ± σ): 635.4 ms ± 511.6 ms [User: 146.2 ms, System: 41.1 ms] Range (min … max): 172.0 ms … 1208.7 ms 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark #2: ../../cargo/cargo_improve_perf2 build --release --example breakout Time (mean ± σ): 165.3 ms ± 1.2 ms [User: 137.2 ms, System: 28.0 ms] Range (min … max): 163.7 ms … 171.0 ms 100 runs Summary '../../cargo/cargo_improve_perf2 build --release --example breakout' ran 3.84 ± 3.09 times faster than '../../cargo/cargo_master build --release --example breakout' ``` (statistical outliers are consistently reproducible and don't happen for any other benchmarks) ``` $ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_master build --release --example breakout Finished release [optimized] target(s) in 0.16s [...] Finished release [optimized] target(s) in 0.16s Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs): 197,21 msec task-clock:u # 0,220 CPUs utilized ( +- 0,79% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 4918 page-faults:u # 0,025 M/sec ( +- 0,09% ) 395214529 cycles:u # 2,004 GHz ( +- 0,71% ) 685707083 instructions:u # 1,74 insn per cycle ( +- 0,04% ) 124571038 branches:u # 631,667 M/sec ( +- 0,05% ) 2748386 branch-misses:u # 2,21% of all branches ( +- 0,35% ) 0,897 +- 0,155 seconds time elapsed ( +- 17,32% ) $ RUSTC=$(rustup which rustc) perf stat -r10 ../../cargo/cargo_improve_perf2 build --release --example breakout Finished release [optimized] target(s) in 0.14s [...] Finished release [optimized] target(s) in 0.15s Performance counter stats for '../../cargo/cargo_improve_perf2 build --release --example breakout' (10 runs): 168,28 msec task-clock:u # 0,621 CPUs utilized ( +- 0,51% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 4086 page-faults:u # 0,024 M/sec ( +- 0,15% ) 371029906 cycles:u # 2,205 GHz ( +- 0,48% ) 667493108 instructions:u # 1,80 insn per cycle ( +- 0,02% ) 120202436 branches:u # 714,308 M/sec ( +- 0,03% ) 2659209 branch-misses:u # 2,21% of all branches ( +- 0,13% ) 0,271 +- 0,103 seconds time elapsed ( +- 37,82% ) ``` Need to recompile single executable: ``` $ RUSTC=$(rustup which rustc) hyperfine --warmup 10 --runs 100 "touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout" "touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf2 build --release --example breakout" Benchmark #1: touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout Time (mean ± σ): 660.7 ms ± 2.9 ms [User: 545.6 ms, System: 175.2 ms] Range (min … max): 656.2 ms … 675.1 ms 100 runs Benchmark #2: touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf2 build --release --example breakout Time (mean ± σ): 650.2 ms ± 5.3 ms [User: 542.9 ms, System: 156.0 ms] Range (min … max): 645.9 ms … 687.7 ms 100 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary 'touch examples/game/breakout.rs && ../../cargo/cargo_improve_perf2 build --release --example breakout' ran 1.02 ± 0.01 times faster than 'touch examples/game/breakout.rs && ../../cargo/cargo_master build --release --example breakout' ``` ``` $ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_master build --release --example breakout Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.65s [...] Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.66s Performance counter stats for '../../cargo/cargo_master build --release --example breakout' (10 runs): 181,52 msec task-clock:u # 0,264 CPUs utilized ( +- 0,29% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 6618 page-faults:u # 0,036 M/sec ( +- 0,09% ) 381324766 cycles:u # 2,101 GHz ( +- 0,21% ) 673170130 instructions:u # 1,77 insn per cycle ( +- 0,03% ) 121511051 branches:u # 669,422 M/sec ( +- 0,04% ) 2700116 branch-misses:u # 2,22% of all branches ( +- 0,17% ) 0,68766 +- 0,00293 seconds time elapsed ( +- 0,43% ) $ RUSTC=$(rustup which rustc) perf stat -r10 --no-inherit --pre "touch examples/game/breakout.rs" ../../cargo/cargo_improve_perf2 build --release --example breakout Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.64s [...] Compiling bevy v0.3.0 (/home/bjorn/Documenten/cg_clif3/bevy) Finished release [optimized] target(s) in 0.64s Performance counter stats for '../../cargo/cargo_improve_perf2 build --release --example breakout' (10 runs): 173,78 msec task-clock:u # 0,257 CPUs utilized ( +- 0,65% ) 0 context-switches:u # 0,000 K/sec 0 cpu-migrations:u # 0,000 K/sec 5772 page-faults:u # 0,033 M/sec ( +- 0,17% ) 378994346 cycles:u # 2,181 GHz ( +- 0,62% ) 672499584 instructions:u # 1,77 insn per cycle ( +- 0,04% ) 121341331 branches:u # 698,266 M/sec ( +- 0,05% ) 2691563 branch-misses:u # 2,22% of all branches ( +- 0,17% ) 0,67554 +- 0,00641 seconds time elapsed ( +- 0,95% ) ``` </summary>
Looks like some changes discussed in this thread have been merged at this point. I'm going to close this and ask for more specific issues to be opened up in the future as a generic "reduce overhead" has no cohesion for discussion and no end goal. |
Describe the problem you are trying to solve
When only a single small crate of a large crate graph is recompiled, cargo has a significant overhead trying to parse all
Cargo.toml
files and resolve all dependencies. On Bevy I measured an overhead of 28% when doingtouch examples/game/breakout.rs; cargo build --example breakout
after bevyengine/bevy#791.Describe the solution you'd like
I would like the overhead of cargo to be reduced. This could be done by optimizing the relevant functions or by for example adding a binary cache for a specific cargo invocation that contains the build plan (I think it is called that way) of everything that cargo will do. This binary cache could be invalidated on any
Cargo.toml
change or change to the cargo arguments. In case of argument change it would be preferable to keep the old cache around though, as rust-analyzer may runcargo check
and the usercargo run
.Notes
Profiles
By the way I noticed that the new resolver is a bit slower than the old one:
cc bevyengine/bevy#791
The text was updated successfully, but these errors were encountered: