Improve CGU creation algorithms #111900

nnethercote · 2023-05-24T07:07:59Z

An attempt to improve things.

It is small and has a single call site, and this change will facilitate the next commit.

Three of the four methods in `DefaultPartitioning` are defined in `default.rs`. But `merge_codegen_units` is defined in a separate module, `merging`, even though it's less than 100 lines of code and roughly the same size as the other three methods. (Also, the `merging` module currently sits alongside `default`, when it should be a submodule of `default`, adding to the confusion.) In rust-lang#74275 this explanation was given: > I pulled this out into a separate module since it seemed like we might > want a few different merge algorithms to choose from. But in the three years since there have been no additional merging algorithms, and there is no mechanism for choosing between different merging algorithms. (There is a mechanism, `-Zcgu-partitioning-strategy`, for choosing between different partitioning strategies, but the merging algorithm is just one piece of a partitioning strategy.) This commit merges `merging` into `default`, making the code easier to navigate and read.

I find that these structs obfuscate the code. Removing them and just passing the individual fields around makes the `Partition` method signatures a little longer, but makes the data flow much clearer. E.g. - `codegen_units` is mutable all the way through. - `codegen_units`'s length is changed by `merge_codegen_units`, but only the individual elements are changed by `place_inlined_mono_items` and `internalize_symbols`. - `roots`, `internalization_candidates`, and `mono_item_placements` are all immutable after creation, and all used by just one of the four methods.

This is something that took me some time to figure out.

nnethercote · 2023-05-24T07:08:21Z

@bors try @rust-timer queue

bors · 2023-05-24T07:08:30Z

⌛ Trying commit 729a48897e17b149656fe315c7941bd3c3df2a89 with merge 13166eef596f11ec3a87d7de3557a4d310cad482...

bors · 2023-05-24T09:20:51Z

☀️ Try build successful - checks-actions
Build commit: 13166eef596f11ec3a87d7de3557a4d310cad482 (13166eef596f11ec3a87d7de3557a4d310cad482)

rust-timer · 2023-05-24T10:44:39Z

Finished benchmarking commit (13166eef596f11ec3a87d7de3557a4d310cad482): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	97.4%	[0.5%, 6569.2%]	236
Regressions ❌ (secondary)	6.1%	[0.7%, 24.5%]	250
Improvements ✅ (primary)	-1.8%	[-1.8%, -1.8%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	97.0%	[-1.8%, 6569.2%]	237

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	6.2%	[1.3%, 35.8%]	217
Regressions ❌ (secondary)	6.7%	[1.5%, 16.9%]	252
Improvements ✅ (primary)	-15.9%	[-34.4%, -2.7%]	8
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	5.4%	[-34.4%, 35.8%]	225

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	88.8%	[0.8%, 5590.4%]	234
Regressions ❌ (secondary)	7.8%	[0.9%, 37.7%]	248
Improvements ✅ (primary)	-1.0%	[-1.0%, -1.0%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	88.4%	[-1.0%, 5590.4%]	235

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	13.7%	[0.0%, 72.9%]	57
Regressions ❌ (secondary)	6.0%	[0.4%, 30.3%]	6
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.1%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	13.5%	[-0.1%, 72.9%]	58

Bootstrap: 647.507s -> 688.121s (6.27%)

nnethercote · 2023-05-25T00:24:27Z

Terrible results here. Some make sense, some don't.

The gigantic regressions are for incr-unchanged and incr-patched. As written, the CGU splitting I added is non-deterministic because it depends on hash table iteration order. This effectively breaks incremental compilation for the backend, because the CGUs formed are completely different even when the code is unchanged. Fixing this should be straightforward.
But there are also lots of large regressions (up to 35% walltime) for even check-full builds. This doesn't make sense, because check builds shouldn't be affected by CGU formation. And I can't replicate these locally.

nnethercote · 2023-05-25T03:50:58Z

I fixed the non-determinism issue, so the perf results for incremental builds should now be less catastrophic.

@bors try @rust-timer queue

bors · 2023-05-25T03:51:07Z

⌛ Trying commit 5005ebb with merge 2cf271b196764e749f738e7085ae01a46998d8ce...

bors · 2023-05-25T06:02:23Z

☀️ Try build successful - checks-actions
Build commit: 2cf271b196764e749f738e7085ae01a46998d8ce (2cf271b196764e749f738e7085ae01a46998d8ce)

rust-timer · 2023-05-25T09:51:57Z

Finished benchmarking commit (2cf271b196764e749f738e7085ae01a46998d8ce): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	24.7%	[0.2%, 908.3%]	82
Regressions ❌ (secondary)	0.8%	[0.2%, 14.3%]	58
Improvements ✅ (primary)	-1.0%	[-3.1%, -0.3%]	5
Improvements ✅ (secondary)	-1.1%	[-2.1%, -0.6%]	17
All ❌✅ (primary)	23.2%	[-3.1%, 908.3%]	87

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	4.7%	[0.7%, 20.0%]	29
Regressions ❌ (secondary)	1.8%	[1.8%, 1.8%]	1
Improvements ✅ (primary)	-14.5%	[-36.8%, -3.2%]	9
Improvements ✅ (secondary)	-2.8%	[-3.1%, -2.4%]	2
All ❌✅ (primary)	0.2%	[-36.8%, 20.0%]	38

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	68.5%	[1.1%, 862.2%]	27
Regressions ❌ (secondary)	4.4%	[1.5%, 15.6%]	7
Improvements ✅ (primary)	-2.0%	[-3.2%, -1.2%]	5
Improvements ✅ (secondary)	-8.8%	[-14.0%, -4.6%]	10
All ❌✅ (primary)	57.5%	[-3.2%, 862.2%]	32

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	10.4%	[0.0%, 47.0%]	56
Regressions ❌ (secondary)	3.7%	[0.3%, 17.8%]	6
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.1%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	10.2%	[-0.1%, 47.0%]	57

Bootstrap: 646.176s -> 659.34s (2.04%)

nnethercote · 2023-05-25T23:20:33Z

New results are better, but still terrible.

nnethercote · 2023-05-25T23:25:19Z

For posterity: the splitting did result in much more evenly sized CGUs. Here are some ratios between the smallest and largest CGU for various benchmarks. Clearly CGU size is much less important than other characteristics (inlining?) when it comes to compiler perf.

-----------------------------------------------------------------------------
opt
-----------------------------------------------------------------------------
        post-merge      final
        old  new        old  new
cargo:  4.7  1.0        4.0  1.9
crane:  2.5  1.0        2.4  1.7
dies:  29.5  2.6       46.4  8.6
ripg:   4.7  1.1        3.2  1.9
webr:   3.5  1.0        4.2  2.1

-----------------------------------------------------------------------------
debug
-----------------------------------------------------------------------------
        post-merge      final
        old    new      old    new
cargo:  480.9  1.1      480.0  1.1
crane:  760.0  9.4      760.0  9.5
diesel: 375.5  ditto    375.5  ditto
ripg:   12397  ditto    12413  ditto
webr:   1000   1.6       1000  2.0

nnethercote added 6 commits May 24, 2023 10:05

Inline and remove numbered_codegen_unit_name.

e26c0c9

It is small and has a single call site, and this change will facilitate the next commit.

Add a clarifying comment.

59c5259

This is something that took me some time to figure out.

XXX: split big CGUs

dc8ab7e

XXX: better merging, plus splitting fixes

efbe288

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 24, 2023

nnethercote marked this pull request as draft May 24, 2023 07:08

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 24, 2023

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels May 24, 2023

XXX: disable debug printing

1076b01

XXX: fix non-determinism

5005ebb

nnethercote force-pushed the cgu-algo branch from 729a488 to 5005ebb Compare May 25, 2023 03:50

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 25, 2023

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 25, 2023

nnethercote closed this May 25, 2023

nnethercote mentioned this pull request Jun 18, 2023

improve cgu schedule with greedy partition #112766

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve CGU creation algorithms #111900

Improve CGU creation algorithms #111900

nnethercote commented May 24, 2023

nnethercote commented May 24, 2023

This comment has been minimized.

bors commented May 24, 2023

bors commented May 24, 2023

This comment has been minimized.

rust-timer commented May 24, 2023

nnethercote commented May 25, 2023 •

edited

Loading

nnethercote commented May 25, 2023

This comment has been minimized.

bors commented May 25, 2023

bors commented May 25, 2023

This comment has been minimized.

rust-timer commented May 25, 2023

nnethercote commented May 25, 2023

nnethercote commented May 25, 2023

Improve CGU creation algorithms #111900

Improve CGU creation algorithms #111900

Conversation

nnethercote commented May 24, 2023

nnethercote commented May 24, 2023

This comment has been minimized.

bors commented May 24, 2023

bors commented May 24, 2023

This comment has been minimized.

rust-timer commented May 24, 2023

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

nnethercote commented May 25, 2023 • edited Loading

nnethercote commented May 25, 2023

This comment has been minimized.

bors commented May 25, 2023

bors commented May 25, 2023

This comment has been minimized.

rust-timer commented May 25, 2023

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

nnethercote commented May 25, 2023

nnethercote commented May 25, 2023

nnethercote commented May 25, 2023 •

edited

Loading