-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Improve CGU creation algorithms #111900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve CGU creation algorithms #111900
Conversation
It is small and has a single call site, and this change will facilitate the next commit.
Three of the four methods in `DefaultPartitioning` are defined in `default.rs`. But `merge_codegen_units` is defined in a separate module, `merging`, even though it's less than 100 lines of code and roughly the same size as the other three methods. (Also, the `merging` module currently sits alongside `default`, when it should be a submodule of `default`, adding to the confusion.) In rust-lang#74275 this explanation was given: > I pulled this out into a separate module since it seemed like we might > want a few different merge algorithms to choose from. But in the three years since there have been no additional merging algorithms, and there is no mechanism for choosing between different merging algorithms. (There is a mechanism, `-Zcgu-partitioning-strategy`, for choosing between different partitioning strategies, but the merging algorithm is just one piece of a partitioning strategy.) This commit merges `merging` into `default`, making the code easier to navigate and read.
I find that these structs obfuscate the code. Removing them and just passing the individual fields around makes the `Partition` method signatures a little longer, but makes the data flow much clearer. E.g. - `codegen_units` is mutable all the way through. - `codegen_units`'s length is changed by `merge_codegen_units`, but only the individual elements are changed by `place_inlined_mono_items` and `internalize_symbols`. - `roots`, `internalization_candidates`, and `mono_item_placements` are all immutable after creation, and all used by just one of the four methods.
This is something that took me some time to figure out.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 729a48897e17b149656fe315c7941bd3c3df2a89 with merge 13166eef596f11ec3a87d7de3557a4d310cad482... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (13166eef596f11ec3a87d7de3557a4d310cad482): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 647.507s -> 688.121s (6.27%) |
Terrible results here. Some make sense, some don't.
|
I fixed the non-determinism issue, so the perf results for incremental builds should now be less catastrophic. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 5005ebb with merge 2cf271b196764e749f738e7085ae01a46998d8ce... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (2cf271b196764e749f738e7085ae01a46998d8ce): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 646.176s -> 659.34s (2.04%) |
New results are better, but still terrible. |
For posterity: the splitting did result in much more evenly sized CGUs. Here are some ratios between the smallest and largest CGU for various benchmarks. Clearly CGU size is much less important than other characteristics (inlining?) when it comes to compiler perf.
|
An attempt to improve things.
r? @ghost