
Our GCC LTO flags can be improved #132257


Closed
Fidget-Spinner opened this issue Apr 8, 2025 · 0 comments

Labels
build: The build process and cross-build
performance: Performance or resource usage
type-feature: A feature request or enhancement

Comments

Fidget-Spinner (Member) commented Apr 8, 2025

Feature or enhancement

Proposal:

@thesamesam pointed out to me that our GCC LTO configuration builds serially and, IIUC, as a single translation unit. This is the slowest configuration possible. On GCC 15, the LTO build takes 10m14.972s; with my first PR, it takes 2m28.287s. That is roughly a 4x reduction in build time.
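To make the "roughly 4x" figure concrete, here is a small sketch that converts the two `time`-style wall-clock readings quoted above into seconds and divides them. The helper name `parse_duration` and the accepted format (optional minutes, then seconds, as printed by the shell's `time`) are assumptions made for this illustration only.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a `time`-style duration such as '10m14.972s' into seconds.

    Hypothetical helper for this sketch; it only accepts an optional
    minutes part followed by a seconds part.
    """
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = float(match.group(2))
    return minutes * 60 + seconds


before = parse_duration("10m14.972s")  # serial, single-partition LTO build (GCC 15)
after = parse_duration("2m28.287s")    # parallel LTO build from the first PR
speedup = before / after

# Prints: 614.972s -> 148.287s: 4.15x faster
print(f"{before:.3f}s -> {after:.3f}s: {speedup:.2f}x faster")
```

The ratio is about 4.15, which is where the "roughly 4x" wording comes from; it only reflects the single pair of timings reported in this issue.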

Benchmarks show essentially no change in runtime performance: 1.004x slower on one machine and 1.000x faster on another. Both results are within noise.

https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20250407-3.14.0a6+-8891cd2

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

Fidget-Spinner added the type-feature (A feature request or enhancement) and build (The build process and cross-build) labels Apr 8, 2025
picnixz added the performance (Performance or resource usage) label Apr 8, 2025
Fidget-Spinner added a commit that referenced this issue Apr 11, 2025
Change the default GCC LTO flags to stop passing -flto-partition=none, which allows LTO to run in parallel. This gives a multiple-fold speedup in GCC LTO build times with no noticeable loss in performance.

With newer versions of make and GCC, the jobserver is passed to GCC automatically (more precisely, GCC picks it up from the environment variables that make sets).

With older versions of make, GCC emits benign warnings about serial compilation. It is safe to ignore them.
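The core of the change is dropping a single flag from the LTO flag list. The sketch below illustrates that edit on a flag string. The "before" flag set `OLD_LTOFLAGS` and the function name `allow_parallel_lto` are hypothetical and chosen for illustration; they are not copied from CPython's `configure` script.

```python
import shlex

# Illustrative "before" flag set; an assumption for this sketch,
# not necessarily CPython's exact LTOFLAGS value.
OLD_LTOFLAGS = "-flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none"


def allow_parallel_lto(flags: str) -> str:
    """Drop -flto-partition=none so GCC is free to split the link-time
    optimisation work into partitions that can be compiled in parallel.

    Hypothetical helper: every other flag is kept unchanged and in order.
    """
    kept = [flag for flag in shlex.split(flags) if flag != "-flto-partition=none"]
    return shlex.join(kept)


# Prints: -flto -fuse-linker-plugin -ffat-lto-objects
print(allow_parallel_lto(OLD_LTOFLAGS))
```

Without the partitioning restriction, how many partitions are actually built concurrently still depends on whether GCC can obtain a jobserver from make. With an older make that does not expose one, GCC falls back to serial compilation and prints the warnings mentioned above.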
thesamesam added a commit to thesamesam/gentoo that referenced this issue May 10, 2025
The broken autoconf-archive macro which required us to pass -ffat-lto-objects
as a workaround has been fixed and backported to >= CPython 3.12:
python/cpython#89640 (comment)

Note that the Python build system still adds this anyway but hopefully
that can be dropped in future, like -flto-partition=one was in
python/cpython#132257.

Bug: https://bugs.gentoo.org/700012
Signed-off-by: Sam James <[email protected]>