Skip to content

Change branching in iter.skip() #80715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 23, 2021
Merged

Change branching in iter.skip() #80715

merged 2 commits into from
Jan 23, 2021

Conversation

JulianKnodt
Copy link
Contributor

@JulianKnodt JulianKnodt commented Jan 5, 2021

Optimize branching in Skip, which was brought up in #80416.
This assumes that if next is called, it's likely that there will be more calls to next, and the branch for skip will only be hit once thus it's unlikely to take that path. Even w/o the unlikely intrinsic, it compiles more efficiently, I believe because the path where next is called is always taken.

It should be noted there are very few places in the compiler where Skip is used, so probably won't have a noticeable perf impact.

New impl
Old impl

Some additional asm examples although they really don't have a ton of difference between them.

@rust-highfive
Copy link
Contributor

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 5, 2021
@the8472
Copy link
Member

the8472 commented Jan 5, 2021

But does it improve the motivating example, i.e. the .zip().zip().fold() in the FFT function?

@JulianKnodt
Copy link
Contributor Author

Hm I'm not particularly sure about that specific example, but I'll try to provide more cases in the godbolt to compare against.

@nagisa
Copy link
Member

nagisa commented Jan 5, 2021

I was not seeing much difference in the provided godbolt, so I made up my own, simpler, test cases https://godbolt.org/z/9P8Tej and I still am having hard time seeing whether one is better than the other, both look very similar to me.

So I think you may have better success if you find an example where the improvements are very obvious, write a benchmark or describe more precisely why the codegen in with proposed code is better.

r? @nagisa

@rust-highfive rust-highfive assigned nagisa and unassigned joshtriplett Jan 5, 2021
@JulianKnodt
Copy link
Contributor Author

JulianKnodt commented Jan 5, 2021

I'm compiling the benchmarks which use skip in core, but it takes like a good 2 hrs for it to compile on my machine so it may be a bit before I have results

@JulianKnodt
Copy link
Contributor Author

JulianKnodt commented Jan 5, 2021

New impl benchmarks:

test iter::bench_cycle_skip_take_ref_sum                        ... bench:   3,511,135 ns/iter (+/- 72,608)
test iter::bench_cycle_skip_take_sum                            ... bench:   3,002,478 ns/iter (+/- 102,090)
test iter::bench_cycle_take_skip_ref_sum                        ... bench:   2,709,985 ns/iter (+/- 60,129)
test iter::bench_cycle_take_skip_sum                            ... bench:   2,342,344 ns/iter (+/- 20,248)
test iter::bench_skip_chain_ref_sum                             ... bench:   4,170,133 ns/iter (+/- 162,633)
test iter::bench_skip_chain_sum                                 ... bench:     759,909 ns/iter (+/- 10,462)
test iter::bench_skip_cycle_skip_zip_add_ref_sum                ... bench:   6,170,983 ns/iter (+/- 47,253)
test iter::bench_skip_cycle_skip_zip_add_sum                    ... bench:   4,242,784 ns/iter (+/- 7,138)
test iter::bench_skip_ref_sum                                   ... bench:   1,668,988 ns/iter (+/- 3,920)
test iter::bench_skip_sum                                       ... bench:     379,766 ns/iter (+/- 10,567)
test iter::bench_skip_then_zip                                  ... bench:         142 ns/iter (+/- 0)
test iter::bench_zip_then_skip                                  ... bench:          98 ns/iter (+/- 0)

New impl w/o ? after nth:

test iter::bench_cycle_skip_take_ref_sum                        ... bench:   3,553,941 ns/iter (+/- 163,030)
test iter::bench_cycle_skip_take_sum                            ... bench:   3,122,332 ns/iter (+/- 39,716)
test iter::bench_cycle_take_skip_ref_sum                        ... bench:   2,738,281 ns/iter (+/- 7,677)
test iter::bench_cycle_take_skip_sum                            ... bench:   2,286,484 ns/iter (+/- 71,319)
test iter::bench_skip_chain_ref_sum                             ... bench:   4,078,560 ns/iter (+/- 42,944)
test iter::bench_skip_chain_sum                                 ... bench:     759,928 ns/iter (+/- 3,369)
test iter::bench_skip_cycle_skip_zip_add_ref_sum                ... bench:   6,090,941 ns/iter (+/- 15,114)
test iter::bench_skip_cycle_skip_zip_add_sum                    ... bench:   4,389,274 ns/iter (+/- 13,194)
test iter::bench_skip_ref_sum                                   ... bench:   1,662,695 ns/iter (+/- 6,873)
test iter::bench_skip_sum                                       ... bench:     379,615 ns/iter (+/- 3,894)
test iter::bench_skip_then_zip                                  ... bench:         143 ns/iter (+/- 0)
test iter::bench_zip_then_skip                                  ... bench:         100 ns/iter (+/- 0)

Old impl benchmark:

test iter::bench_cycle_skip_take_ref_sum                        ... bench:   3,496,163 ns/iter (+/- 271,462)
test iter::bench_cycle_skip_take_sum                            ... bench:   3,017,929 ns/iter (+/- 22,208)
test iter::bench_cycle_take_skip_ref_sum                        ... bench:   2,725,668 ns/iter (+/- 15,420)
test iter::bench_cycle_take_skip_sum                            ... bench:   2,341,441 ns/iter (+/- 20,861)
test iter::bench_skip_chain_ref_sum                             ... bench:   4,264,607 ns/iter (+/- 113,524)
test iter::bench_skip_chain_sum                                 ... bench:     759,955 ns/iter (+/- 7,522)
test iter::bench_skip_cycle_skip_zip_add_ref_sum                ... bench:   7,042,158 ns/iter (+/- 499,251)
test iter::bench_skip_cycle_skip_zip_add_sum                    ... bench:   4,947,926 ns/iter (+/- 230,305)
test iter::bench_skip_ref_sum                                   ... bench:   1,836,314 ns/iter (+/- 64,051)
test iter::bench_skip_sum                                       ... bench:     379,617 ns/iter (+/- 297)
test iter::bench_skip_then_zip                                  ... bench:         126 ns/iter (+/- 1)
test iter::bench_zip_then_skip                                  ... bench:         100 ns/iter (+/- 0)

@the8472
Copy link
Member

the8472 commented Jan 6, 2021

I'm compiling the benchmarks which use skip in core, but it takes like a good 2 hrs for it to compile on my machine so it may be a bit before I have results

To bench changes in libcore you can do a stage 0 build. x.py bench --stage 0 library/core --test-args <name of your benchmark>. Even on my laptop and with turbo boost disabled to get more consistent results that only takes a few minutes.

@the8472
Copy link
Member

the8472 commented Jan 6, 2021

These wins don't seem as dramatic as the improvements in #80416

@JulianKnodt
Copy link
Contributor Author

Ah I was compiling stage 1 but I'll do stage 0 in the future.

I think for that specifically, an eager iteration w/ nth is fine, so it removes the branching and doesn't need to does as much compiler magic. For the next fn of skip at least, there's a not a ton of stuff to change. Is there something else that could be changed, such as specialization(altho something like that would probably be in a separate PR)?

@the8472
Copy link
Member

the8472 commented Jan 6, 2021

Ah, maybe I misunderstood, I thought the goal was to fix the reported issue rather than this being a tangential optimization.

Is there something else that could be changed, such as specialization(altho something like that would probably be in a separate PR)?

Possibly a custom fold implementation that unrolls the first loop iteration, as mentioned in a comment on that issue.

@nagisa
Copy link
Member

nagisa commented Jan 22, 2021

@bors r+

@bors
Copy link
Collaborator

bors commented Jan 22, 2021

📌 Commit e5094a2 has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 22, 2021
@bors
Copy link
Collaborator

bors commented Jan 23, 2021

⌛ Testing commit e5094a2 with merge 4153fa8...

@bors
Copy link
Collaborator

bors commented Jan 23, 2021

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 4153fa8 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 23, 2021
@bors bors merged commit 4153fa8 into rust-lang:master Jan 23, 2021
@rustbot rustbot added this to the 1.51.0 milestone Jan 23, 2021
@JulianKnodt JulianKnodt deleted the skip_opt branch January 23, 2021 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants