Change branching in `iter.skip()` #80715

JulianKnodt · 2021-01-05T06:18:17Z

Optimize branching in Skip, which was brought up in #80416.
This assumes that if next is called, it's likely that there will be more calls to next, and the branch for skip will only be hit once thus it's unlikely to take that path. Even w/o the unlikely intrinsic, it compiles more efficiently, I believe because the path where next is called is always taken.

It should be noted there are very few places in the compiler where Skip is used, so probably won't have a noticeable perf impact.

New impl
Old impl

Some additional asm examples although they really don't have a ton of difference between them.

rust-highfive · 2021-01-05T06:18:20Z

r? @joshtriplett

(rust-highfive has picked a reviewer for you, use r? to override)

the8472 · 2021-01-05T09:24:27Z

But does it improve the motivating example, i.e. the .zip().zip().fold() in the FFT function?

JulianKnodt · 2021-01-05T09:34:01Z

Hm I'm not particularly sure about that specific example, but I'll try to provide more cases in the godbolt to compare against.

nagisa · 2021-01-05T22:41:27Z

I was not seeing much difference in the provided godbolt, so I made up my own, simpler, test cases https://godbolt.org/z/9P8Tej and I still am having hard time seeing whether one is better than the other, both look very similar to me.

So I think you may have better success if you find an example where the improvements are very obvious, write a benchmark or describe more precisely why the codegen in with proposed code is better.

r? @nagisa

JulianKnodt · 2021-01-05T23:02:10Z

I'm compiling the benchmarks which use skip in core, but it takes like a good 2 hrs for it to compile on my machine so it may be a bit before I have results

JulianKnodt · 2021-01-05T23:45:02Z

New impl benchmarks:

test iter::bench_cycle_skip_take_ref_sum                        ... bench:   3,511,135 ns/iter (+/- 72,608)
test iter::bench_cycle_skip_take_sum                            ... bench:   3,002,478 ns/iter (+/- 102,090)
test iter::bench_cycle_take_skip_ref_sum                        ... bench:   2,709,985 ns/iter (+/- 60,129)
test iter::bench_cycle_take_skip_sum                            ... bench:   2,342,344 ns/iter (+/- 20,248)
test iter::bench_skip_chain_ref_sum                             ... bench:   4,170,133 ns/iter (+/- 162,633)
test iter::bench_skip_chain_sum                                 ... bench:     759,909 ns/iter (+/- 10,462)
test iter::bench_skip_cycle_skip_zip_add_ref_sum                ... bench:   6,170,983 ns/iter (+/- 47,253)
test iter::bench_skip_cycle_skip_zip_add_sum                    ... bench:   4,242,784 ns/iter (+/- 7,138)
test iter::bench_skip_ref_sum                                   ... bench:   1,668,988 ns/iter (+/- 3,920)
test iter::bench_skip_sum                                       ... bench:     379,766 ns/iter (+/- 10,567)
test iter::bench_skip_then_zip                                  ... bench:         142 ns/iter (+/- 0)
test iter::bench_zip_then_skip                                  ... bench:          98 ns/iter (+/- 0)

New impl w/o ? after nth:

test iter::bench_cycle_skip_take_ref_sum                        ... bench:   3,553,941 ns/iter (+/- 163,030)
test iter::bench_cycle_skip_take_sum                            ... bench:   3,122,332 ns/iter (+/- 39,716)
test iter::bench_cycle_take_skip_ref_sum                        ... bench:   2,738,281 ns/iter (+/- 7,677)
test iter::bench_cycle_take_skip_sum                            ... bench:   2,286,484 ns/iter (+/- 71,319)
test iter::bench_skip_chain_ref_sum                             ... bench:   4,078,560 ns/iter (+/- 42,944)
test iter::bench_skip_chain_sum                                 ... bench:     759,928 ns/iter (+/- 3,369)
test iter::bench_skip_cycle_skip_zip_add_ref_sum                ... bench:   6,090,941 ns/iter (+/- 15,114)
test iter::bench_skip_cycle_skip_zip_add_sum                    ... bench:   4,389,274 ns/iter (+/- 13,194)
test iter::bench_skip_ref_sum                                   ... bench:   1,662,695 ns/iter (+/- 6,873)
test iter::bench_skip_sum                                       ... bench:     379,615 ns/iter (+/- 3,894)
test iter::bench_skip_then_zip                                  ... bench:         143 ns/iter (+/- 0)
test iter::bench_zip_then_skip                                  ... bench:         100 ns/iter (+/- 0)

Old impl benchmark:

test iter::bench_cycle_skip_take_ref_sum                        ... bench:   3,496,163 ns/iter (+/- 271,462)
test iter::bench_cycle_skip_take_sum                            ... bench:   3,017,929 ns/iter (+/- 22,208)
test iter::bench_cycle_take_skip_ref_sum                        ... bench:   2,725,668 ns/iter (+/- 15,420)
test iter::bench_cycle_take_skip_sum                            ... bench:   2,341,441 ns/iter (+/- 20,861)
test iter::bench_skip_chain_ref_sum                             ... bench:   4,264,607 ns/iter (+/- 113,524)
test iter::bench_skip_chain_sum                                 ... bench:     759,955 ns/iter (+/- 7,522)
test iter::bench_skip_cycle_skip_zip_add_ref_sum                ... bench:   7,042,158 ns/iter (+/- 499,251)
test iter::bench_skip_cycle_skip_zip_add_sum                    ... bench:   4,947,926 ns/iter (+/- 230,305)
test iter::bench_skip_ref_sum                                   ... bench:   1,836,314 ns/iter (+/- 64,051)
test iter::bench_skip_sum                                       ... bench:     379,617 ns/iter (+/- 297)
test iter::bench_skip_then_zip                                  ... bench:         126 ns/iter (+/- 1)
test iter::bench_zip_then_skip                                  ... bench:         100 ns/iter (+/- 0)

the8472 · 2021-01-06T07:52:54Z

I'm compiling the benchmarks which use skip in core, but it takes like a good 2 hrs for it to compile on my machine so it may be a bit before I have results

To bench changes in libcore you can do a stage 0 build. x.py bench --stage 0 library/core --test-args <name of your benchmark>. Even on my laptop and with turbo boost disabled to get more consistent results that only takes a few minutes.

the8472 · 2021-01-06T08:03:30Z

These wins don't seem as dramatic as the improvements in #80416

JulianKnodt · 2021-01-06T08:31:50Z

Ah I was compiling stage 1 but I'll do stage 0 in the future.

I think for that specifically, an eager iteration w/ nth is fine, so it removes the branching and doesn't need to does as much compiler magic. For the next fn of skip at least, there's a not a ton of stuff to change. Is there something else that could be changed, such as specialization(altho something like that would probably be in a separate PR)?

the8472 · 2021-01-06T08:41:16Z

Ah, maybe I misunderstood, I thought the goal was to fix the reported issue rather than this being a tangential optimization.

Is there something else that could be changed, such as specialization(altho something like that would probably be in a separate PR)?

Possibly a custom fold implementation that unrolls the first loop iteration, as mentioned in a comment on that issue.

nagisa · 2021-01-22T12:11:36Z

@bors r+

bors · 2021-01-22T12:11:38Z

📌 Commit e5094a2 has been approved by nagisa

bors · 2021-01-23T09:25:17Z

⌛ Testing commit e5094a2 with merge 4153fa8...

bors · 2021-01-23T13:12:09Z

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 4153fa8 to master...

rust-highfive assigned joshtriplett Jan 5, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 5, 2021

JulianKnodt force-pushed the skip_opt branch from 89bed48 to 8d4e9f0 Compare January 5, 2021 06:18

rust-highfive assigned nagisa and unassigned joshtriplett Jan 5, 2021

JulianKnodt added 2 commits January 8, 2021 08:05

Clean up branching in skip

fb8cc7b

Add more benchmarks

e5094a2

JulianKnodt force-pushed the skip_opt branch from 8d4e9f0 to e5094a2 Compare January 8, 2021 09:51

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 22, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 23, 2021

bors merged commit 4153fa8 into rust-lang:master Jan 23, 2021

rustbot added this to the 1.51.0 milestone Jan 23, 2021

JulianKnodt deleted the skip_opt branch January 23, 2021 19:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change branching in `iter.skip()` #80715

Change branching in `iter.skip()` #80715

JulianKnodt commented Jan 5, 2021 •

edited

Loading

rust-highfive commented Jan 5, 2021

the8472 commented Jan 5, 2021

JulianKnodt commented Jan 5, 2021

nagisa commented Jan 5, 2021

JulianKnodt commented Jan 5, 2021 •

edited

Loading

JulianKnodt commented Jan 5, 2021 •

edited

Loading

the8472 commented Jan 6, 2021 •

edited

Loading

the8472 commented Jan 6, 2021

JulianKnodt commented Jan 6, 2021

the8472 commented Jan 6, 2021

nagisa commented Jan 22, 2021

bors commented Jan 22, 2021

bors commented Jan 23, 2021

bors commented Jan 23, 2021

Change branching in iter.skip() #80715

Change branching in iter.skip() #80715

Conversation

JulianKnodt commented Jan 5, 2021 • edited Loading

rust-highfive commented Jan 5, 2021

the8472 commented Jan 5, 2021

JulianKnodt commented Jan 5, 2021

nagisa commented Jan 5, 2021

JulianKnodt commented Jan 5, 2021 • edited Loading

JulianKnodt commented Jan 5, 2021 • edited Loading

the8472 commented Jan 6, 2021 • edited Loading

the8472 commented Jan 6, 2021

JulianKnodt commented Jan 6, 2021

the8472 commented Jan 6, 2021

nagisa commented Jan 22, 2021

bors commented Jan 22, 2021

bors commented Jan 23, 2021

bors commented Jan 23, 2021

Change branching in `iter.skip()` #80715

Change branching in `iter.skip()` #80715

JulianKnodt commented Jan 5, 2021 •

edited

Loading

JulianKnodt commented Jan 5, 2021 •

edited

Loading

JulianKnodt commented Jan 5, 2021 •

edited

Loading

the8472 commented Jan 6, 2021 •

edited

Loading