Performance regression #15203
Comments
cc @dotdash, there's a suspicion that it's caused by the bool -> i1 conversion.
Okay, I have narrowed it down specifically to |
let mut sieve = Vec::with_capacity(size);
unsafe {
let p = sieve.as_mut_ptr();
std::ptr::set_memory(p, 1, size);
sieve.set_len(size);
}
This generates code with memset, and it is much, much slower than the code that stores the 1s by hand. Lots of page faults too.
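For readers on a current toolchain, the two initialization strategies compared in this comment can be sketched as follows. This is only an illustration under assumptions: the element type is taken to be `bool` (the thread does not show the declaration), `std::ptr::write_bytes` is used as the present-day counterpart of the old `std::ptr::set_memory`, and the function names `sieve_by_hand` and `sieve_bulk` are hypothetical.

```rust
/// Fill the sieve with one explicit store per element (the "by hand" variant).
/// Assumption: the sieve holds `bool`s; the original declaration is not shown.
fn sieve_by_hand(size: usize) -> Vec<bool> {
    let mut sieve = Vec::with_capacity(size);
    for _ in 0..size {
        sieve.push(true);
    }
    sieve
}

/// Fill the sieve with a single bulk byte write, which typically lowers to memset.
fn sieve_bulk(size: usize) -> Vec<bool> {
    let mut sieve: Vec<bool> = Vec::with_capacity(size);
    unsafe {
        // SAFETY: the allocation holds at least `size` elements, and the byte
        // value 1 is a valid bit pattern for `bool` (it is `true`).
        std::ptr::write_bytes(sieve.as_mut_ptr(), 1u8, size);
        sieve.set_len(size);
    }
    sieve
}
```

Both functions should produce the same vector, so any timing difference between them comes from how the stores are emitted rather than from the result.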
@mahkoh that code is almost three times faster than storing the ones for me. Original:
With your "hack":
Ha, you're right. It's the same here. I didn't realize that rust would change the benchmark depending on the results. I was looking at the output of perf only.
LLVM's LoopIdiomRecognize pass currently only considers integers that are byte width multiples, i.e. i8, i16 and so on, when looking for opportunities to replace stores by a memset. I've modified the relevant places to also accept values with a bitwidth < 8. With that, at least
Actually, the test suite crash was just my fragile "debugging" code. Removing that made it pass the test suite. So I'll test the new patched LLVM version with rustc tomorrow and submit the patch upstream if it works as expected. @retep998 Thanks for spotting this!
Just a small update since this is taking longer than expected / "announced" ;-) Looking through LLVM's code I noticed a few more optimizations that won't trigger with Also, I realized that I actually misread the comment in the clang sources that made me use The difference from what rust used to do, is that it happens on
LLVM doesn't really like types with a bit-width that isn't a multiple of 8 and disables various optimizations if it encounters such types used with loads/stores. OTOH, booleans must be represented as i1 when used as SSA values. To get the best results, we must use i1 for SSA values, and i8 when storing the value to memory. By using range asserts on loads, LLVM can eliminate the required zero-extend and truncate operations. Fixes rust-lang#15203
LLVM doesn't handle i1 values in allocas/memory very well and skips a number of optimizations when it encounters them. So we have to do the same thing that Clang does, using i1 for SSA values, but storing i8 in memory. Fixes #15203.
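The memory side of that convention is observable from Rust source, and a small sketch may make it concrete. The i1 SSA representation and the range asserts are internal to code generation and are not shown here; the helper names `bool_size_in_bytes`, `bool_to_byte`, and `byte_to_bool` are hypothetical and exist only for illustration.

```rust
/// Size of a `bool` in memory: one full byte, not one bit.
fn bool_size_in_bytes() -> usize {
    std::mem::size_of::<bool>()
}

/// Widen a `bool` to the byte that is stored in memory (false -> 0, true -> 1).
fn bool_to_byte(b: bool) -> u8 {
    b as u8
}

/// Narrow a stored byte back to a `bool`. Only 0 and 1 are valid encodings,
/// which is the value range a load of a `bool` is allowed to assume.
fn byte_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // any other byte is not a valid `bool`
    }
}
```

Because only 0 and 1 are valid, widening and then narrowing again is lossless, which is why the extend and truncate steps around loads and stores can be optimized away.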
Shuffle some proc_macro_expand query things around Removes some unnecessary extra work we are doing in proc-macro expansion, and more importantly `Arc` the result of the proc_macro_expand query, that way we can reuse the instance for the `macro_expand` query's result
When moving from the June 20 to the June 26 nightly, this code became twice as slow (comparison of ASM included):
https://gist.github.com/retep998/bf4fd704bbd456912c22
The run time went from 250 microseconds to 500 microseconds.
My build command:
rustc.exe --opt-level=3 euler.rs --test --emit=asm,link -Ctarget-cpu=amdfam10 -Cllvm-args="--x86-asm-syntax=intel"
Using Windows 8.1 x64 with an AMD Phenom II processor.