[RISC-V] Simplifying the loop generated in genZeroInitFrameUsingBlockInit and jump encoding #114003
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
RISC-V Release-CLR-VF2: 9404 / 9547 (98.50%)
4893367 is being scheduled for building and testing
You could remove the counter EDIT: also, if you're zeroing 4 doublewords at a time, the |
@tomeksowi Good catch! Will change
It's a mistake in PR description - in code it is correct: |
e849ffc is being scheduled for building and testing
39212af is being scheduled for building and testing
RISC-V Release-FX-VF2: 0 / 258 (0.00%)
RISC-V Release-CLR-VF2: 9480 / 9547 (99.30%)
RISC-V Release-CLR-QEMU: 9482 / 9547 (99.32%)
RISC-V Release-FX-QEMU: 0 / 258 (0.00%)
…r bnez/beqz mnemonics
RISC-V Release-CLR-VF2: 9524 / 9548 (99.75%)
RISC-V Release-FX-VF2: 431638 / 493981 (87.38%)
RISC-V Release-CLR-QEMU: 9524 / 9548 (99.75%)
RISC-V Release-FX-QEMU: 392211 / 466910 (84.00%)
RISC-V Release-CLR-VF2: 9478 / 9551 (99.24%)
RISC-V Release-CLR-QEMU: 9478 / 9551 (99.24%)
RISC-V Release-CLR-VF2: 9478 / 9551 (99.24%)
RISC-V Release-CLR-VF2: 9479 / 9551 (99.25%)
RISC-V Release-CLR-QEMU: 9480 / 9551 (99.26%)
RISC-V Release-CLR-VF2: 9476 / 9551 (99.21%)
RISC-V Release-CLR-QEMU: 9477 / 9551 (99.23%)
RISC-V Release-FX-QEMU: 434206 / 498796 (87.05%)
RISC-V Release-CLR-VF2: 9477 / 9552 (99.21%)
RISC-V Release-CLR-QEMU: 9478 / 9552 (99.23%)
RISC-V Release-FX-QEMU: 363721 / 434281 (83.75%)
In other PRs, it's 99.79%. Is this a regression caused by this change?
RISC-V Release-CLR-VF2: 9518 / 9552 (99.64%)
RISC-V Release-CLR-QEMU: 9521 / 9552 (99.68%)
RISC-V Release-FX-QEMU: 134013 / 205549 (65.20%)
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
The main point of this PR is to optimize the loop generated in genZeroInitFrameUsingBlockInit. Partially unrolling the loop for cases where there are 12 or more register slots on the stack reduced the number of loop iterations, resulting in an average performance gain of about 1.41% when running coreCLR tests (10 samples before and after the change) and about 0.5% on average for coreFX tests.
Loop example for 19 slots
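The effect of partial unrolling on the iteration count can be modelled in plain C++. This is only a rough sketch, not the RISC-V code the JIT emits: the function names, the vector standing in for the stack frame, and the choice to handle the leftover slots before the loop are all assumptions made for illustration. The unroll factor of 4 doublewords per iteration is taken from the review discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: each element is one 8-byte stack slot (a doubleword).
constexpr std::size_t kUnroll = 4; // doublewords zeroed per loop iteration

// Baseline: one store per iteration. Returns the number of loop iterations.
std::size_t zeroSimple(std::vector<std::uint64_t>& frame)
{
    std::size_t iterations = 0;
    for (std::size_t i = 0; i < frame.size(); ++i)
    {
        frame[i] = 0;
        ++iterations;
    }
    return iterations;
}

// Partially unrolled: leftover slots are zeroed first (this models
// straight-line stores, so they are not counted as loop iterations),
// then the loop zeroes kUnroll slots per iteration.
std::size_t zeroUnrolled(std::vector<std::uint64_t>& frame)
{
    const std::size_t remainder = frame.size() % kUnroll;
    std::size_t i = 0;
    for (; i < remainder; ++i)
    {
        frame[i] = 0;
    }

    std::size_t iterations = 0;
    for (; i < frame.size(); i += kUnroll)
    {
        frame[i]     = 0;
        frame[i + 1] = 0;
        frame[i + 2] = 0;
        frame[i + 3] = 0;
        ++iterations;
    }
    return iterations;
}
```

In this model, a 19-slot frame takes 19 iterations with the baseline loop, and 3 leading stores plus 4 iterations with the unrolled one.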
From the very beginning we had the INS_bnez and INS_beqz pseudo-instructions, but it wasn't possible to emit them using emitIns_R_I. Now it is possible (and recommended).
As for the "C" extension, new instructions will be defined with the prefix c., so for example bnez in the "C" extension will be c.bnez.
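For context, in standard RISC-V assembly bnez rs, offset is shorthand for bne rs, x0, offset, and beqz rs, offset for beq rs, x0, offset. The sketch below encodes both through the base B-type format. The helper names are hypothetical and this is not the JIT emitter's code; it only illustrates what the pseudo-instructions expand to.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kOpcodeBranch = 0x63; // BRANCH major opcode
constexpr std::uint32_t kFunct3Beq    = 0x0;  // beq
constexpr std::uint32_t kFunct3Bne    = 0x1;  // bne

// Encodes a B-type instruction. The byte offset must be even and fit in
// a signed 13-bit range; it is scattered as imm[12|10:5] and imm[4:1|11].
std::uint32_t encodeBType(std::uint32_t funct3, std::uint32_t rs1, std::uint32_t rs2, std::int32_t offset)
{
    assert((offset & 1) == 0);
    assert(offset >= -4096 && offset <= 4094);
    assert(rs1 < 32 && rs2 < 32);

    const std::uint32_t imm = static_cast<std::uint32_t>(offset);
    return (((imm >> 12) & 0x1) << 31)  // imm[12]
         | (((imm >> 5) & 0x3F) << 25)  // imm[10:5]
         | (rs2 << 20)
         | (rs1 << 15)
         | (funct3 << 12)
         | (((imm >> 1) & 0xF) << 8)    // imm[4:1]
         | (((imm >> 11) & 0x1) << 7)   // imm[11]
         | kOpcodeBranch;
}

// bnez rs, offset  ==  bne rs, x0, offset
std::uint32_t encodeBnez(std::uint32_t rs, std::int32_t offset)
{
    return encodeBType(kFunct3Bne, rs, 0, offset);
}

// beqz rs, offset  ==  beq rs, x0, offset
std::uint32_t encodeBeqz(std::uint32_t rs, std::int32_t offset)
{
    return encodeBType(kFunct3Beq, rs, 0, offset);
}
```

For example, encodeBnez(10, 8) (that is, bnez a0, 8) yields 0x00051463.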
As for the changes to the jump encoding, I think the code is now more readable, but removing one shift and a few constants (which the compiler will optimize) does not improve performance in any significant way. So I'm waiting for your feedback.
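As background on why jump encoding involves several shifts and masks, here is a minimal sketch of the standard J-type (jal) immediate layout from the RISC-V base ISA. The function name is hypothetical and the code is not taken from this PR; it is only meant to show the bit shuffling involved.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kOpcodeJal = 0x6F; // JAL major opcode

// Encodes jal rd, offset. The byte offset must be even and fit in a
// signed 21-bit range; it is scattered as imm[20|10:1|11|19:12].
std::uint32_t encodeJal(std::uint32_t rd, std::int32_t offset)
{
    assert((offset & 1) == 0);
    assert(offset >= -1048576 && offset <= 1048574);
    assert(rd < 32);

    const std::uint32_t imm = static_cast<std::uint32_t>(offset);
    return (((imm >> 20) & 0x1) << 31)   // imm[20]
         | (((imm >> 1) & 0x3FF) << 21)  // imm[10:1]
         | (((imm >> 11) & 0x1) << 20)   // imm[11]
         | (((imm >> 12) & 0xFF) << 12)  // imm[19:12]
         | (rd << 7)
         | kOpcodeJal;
}
```

For example, encodeJal(1, 8) (jal ra, 8) yields 0x008000EF, and encodeJal(0, -4) (j -4) yields 0xFFDFF06F.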
part of #84834, cc @dotnet/samsung