JIT: increase inline budget #114191

Merged
merged 1 commit into from
Apr 3, 2025

AndyAyersMS
Member

The current JIT inline strategy is prone to running out of budget at inopportune times, deeply inlining at some top-level sites and not inlining at all at others. This doesn't happen all that often, but when it does it has a very adverse impact on performance.

While we await a better strategy, we can at least reduce how often this happens by increasing the budget.

Partially addresses regressions seen in #113913
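
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `CallSite` type, the cost units, and the depth-first visiting order are all hypothetical and are not taken from the JIT's actual inliner. It shows how a first-come-first-served budget can be spent entirely beneath one top-level site, leaving nothing for the next.

```python
from dataclasses import dataclass, field


@dataclass
class CallSite:
    name: str
    cost: int  # hypothetical cost units, not the JIT's real estimate
    callees: list = field(default_factory=list)


def inline_depth_first(sites, budget):
    """Toy greedy inliner: visit sites in order and recurse into callees.

    Returns (names of inlined sites, budget spent).
    """
    inlined = []
    spent = 0

    def visit(site):
        nonlocal spent
        if spent + site.cost > budget:
            return  # over budget: reject this site and everything beneath it
        spent += site.cost
        inlined.append(site.name)
        for callee in site.callees:
            visit(callee)

    for site in sites:
        visit(site)
    return inlined, spent


# Two top-level sites; A has a deeper callee chain than B.
a = CallSite("A", 4, [CallSite("A1", 3, [CallSite("A2", 3)])])
b = CallSite("B", 4, [CallSite("B1", 3)])

print(inline_depth_first([a, b], 10))  # (['A', 'A1', 'A2'], 10)
print(inline_depth_first([a, b], 20))  # (['A', 'A1', 'A2', 'B', 'B1'], 17)
```

In this toy setup, doubling the budget lets site B inline as well, but the underlying ordering problem remains: a large enough tree under A would still starve B.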

@Copilot Copilot AI review requested due to automatic review settings April 3, 2025 01:06
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 3, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR increases the JIT inline budget to alleviate issues where the inlining strategy runs out of budget during critical compilation moments, partially addressing performance regressions noted in #113913.

  • Increased DEFAULT_INLINE_BUDGET from 10 to 20 in src/coreclr/jit/compiler.h
Comments suppressed due to low confidence (1)

src/coreclr/jit/compiler.h:11425

  • Ensure that existing performance tests address the impact of increasing the inline budget, verifying that compile times remain within acceptable ranges while benefiting inlining.
#define DEFAULT_INLINE_BUDGET 20 // Maximum estimated compile time increase via inlining

@AndyAyersMS
Member Author

AndyAyersMS commented Apr 3, 2025

@EgorBo PTAL
cc @dotnet/jit-contrib

SPMI will underestimate impact. Diffs

Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding
Member

tannergooding commented Apr 3, 2025

> SPMI will underestimate impact. Diffs

Do we have a good way to see the impact including the VM overhead today? Will we be able to look at the crossgen or JIT startup benchmarks to help get an idea of the wins/losses?


A couple notes from looking through the diffs on things that popped out. Nothing that really impacts this PR, but rather SPMI diffs more generally that I've seen in a few PRs now and that might be nice to see if there is a way to improve anything here.

1. Perf_Vector128Int:SquareRootBenchmark

This is functionally just removing/adding the following code, but the actual diff we see is -32, +28 instead due to the register assignments basically shifting over by 1.

-            movz    x1, #0xD1FFAB1E      // code for System.Runtime.Intrinsics.Scalar`1[int]:Sqrt(int):int
-            movk    x1, #0xD1FFAB1E LSL #16
-            movk    x1, #0xD1FFAB1E LSL #32
-            ldr     x1, [x1]
-            blr     x1
+            scvtf   d16, w0
+            fsqrt   d16, d16
+            fcvtzs  w0, d16

Also as a note, the scvtf, fsqrt, fcvtzs sequence "could've" just used w0 for the whole thing rather than also bringing in d16, so this seems like we're unnecessarily using registers in some cases and could minimize the churn. But equally, I wonder if we could improve the diffable disasm to remove the nuance of what registers are actually used in some cases. So that if the only thing changing is LSRA using xmm0 instead of xmm1, we don't get unnecessary diffs.

2. Utf8Json.Formatters.MicroBenchmarks_Serializers_ActiveOrUpcomingEventFormatter2:.ctor():this

This never even gets to the disasm because they're just showing hundreds of diffs to the Final local variable assignments instead.

Many such assignments are changes to weights or use counts and aren't "that meaningful":

-;  V00 this         [V00,T07] (  5,  3.50)     ref  ->  x21         this class-hnd single-def <MicroBenchmarks.Serializers.SystemTextJsonSourceGeneratedContext>
-;  V01 arg1         [V01,T00] ( 82, 42   )     ref  ->  x19         class-hnd single-def <System.Text.Json.Utf8JsonWriter>
-;  V02 arg2         [V02,T05] (  6,  4.50)     ref  ->  x20         class-hnd single-def <MicroBenchmarks.Serializers.MyEventsListerViewModel>
+;  V00 this         [V00,T13] (  5,  3.50)     ref  ->  x21         this class-hnd single-def <MicroBenchmarks.Serializers.SystemTextJsonSourceGeneratedContext>
+;  V01 arg1         [V01,T00] (152, 77   )     ref  ->  x19         class-hnd single-def <System.Text.Json.Utf8JsonWriter>
+;  V02 arg2         [V02,T10] (  6,  4.50)     ref  ->  x20         class-hnd single-def <MicroBenchmarks.Serializers.MyEventsListerViewModel>

Others also include changes to the register used where things get shifted down by one or similar:

-;  V20 tmp17        [V20,T04] ( 14,  7   )   byref  ->  x27         "Inline stloc first use temp"
-;  V21 tmp18        [V21,T02] ( 17,  8.50)     int  ->  x28         "Inline stloc first use temp"
-;  V22 tmp19        [V22,T13] (  4,  4   )   byref  ->  x26         single-def "Inlining Arg"
-;  V23 tmp20        [V23,T10] (  9,  4.50)     ref  ->  [fp+0x10]   class-hnd spill-single-def "Inline stloc first use temp" <System.Object>
+;  V20 tmp17        [V20,T08] ( 14,  7   )   byref  ->  x26         "Inline stloc first use temp"
+;  V21 tmp18        [V21,T04] ( 17,  8.50)     int  ->  x27         "Inline stloc first use temp"
+;  V22 tmp19        [V22,T26] (  4,  4   )   byref  ->  x25         single-def "Inlining Arg"
+;  V23 tmp20        [V23,T18] (  9,  4.50)     ref  ->  x28         class-hnd single-def "Inline stloc first use temp" <System.Object>

This means we often can't see more meaningful parts of the diff and need to download and hand inspect instead.
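
The register-renaming idea raised under the first example could be sketched roughly as follows. This is a hypothetical post-processing step, not something the existing diff tooling is known to do: the regular expression, the placeholder format, and the `canonicalize_registers` name are all assumptions made for illustration. Each arm64-style register token is replaced by a placeholder numbered by order of first appearance, so two sequences that differ only by a consistent register shift compare equal.

```python
import re

# Assumption: arm64-style register tokens such as x1, w0, d16.
# Hex immediates like #0xD1FFAB1E are not matched because of the \b anchors.
REG = re.compile(r"\b([xwdsv])(\d+)\b")


def canonicalize_registers(lines):
    """Rename registers to placeholders in order of first appearance."""
    mapping = {}

    def rename(match):
        reg = match.group(0)
        if reg not in mapping:
            # Keep the register class letter, replace the number.
            mapping[reg] = f"{match.group(1)}#{len(mapping)}"
        return mapping[reg]

    return [REG.sub(rename, line) for line in lines]


base = ["ldr     x1, [x1]", "blr     x1"]
diff = ["ldr     x2, [x2]", "blr     x2"]

# Both sides canonicalize to the same text, so no diff would be reported.
print(canonicalize_registers(base) == canonicalize_registers(diff))  # True
```

A scheme like this would also hide differences that matter, such as a value moving from a register to a stack slot, so it would likely need to be an opt-in view rather than the default.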

Member

@tannergooding tannergooding left a comment

LGTM. We'll want to keep a close eye on next week's perf triage results and adjust accordingly!

@AndyAyersMS
Member Author

> Do we have a good way to see the impact including the VM overhead today? Will we be able to look at the crossgen or JIT startup benchmarks to help get an idea of the wins/losses?

Those historically haven't been that useful. But we can certainly check crossgen for impact.

The TP increases in the diffs are modest, but there are enough missed contexts to make interpretation difficult. These misses cut two ways: they cause cycle losses on the diff side because methods fail partway, and those methods fail because they're likely trying to do inlines that SPMI can't support, so if they had succeeded, they would likely have taken longer. So it's not easy to extrapolate from the info there. Also, the JIT can spend time trying inlines that eventually fail; those don't show up as code diffs but will show up as TP diffs.

I can create a bespoke SPMI collection for ASP.NET with these changes. Likely that gives something that will replay cleanly with both old and new JITs, and from that we can at least get a better measure of TP impact within the JIT.

> This means we often can't see more meaningful parts of the diff and need to download and hand inspect instead.

I don't think we know how to produce output that is both limited and maximally informative. I think you just get the first N lines and sometimes that's just symbol diffs.

@AndyAyersMS
Member Author

With a bespoke ASP.NET SPMI collection:

[13:05:32] Total bytes of base: 44910811 (overridden on cmd)
[13:05:32] Total bytes of diff: 45096187 (overridden on cmd)
[13:05:32] Total bytes of delta: 185376 (0.41 % of base)

[13:05:32] --------------------------------------------------------------------------------
[13:05:32] 263 contexts with diffs (23 size improvements, 240 size regressions, 0 same size)
[13:05:32]                         (92 PerfScore improvements, 170 PerfScore regressions, 1 same PerfScore)
[13:05:32]   -2,068/+187,444 bytes
[13:05:32]   -14.19%/+17.93% PerfScore

[13:05:32] Warning: SuperPMI encountered missing data during the diff. The diff summary printed above may be misleading.
[13:05:32] Missing with base JIT: 9. Missing with diff JIT: 0. Total contexts: 131436.

[13:12:43] Asm diffs found
[13:12:43] Total instructions executed by base: 143114929310
[13:12:43] Total instructions executed by diff: 144529112229
[13:12:43] Total instructions executed delta: 1414182919 (0.99% of base)

So 1% TP increase (JIT only), 0.4% code size increase.
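
As a quick cross-check, the two summary percentages can be recomputed from the totals in the log above. The helper name `pct_delta` is just for illustration; the four input numbers are copied from the log.

```python
# Totals copied from the SPMI log output above.
base_bytes, diff_bytes = 44_910_811, 45_096_187
base_insns, diff_insns = 143_114_929_310, 144_529_112_229


def pct_delta(base, diff):
    """Relative change of diff versus base, in percent."""
    return 100.0 * (diff - base) / base


print(f"code size: {pct_delta(base_bytes, diff_bytes):.2f}%")     # code size: 0.41%
print(f"instructions: {pct_delta(base_insns, diff_insns):.2f}%")  # instructions: 0.99%
```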

@AndyAyersMS AndyAyersMS merged commit 720f517 into dotnet:main Apr 3, 2025
111 of 113 checks passed
@AndyAyersMS
Member Author

Ok, let's merge and keep an eye out for anything unexpected.

Will kick off an SPMI collection once this gets mirrored.

@github-actions github-actions bot locked and limited conversation to collaborators May 4, 2025