JIT: increase inline budget #114191

AndyAyersMS · 2025-04-03T01:06:54Z

The current JIT inline strategy is prone to running out of budget at inopportune times, deeply inlining at some top-level sites and not inlining at all at others. This doesn't happen all that often, but when it does, it has a very adverse impact on performance.

While we await a better strategy, we can at least reduce how often this happens by increasing the budget.

Partially addresses regressions seen in #113913
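To make the failure mode above concrete, here is a toy C++ model of a time-budgeted inliner. It is a sketch under stated assumptions, not the JIT's implementation. It assumes the budget is a multiple of the root method's estimated compile time (this PR raises `DEFAULT_INLINE_BUDGET` from 10 to 20), that candidates are explored depth-first, and that costs follow made-up linear estimates. All names (`Method`, `InlineRoot`, `ExampleTopLevelInlines`, and so on) are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a time-budgeted inliner, for illustration
// only. The names and the linear time-estimate coefficients are assumptions,
// not the JIT's actual code or values.
struct Method {
    int ilSize;
    std::vector<const Method*> callees;  // call sites, in IL order
};

long EstimateRootTime(int ilSize) { return 60 + 3L * ilSize; }
long EstimateInlineTime(int ilSize) { return 2L * ilSize; }

struct Budget {
    long limit;    // budgetMultiple * estimated root compile time
    long current;  // estimate so far, including accepted inlines
};

// Depth-first: an accepted inline's own callees are considered before the next
// sibling call site, so one deep subtree can use up the whole budget.
void InlineCallees(const Method& m, Budget& b, std::vector<bool>* topLevel) {
    for (const Method* callee : m.callees) {
        long delta = EstimateInlineTime(callee->ilSize);
        bool accept = b.current + delta <= b.limit;
        if (topLevel != nullptr) {
            topLevel->push_back(accept);  // record decisions at the root only
        }
        if (!accept) {
            continue;  // over budget: this site stays a call
        }
        b.current += delta;
        InlineCallees(*callee, b, nullptr);
    }
}

// Returns, for each call site in the root method, whether it was inlined.
std::vector<bool> InlineRoot(const Method& root, long budgetMultiple) {
    long rootTime = EstimateRootTime(root.ilSize);
    Budget b{budgetMultiple * rootTime, rootTime};
    std::vector<bool> topLevel;
    InlineCallees(root, b, &topLevel);
    return topLevel;
}

// Example: the root's first call site leads into a chain of six 100-byte
// methods; its second call site is a single 50-byte method.
std::vector<bool> ExampleTopLevelInlines(long budgetMultiple) {
    std::vector<Method> chain(6, Method{100, {}});
    for (std::size_t i = 0; i + 1 < chain.size(); ++i) {
        chain[i].callees.push_back(&chain[i + 1]);
    }
    Method small{50, {}};
    Method root{20, {&chain[0], &small}};
    return InlineRoot(root, budgetMultiple);
}
```

With these made-up numbers, `ExampleTopLevelInlines(10)` inlines five levels of the chain under the first call site and then has nothing left for the second (`{true, false}`), while `ExampleTopLevelInlines(20)` inlines both (`{true, true}`). A larger budget only moves the cliff; the depth-first order that causes it is unchanged, which fits the PR's framing of this as a stopgap until a better strategy exists.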


Copilot

Pull Request Overview

This PR increases the JIT inline budget to alleviate issues where the inlining strategy runs out of budget during critical compilation moments, partially addressing performance regressions noted in #113913.

Increased DEFAULT_INLINE_BUDGET from 10 to 20 in src/coreclr/jit/compiler.h

Comments suppressed due to low confidence (1)

src/coreclr/jit/compiler.h:11425

Ensure that existing performance tests capture the impact of increasing the inline budget, verifying that compile times remain within acceptable ranges while still benefiting from the additional inlining.

```cpp
#define DEFAULT_INLINE_BUDGET 20 // Maximum estimated compile time increase via inlining
```

AndyAyersMS · 2025-04-03T01:07:22Z

@EgorBo PTAL
cc @dotnet/jit-contrib

SPMI will underestimate impact. Diffs

dotnet-policy-service · 2025-04-03T01:07:37Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

tannergooding · 2025-04-03T16:35:01Z

> SPMI will underestimate impact. Diffs

Do we have a good way to see the impact including the VM overhead today? Will we be able to look at the crossgen or JIT startup benchmarks to help get an idea of the wins/losses?

A couple of notes on things that popped out while looking through the diffs. Nothing here really impacts this PR; these are about SPMI diffs more generally, which I've seen in a few PRs now, and it might be nice to see if there is a way to improve anything here.

1. `Perf_Vector128Int:SquareRootBenchmark`

This is functionally just removing/adding the following code, but the actual diff we see is -32/+28 instead, because the register assignments basically shift over by one.

```diff
-            movz    x1, #0xD1FFAB1E      // code for System.Runtime.Intrinsics.Scalar`1[int]:Sqrt(int):int
-            movk    x1, #0xD1FFAB1E LSL #16
-            movk    x1, #0xD1FFAB1E LSL #32
-            ldr     x1, [x1]
-            blr     x1
+            scvtf   d16, w0
+            fsqrt   d16, d16
+            fcvtzs  w0, d16
```

Also, as a note: the scvtf, fsqrt, fcvtzs sequence "could've" just used w0 for the whole thing rather than also bringing in d16, so it seems we're unnecessarily using registers in some cases and could minimize the churn. Equally, I wonder if we could improve the diffable disasm to remove the nuance of which registers are actually used in some cases, so that if the only thing changing is LSRA using xmm0 instead of xmm1, we don't get unnecessary diffs.

2. `Utf8Json.Formatters.MicroBenchmarks_Serializers_ActiveOrUpcomingEventFormatter2:.ctor():this`

This one never even gets to the disasm, because the diff shows hundreds of changes to the "Final local variable assignments" section instead.

Many of these are changes to weights or use counts that aren't "that meaningful":

```diff
-;  V00 this         [V00,T07] (  5,  3.50)     ref  ->  x21         this class-hnd single-def <MicroBenchmarks.Serializers.SystemTextJsonSourceGeneratedContext>
-;  V01 arg1         [V01,T00] ( 82, 42   )     ref  ->  x19         class-hnd single-def <System.Text.Json.Utf8JsonWriter>
-;  V02 arg2         [V02,T05] (  6,  4.50)     ref  ->  x20         class-hnd single-def <MicroBenchmarks.Serializers.MyEventsListerViewModel>
+;  V00 this         [V00,T13] (  5,  3.50)     ref  ->  x21         this class-hnd single-def <MicroBenchmarks.Serializers.SystemTextJsonSourceGeneratedContext>
+;  V01 arg1         [V01,T00] (152, 77   )     ref  ->  x19         class-hnd single-def <System.Text.Json.Utf8JsonWriter>
+;  V02 arg2         [V02,T10] (  6,  4.50)     ref  ->  x20         class-hnd single-def <MicroBenchmarks.Serializers.MyEventsListerViewModel>
```

Others also include changes to the assigned register, where things get shifted down by one or similar:

```diff
-;  V20 tmp17        [V20,T04] ( 14,  7   )   byref  ->  x27         "Inline stloc first use temp"
-;  V21 tmp18        [V21,T02] ( 17,  8.50)     int  ->  x28         "Inline stloc first use temp"
-;  V22 tmp19        [V22,T13] (  4,  4   )   byref  ->  x26         single-def "Inlining Arg"
-;  V23 tmp20        [V23,T10] (  9,  4.50)     ref  ->  [fp+0x10]   class-hnd spill-single-def "Inline stloc first use temp" <System.Object>
+;  V20 tmp17        [V20,T08] ( 14,  7   )   byref  ->  x26         "Inline stloc first use temp"
+;  V21 tmp18        [V21,T04] ( 17,  8.50)     int  ->  x27         "Inline stloc first use temp"
+;  V22 tmp19        [V22,T26] (  4,  4   )   byref  ->  x25         single-def "Inlining Arg"
+;  V23 tmp20        [V23,T18] (  9,  4.50)     ref  ->  x28         class-hnd single-def "Inline stloc first use temp" <System.Object>
```

This means we often can't see the more meaningful parts of the diff and need to download it and inspect it by hand instead.
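The two noise sources described above could, in principle, be filtered out before diffing. Below is a hypothetical C++ sketch of two such pre-diff passes. Neither function exists in the SPMI or jit-diff tooling, and both are text heuristics over a single method's dump: `NormalizeLocalVarTable` drops the `,Tnn` tracked index and the `(ref count, weight)` tuple from local variable table lines, and `CanonicalizeRegisters` renames ARM64 registers in order of first appearance.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical pre-diff filters; neither is part of the existing SPMI or
// jit-diff tooling. Both are heuristics applied to one method's text dump.

// 1. Drop the ",Tnn" tracked index and the "(ref count, weight)" tuple from
//    local variable table lines, so count/weight-only changes compare equal.
//    Register/stack homes are kept. Other lines pass through unchanged.
std::string NormalizeLocalVarTable(const std::string& dump) {
    static const std::regex lvaLine(R"(^;\s+V\d+\s)");
    static const std::regex trackedIndex(R"(\[(V\d+),T\d+\])");
    static const std::regex refCountWeight(R"(\(\s*\d+,\s*[\d.]+\s*\))");

    std::istringstream in(dump);
    std::string line, out;
    while (std::getline(in, line)) {
        if (std::regex_search(line, lvaLine)) {
            line = std::regex_replace(line, trackedIndex, "[$1]");
            line = std::regex_replace(line, refCountWeight, "(...)");
        }
        out += line;
        out += '\n';
    }
    return out;
}

// 2. Rename ARM64 registers in order of first appearance, so a listing whose
//    allocation merely shifted (e.g. d16 -> d17, w0 -> w1) compares equal.
//    x/w are general-purpose; b/h/s/d/q/v are SIMD&FP. Different widths of
//    the same register share one canonical number (x1 and w1 map together).
std::string CanonicalizeRegisters(const std::string& dump) {
    static const std::regex reg(R"(\b([xwbhsdqv])([0-9]{1,2})\b)");
    std::map<std::pair<bool, int>, int> canonical;  // (isGpr, number) -> index
    int nextGpr = 0, nextFp = 0;

    std::string out;
    auto last = dump.cbegin();
    for (std::sregex_iterator it(dump.begin(), dump.end(), reg), end; it != end; ++it) {
        const std::smatch& m = *it;
        char prefix = m.str(1)[0];
        bool isGpr = prefix == 'x' || prefix == 'w';
        auto key = std::make_pair(isGpr, std::stoi(m.str(2)));
        auto found = canonical.find(key);
        if (found == canonical.end()) {
            found = canonical.emplace(key, isGpr ? nextGpr++ : nextFp++).first;
        }
        out.append(last, m[0].first);  // text between register tokens
        out += prefix;
        out += '%';
        out += std::to_string(found->second);
        last = m[0].second;
    }
    out.append(last, dump.cend());
    return out;
}
```

The trade-off is that register canonicalization also hides genuine allocation changes, so it would fit better as an optional second view than as a replacement for the raw diff. `NormalizeLocalVarTable` on its own keeps the register/stack homes, so a change like `x27` to `x26` still shows up.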

tannergooding

LGTM. We'll want to keep a close eye on next week's perf triage results and adjust accordingly!

AndyAyersMS · 2025-04-03T17:27:56Z

> Do we have a good way to see the impact including the VM overhead today? Will we be able to look at the crossgen or JIT startup benchmarks to help get an idea of the wins/losses?

Those historically haven't been that useful. But we can certainly check crossgen for impact.

The TP (throughput) increases in the diffs are modest, but there are enough missed contexts to make interpretation difficult. These misses cut two ways: they understate cycles on the diff side, because those methods fail partway through compilation; and those methods likely fail because they're attempting inlines that SPMI can't support, so if they had succeeded, they would likely have taken longer. So it's not easy to extrapolate from the info there. Also, the JIT can spend time attempting inlines that eventually fail; these don't show up as code diffs but do show up as TP diffs.

I can create a bespoke SPMI collection for ASP.NET with these changes. That will likely give something that replays cleanly with both the old and new JITs, and from that we can at least get a better measure of TP impact within the JIT.

> This means we often can't see more meaningful parts of the diff and need to download and hand inspect instead.

I don't think we know how to produce output that is both limited and maximally informative. I think you just get the first N lines and sometimes that's just symbol diffs.

AndyAyersMS · 2025-04-03T20:40:48Z

With a bespoke ASP.NET SPMI collection:

```
[13:05:32] Total bytes of base: 44910811 (overridden on cmd)
[13:05:32] Total bytes of diff: 45096187 (overridden on cmd)
[13:05:32] Total bytes of delta: 185376 (0.41 % of base)

[13:05:32] --------------------------------------------------------------------------------
[13:05:32] 263 contexts with diffs (23 size improvements, 240 size regressions, 0 same size)
[13:05:32]                         (92 PerfScore improvements, 170 PerfScore regressions, 1 same PerfScore)
[13:05:32]   -2,068/+187,444 bytes
[13:05:32]   -14.19%/+17.93% PerfScore

[13:05:32] Warning: SuperPMI encountered missing data during the diff. The diff summary printed above may be misleading.
[13:05:32] Missing with base JIT: 9. Missing with diff JIT: 0. Total contexts: 131436.

[13:12:43] Asm diffs found
[13:12:43] Total instructions executed by base: 143114929310
[13:12:43] Total instructions executed by diff: 144529112229
[13:12:43] Total instructions executed delta: 1414182919 (0.99% of base)
```

So 1% TP increase (JIT only), 0.4% code size increase.
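As a quick check of the arithmetic behind that summary, the percentages can be recomputed from the raw totals in the log. The constant and function names below are just for illustration.

```cpp
#include <cstdint>

// Percent change relative to the base JIT's total.
constexpr double PercentOfBase(std::int64_t base, std::int64_t diff) {
    return 100.0 * static_cast<double>(diff - base) / static_cast<double>(base);
}

// Code size: total bytes from the asm diff summary.
constexpr std::int64_t kBaseBytes = 44910811;
constexpr std::int64_t kDiffBytes = 45096187;
// Throughput: instructions executed by the JIT while replaying the collection.
constexpr std::int64_t kBaseInstrs = 143114929310;
constexpr std::int64_t kDiffInstrs = 144529112229;

static_assert(kDiffBytes - kBaseBytes == 185376, "matches 'Total bytes of delta'");
static_assert(187444 - 2068 == 185376, "matches the -2,068/+187,444 byte split");
static_assert(kDiffInstrs - kBaseInstrs == 1414182919,
              "matches 'Total instructions executed delta'");
```

`PercentOfBase(kBaseBytes, kDiffBytes)` comes out to about 0.413 and `PercentOfBase(kBaseInstrs, kDiffInstrs)` to about 0.988, consistent with the 0.41% and 0.99% in the log.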

AndyAyersMS · 2025-04-03T22:16:25Z

Ok, let's merge and keep an eye out for anything unexpected.

Will kick off an SPMI collection once this gets mirrored.

Copilot AI review requested due to automatic review settings April 3, 2025 01:06

ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 3, 2025

Copilot AI reviewed Apr 3, 2025


dotnet-policy-service bot assigned AndyAyersMS Apr 3, 2025

This was referenced Apr 3, 2025

System.Net.Quic tests timeout #107761 (Open)

System.Net.Requests test timeout #113883 (Closed)

System.Net.Security.Tests timeout #114152 (Closed)

System.Net.WebSockets.Client.Tests timeout #114153 (Closed)

System.Net.Security.Unit.Tests timeout #114176 (Closed)

tannergooding approved these changes Apr 3, 2025


EgorBo approved these changes Apr 3, 2025


AndyAyersMS merged commit 720f517 into dotnet:main Apr 3, 2025
111 of 113 checks passed

AndyAyersMS mentioned this pull request Apr 12, 2025

Microbenchmarks where inliner runs out of budget with TieredPGO #85531 (Open)

BruceForstall mentioned this pull request Apr 15, 2025

superpmi collection pipeline timeouts in libraries tests #114715 (Closed)

AndyAyersMS mentioned this pull request Apr 24, 2025

JIT: De-abstraction in .NET 10 #108913 (Open)

AndyAyersMS mentioned this pull request Apr 30, 2025

[Perf] Windows/x64: 4 Regressions on 10/27/2024 3:01:56 PM #109347 (Closed)

github-actions bot locked and limited conversation to collaborators May 4, 2025
