Commit 3e0d138

[LoopUnroll] Clamp PartialThreshold for large LoopMicroOpBufferSize
The znver3/znver4 scheduler models are outliers, specifying a very large LoopMicroOpBufferSize of 512, while typical values for other subtargets are on the order of ~50. Even if this information is micro-architecturally correct (*), this does not mean that we want to runtime unroll all loops to a size that completely fills the loop buffer. Unless this is the single hot loop in the entire application, the massive code size increase will bust the micro-op and instruction caches.

Protect against this by clamping to the default PartialThreshold of 150, which is the same as the default full-unroll threshold and half the aggressive full-unroll threshold. Allowing more partial unrolling than full unrolling is certainly nonsensical.

(*) I strongly doubt that this is actually correct: I believe this may derive from an incorrect reading of Agner Fog's micro-architecture guide. The number 4096 that was originally used here is the size of the general micro-op cache, not that of a loop buffer. A separate loop buffer is not listed for the Zen microarchitecture. Comparing this to the listing for Skylake, it has a 1536 micro-op buffer, but only a 64 micro-op loopback buffer, with a note that it's rarely fully utilized. Our scheduling model specifies LoopMicroOpBufferSize of 50 in that case.
1 parent 4d5525e commit 3e0d138

2 files changed: +30 -742 lines changed

llvm/include/llvm/CodeGen/BasicTTIImpl.h

Lines changed: 7 additions & 1 deletion
@@ -575,7 +575,13 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     if (PartialUnrollingThreshold.getNumOccurrences() > 0)
       MaxOps = PartialUnrollingThreshold;
     else if (ST->getSchedModel().LoopMicroOpBufferSize > 0)
-      MaxOps = ST->getSchedModel().LoopMicroOpBufferSize;
+      // Upper bound by the default PartialThreshold, which is the same as
+      // the default full-unroll Threshold. Even if the loop micro-op buffer
+      // is very large, this does not mean that we want to unroll all loops
+      // to that length, as it would increase code size beyond the limits of
+      // what unrolling normally allows.
+      MaxOps = std::min(ST->getSchedModel().LoopMicroOpBufferSize,
+                        UP.PartialThreshold);
     else
       return;
