
runtime: make GOMAXPROCS cfs-aware on GOOS=linux #33803


Open
jcorbin opened this issue Aug 23, 2019 · 31 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@jcorbin

jcorbin commented Aug 23, 2019

Problem

The default setting of runtime.GOMAXPROCS() (the number of OS-apparent processors) can be greatly misaligned with a container's CPU quota (e.g. as implemented through CFS bandwidth control by Docker).

This can lead to large latency artifacts in programs, especially under peak load, or when saturating all processors during background GC phases.

The smaller the container and the larger the machine, the worse this effect becomes: say you deploy a fleet of microservice workers, each container having a CPU quota of 4, on a fleet of 32-processor[1] machines.

To understand why, you really have to understand the CFS quota mechanism; this blog post explains it well (with pictures), and this Kubernetes issue further explores the topic (especially as it relates to a recently resolved kernel CPU accounting bug). But to summarize it briefly for this issue:

  • there is a quota period, say 100ms
  • and there is then a quota, say 400ms to effect a 4-processor quota
  • within any period, once the process group exceeds its quota it is throttled (reading these two values back is sketched just below)
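
To make the mechanism concrete, here is a minimal sketch of how a process can read its own quota and period, assuming cgroup v1 mounted at the conventional /sys/fs/cgroup/cpu path (under cgroup v2 the same pair lives in a single cpu.max file instead):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readInt parses a single integer from a cgroup control file.
func readInt(path string) (int64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	quotaUs, err := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
	if err != nil || quotaUs < 0 { // -1 means "no quota set"
		fmt.Println("no CFS quota in effect")
		return
	}
	periodUs, err := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
	if err != nil || periodUs <= 0 {
		fmt.Println("could not read CFS period")
		return
	}
	// e.g. quota=400000us per period=100000us => 4 processors' worth of time
	fmt.Printf("effective cpu quota: %.2f\n", float64(quotaUs)/float64(periodUs))
}
```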

Running an application workload at a reasonable level of CPU efficiency makes it quite likely that you'll be spiking up to your full quota and getting throttled. For example, with GOMAXPROCS=32 under a 4-processor quota (400ms per 100ms period), 32 busy threads can consume the entire 400ms allowance about 12.5ms into the period, leaving the whole process group throttled for the remaining ~87.5ms.

Background workload, like concurrent GC[2], is especially likely to cause quota exhaustion.

I hesitate to even call this a "tail latency" problem; the artifacts are visible in the main body of the latency distribution and can shift the entire distribution.

Solution

If you care about latency, reliability, predictability (... insert more *ilities to taste), then the correct thing to do is to never exceed your CPU quota, by setting GOMAXPROCS=max(1, floor(cpu_quota)).

Using this as a default for GOMAXPROCS makes the world safe again, which is why we use uber-go/automaxprocs in all of our microservices.
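
In code, the proposed default amounts to something like this minimal sketch (not automaxprocs's actual implementation; the quota value would come from the cgroup files sketched above):

```go
package main

import (
	"fmt"
	"math"
	"runtime"
)

// gomaxprocsForQuota implements GOMAXPROCS = max(1, floor(cpu_quota)).
func gomaxprocsForQuota(quota float64) int {
	procs := int(math.Floor(quota))
	if procs < 1 {
		procs = 1 // a fractional quota below 1 still gets one P
	}
	return procs
}

func main() {
	quota := 3.5 // e.g. cfs_quota_us=350000 over cfs_period_us=100000
	prev := runtime.GOMAXPROCS(gomaxprocsForQuota(quota))
	fmt.Printf("GOMAXPROCS: %d -> %d\n", prev, runtime.GOMAXPROCS(0))
}
```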

NOTEs

  1. intentionally avoiding use of the word "core"; the matter of hyper-threading and virtual-vs-physical cores is another topic
  2. digression: can't not mention userspace scheduler pressure induced by background GC; where are we at with goroutine preemption again?
@gopherbot gopherbot added this to the Proposal milestone Aug 23, 2019
@jcorbin
Author

jcorbin commented Aug 23, 2019

I really have to disagree with some of the latter suggestions in kubernetes/kubernetes#67577, like kubernetes/kubernetes#67577 (comment): using ceil(quota)+2, or in any way over-provisioning GOMAXPROCS vs the CPU quota, is at best a statistical gamble to ameliorate current shortcomings in Go's userspace scheduler.

Some background on uber-go/automaxprocs#13 (changing from ceil(quota) to floor(quota)):

  • over-provisioning seemed reasonable at first when the guess was "if we provision fractional cores, use them"
  • but later on we ended up needing fractionals to add margin for other supporting processes injected into your container (e.g. by Mesos Aurora's Thermos executor)

I'll reprise (copied with some edits) my description from that issue here for easy reading:

  • as far as the Go scheduler is concerned, there's no such thing as a fractional CPU
  • so let's say you have your quota set to N + p for some integer N and some 0.0 < p < 1.0
  • the only safe assumption then is that you're using that p value as a hedge for something like "systems effects" or "C libraries"
  • in that latter case, what you really might want is to be able to give maxprocs some value K of CPUs stolen for some parasite like a C library or sidecar; but this will always need to be application-config specific (see the sketch just below)
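
A hypothetical sketch of what such an application-specific knob could look like (maxprocsWithReservation, reservedK, and the reservation policy are all made up for illustration, not a real API):

```go
import "math"

// maxprocsWithReservation is a hypothetical application-specific knob:
// steal K processors' worth of quota for a parasite like a C library
// or sidecar, then size GOMAXPROCS from what's left.
func maxprocsWithReservation(quota float64, reservedK int) int {
	procs := int(math.Floor(quota)) - reservedK
	if procs < 1 {
		procs = 1 // always leave at least one P for the Go runtime
	}
	return procs
}
```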

@jcorbin
Author

jcorbin commented Aug 23, 2019

Noting: #19378 (comment) explores some GC-CFS relationship

@jcorbin
Author

jcorbin commented Aug 23, 2019

For comparison, an Oracle blog post describes Java adding similar support (especially for GC threads).


@ianlancetaylor
Contributor

CC @aclements @mknyszek

(I'm not sure this has to be a proposal at all. This is more like a bug report. See https://golang.org/s/proposal.)

@ianlancetaylor
Contributor

Changing this from a proposal into a feature request for the runtime package.

@ianlancetaylor ianlancetaylor changed the title proposal: make GOMAXPROCS cfs-aware on GOOS=linux runtime: make GOMAXPROCS cfs-aware on GOOS=linux Sep 3, 2019
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed Proposal labels Sep 3, 2019
@ianlancetaylor ianlancetaylor modified the milestones: Proposal, Go1.14 Sep 3, 2019

@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
wallrj added a commit to wallrj/etcd-cluster-operator that referenced this issue Jan 3, 2020
Perhaps due to etcd failing because of a mismatch between the CPU limits and the apparent available CPUs according to GOMAXPROCS

See golang/go#33803
wallrj added a commit to wallrj/etcd-cluster-operator that referenced this issue Jan 8, 2020
Perhaps due to etcd failing because of a mismatch between the CPU limits and the apparent available CPUs according to GOMAXPROCS

See golang/go#33803
dominikh added a commit to dominikh/go-tools that referenced this issue Jun 15, 2020
GOMAXPROCS subsumes NumCPU for the purpose of sizing semaphores. If
users set CPU affinity, then GOMAXPROCS will reflect that. If users
only set GOMAXPROCS, then NumCPU would be inaccurate. Additionally,
there are plans to make GOMAXPROCS aware of CPU quotas
(golang/go#33803).

Users are still advised to set CPU affinity instead of relying on
GOMAXPROCS to limit CPU usage, because Staticcheck shells out to the
underlying build system, which together with Staticcheck would be able
to use more CPU than intended if limited by just GOMAXPROCS.
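
As an illustration of that sizing choice (not Staticcheck's actual code), a worker pool bounded by the current GOMAXPROCS setting rather than runtime.NumCPU() respects affinity masks and any quota-aware default:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	jobs := []int{1, 2, 3, 4, 5, 6, 7, 8} // stand-in work items
	// Size the semaphore by GOMAXPROCS(0), the current setting, which
	// reflects affinity and any quota-aware adjustment; NumCPU would not.
	sem := make(chan struct{}, runtime.GOMAXPROCS(0))
	var wg sync.WaitGroup
	for _, job := range jobs {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot
		go func(j int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			fmt.Println("processing", j)
		}(job)
	}
	wg.Wait()
}
```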
itspooya added a commit to itspooya/autoscaling that referenced this issue Mar 4, 2025
itspooya added a commit to itspooya/autoscaling that referenced this issue Mar 4, 2025
itspooya added a commit to itspooya/autoscaling that referenced this issue Mar 10, 2025
@prattmic
Member

prattmic commented Apr 7, 2025

I have filed a concrete proposal for how we can do this at #73193. Please take a look.

@mknyszek mknyszek assigned prattmic and unassigned mknyszek May 8, 2025
@mknyszek mknyszek moved this from Todo to In Progress in Go Compiler / Runtime May 8, 2025