Re-introduce CAS optimizations for high concurrency cases. #1760

Open
bwplotka opened this issue Mar 4, 2025 · 3 comments

@bwplotka
Member

bwplotka commented Mar 4, 2025

We had to revert #1661 due to major performance issues on the Add/Inc/Observe methods for cumulatives; see #1748.

The acceptance criterion for this issue is to reintroduce those benefits without compromising low-concurrency cases.
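
For context, the update path in question is essentially a plain CAS loop over the float64 bit pattern. A simplified sketch (close to, but not exactly, the current atomicUpdateFloat after the revert):

import (
	"math"
	"sync/atomic"
)

// Simplified sketch of the lock-free float update after the revert:
// retry the compare-and-swap until it succeeds, with no backoff at all.
func atomicUpdateFloat(bits *uint64, update func(float64) float64) {
	for {
		oldBits := atomic.LoadUint64(bits)
		newBits := math.Float64bits(update(math.Float64frombits(oldBits)))
		if atomic.CompareAndSwapUint64(bits, oldBits, newBits) {
			return
		}
	}
}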

Help wanted!

@zhengkezhou1

Hi @bwplotka, I reviewed the relevant content, and the main issue appears to be that, in most cases, the backoff interval between two CAS attempts is unnecessary. The solution is to either shrink these intervals or eliminate them altogether. I will work on addressing this.

@zhengkezhou1

Based on the feedback in issue #1748 and the implementation in PR #1661, I adjusted the initial wait time from 10ms to 10ns and the maximum wait time from 320ms to 50ms.
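
For illustration, the adjusted loop looks roughly like this (assuming the same exponential-doubling strategy as #1661; names and exact structure are a sketch, not the final code, and it uses the same imports as the sketch above plus "time"):

const (
	initialBackoff = 10 * time.Nanosecond  // was 10ms in #1661
	maxBackoff     = 50 * time.Millisecond // was 320ms in #1661
)

func atomicUpdateFloatWithBackoff(bits *uint64, update func(float64) float64) {
	backoff := initialBackoff
	for {
		oldBits := atomic.LoadUint64(bits)
		newBits := math.Float64bits(update(math.Float64frombits(oldBits)))
		if atomic.CompareAndSwapUint64(bits, oldBits, newBits) {
			return
		}
		// CAS lost: wait briefly before retrying, doubling up to the cap.
		time.Sleep(backoff)
		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}

With those parameters, benchstat compares as follows: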

goos: linux
goarch: amd64
pkg: github.com/prometheus/client_golang/prometheus
cpu: Intel(R) Core(TM) i3-10100 CPU @ 3.60GHz
                                          │     base      │                 new                  │
                                          │    sec/op     │    sec/op      vs base               │
AtomicUpdateFloatPureFunc/goroutines=0-8    85.01n ±   8%   10.71n ±   5%  -87.40% (p=0.002 n=6)
AtomicUpdateFloatPureFunc/goroutines=1-8    95.63n ±  20%   10.79n ±   4%  -88.72% (p=0.002 n=6)
AtomicUpdateFloatPureFunc/goroutines=2-8    96.18n ±   9%   11.04n ±   3%  -88.52% (p=0.002 n=6)
AtomicUpdateFloatPureFunc/goroutines=4-8    95.58n ±   2%   11.55n ±  10%  -87.91% (p=0.002 n=6)
AtomicUpdateFloatPureFunc/goroutines=8-8    90.57n ±  22%   12.58n ±   5%  -86.10% (p=0.002 n=6)
AtomicUpdateFloatPureFunc/goroutines=16-8   96.97n ±   3%   28.92n ±  41%  -70.17% (p=0.002 n=6)
AtomicUpdateWithSimpleJob/goroutines=0-8    11.91m ±   5%   11.68m ±   6%        ~ (p=0.485 n=6)
AtomicUpdateWithSimpleJob/goroutines=1-8    12.35m ±   4%   11.78m ±   5%   -4.66% (p=0.015 n=6)
AtomicUpdateWithSimpleJob/goroutines=2-8    12.08m ±   7%   11.78m ±   3%        ~ (p=0.132 n=6)
AtomicUpdateWithSimpleJob/goroutines=4-8    11.23m ±   3%   11.45m ±   7%        ~ (p=0.485 n=6)
AtomicUpdateWithSimpleJob/goroutines=8-8    16.07m ± 431%   19.85m ± 235%        ~ (p=0.485 n=6)
AtomicUpdateWithSimpleJob/goroutines=16-8   11.57m ±  17%   14.86m ± 105%        ~ (p=0.818 n=6)
geomean                                     34.06µ          13.25µ         -61.09%

simpleJob simulates a typical everyday workload running alongside the atomic updates.

func simpleJob() {
	// Fill a large array to simulate CPU and memory work between metric updates.
	data := [10_000_000]int{}
	for i := 0; i < 10_000_000; i++ {
		data[i] = i
	}
}

@mattrobenolt

mattrobenolt commented Apr 2, 2025

I would like to submit an alternative I've been playing with for my own metrics library, specifically around concurrently updating float values.

https://github.com/mattrobenolt/go-metrics/blob/3b073cc/internal/atomicx/atomicx.go#L37-L94

I have extensively tested and benchmarked this Sum struct, which wraps three atomic.Uint64 values.

https://github.com/mattrobenolt/go-metrics/blob/main/internal/atomicx/benchmarks.txt

Results are roughly twice as fast under heavy contention.

I strongly disagree with an implementation that leverages a sleep within the CAS loop.
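
For comparison, one sleep-free direction (purely illustrative, and not a description of how the Sum above works) is to split the accumulator so that whole-number increments avoid the CAS loop entirely, similar in spirit to how client_golang's counter separates its integer and float parts:

import (
	"math"
	"sync/atomic"
)

// splitSum is an illustrative sketch only: integer increments use a lock-free
// atomic add, and only non-integer values fall back to a CAS loop (no sleep).
type splitSum struct {
	intPart   atomic.Uint64 // whole-number increments
	floatBits atomic.Uint64 // IEEE-754 bits of the non-integer remainder
}

func (s *splitSum) Add(v float64) {
	ival := uint64(v)
	if v >= 0 && float64(ival) == v {
		s.intPart.Add(ival) // fast path: no retry loop at all
		return
	}
	for { // slow path: plain CAS loop without any backoff sleep
		old := s.floatBits.Load()
		next := math.Float64bits(math.Float64frombits(old) + v)
		if s.floatBits.CompareAndSwap(old, next) {
			return
		}
	}
}

func (s *splitSum) Load() float64 {
	return float64(s.intPart.Load()) + math.Float64frombits(s.floatBits.Load())
}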
