storage: Add load-based split telemetry #39192

andy-kimball · 2019-07-31T16:41:09Z

Add "kv.split.load" telemetry counter. It is incremented each time a load-
based split occurs.

Release note: None

cockroach-teamcity · 2019-07-31T16:41:17Z

This change is

knz

do you reckon this could be tested?

Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, and @nvanbenschoten)

andy-kimball

I'm open to suggestions on how to test this in reasonable, automated way. It's one of the things I was hoping to get some feedback about from this group, since I didn't see much precedent for it in the core code to follow.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, and @nvanbenschoten)

nvanbenschoten

I searched for an end-to-end load-based split in our unit tests but couldn't find one. That's a little surprising to me, but I guess our feeling at the time was that the smaller unit tests + the suite of splits/load roachtests provided sufficient coverage.

In terms of testing these telemetry hooks, we seem to be pretty inconsistent. We have a number of tests in pkg/kv that get a lot of utility out of peering in at the TxnCoordSender's metrics (see checkTxnMetrics). In other places we haven't found them to be as useful. I think in general we've taken the stance that we'll test metrics if doing so helps test the larger component, but we don't test metrics just to make sure that they're hooked up correctly. I personally don't have a strong feeling either way, other than that I would be concerned about a strict rule here causing us to avoid adding telemetry in hard-to-reach places in the first place.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andy-kimball, and @danhhz)

pkg/storage/split_queue.go, line 32 at r1 (raw file):

// loadBasedSplitCounter is incremented each time a load-based split occurs.
var loadBasedSplitCounter = telemetry.GetCounterOnce("kv.split.load")

What's your preference for static variables like this vs. adding the counter as a field in splitQueue like we do in kv.TxnMetrics (among other places)?

andy-kimball

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, and @nvanbenschoten)

pkg/storage/split_queue.go, line 32 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's your preference for static variables like this vs. adding the counter as a field in splitQueue like we do in kv.TxnMetrics (among other places)?

Well, it seems like there's a general pattern in SQL-land to use global counters. This is a better idea when the counters are used by objects that are created and destroyed often. There's no reason to add overhead to creation and extra memory to objects. However, I assume that splitQueue isn't created often, so that isn't a concern here. So I'm fine using it as a field if that's what we prefer in core. I'll just point out that over time, we might end up getting a mishmash of the two patterns, because we'll probably add telemetry to core objects where using the field pattern isn't appropriate.

andy-kimball

Yeah, I tried putting a panic where I call out to telemetry (in split_queue.go), and then ran all the tests in pkg/storage, and never hit the panic. So it seems that load-based splitting has 0 tests in the storage package, which isn't a great situation. I think we've got to do a better job with building unit-testable code, since using roachtests for this is like using a sledgehammer to drive a finishing nail.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, and @nvanbenschoten)

andy-kimball

Given that this code is only tested via roachtest, I manually verified that the counter is being correctly incremented.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, @knz, and @nvanbenschoten)

pkg/storage/split_queue.go, line 32 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Well, it seems like there's a general pattern in SQL-land to use global counters. This is a better idea when the counters are used by objects that are created and destroyed often. There's no reason to add overhead to creation and extra memory to objects. However, I assume that splitQueue isn't created often, so that isn't a concern here. So I'm fine using it as a field if that's what we prefer in core. I'll just point out that over time, we might end up getting a mishmash of the two patterns, because we'll probably add telemetry to core objects where using the field pattern isn't appropriate.

I changed the code to use a field for the counter rather than a global. PTAL.

nvanbenschoten

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @andy-kimball, @danhhz, @knz, and @nvanbenschoten)

pkg/storage/split_queue.go, line 72 at r2 (raw file):

		db:             db,
		purgChan:       purgChan,
		loadBasedCount: telemetry.GetCounterOnce("kv.split.load"),

I think we want GetCounter, not GetCounterOnce.

andy-kimball · 2019-08-26T22:54:26Z

pkg/storage/split_queue.go, line 72 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think we want GetCounter, not GetCounterOnce.

Done. I didn't appreciate the difference.

nvanbenschoten

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, @knz, and @nvanbenschoten)

pkg/storage/split_queue.go, line 72 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Done. I didn't appreciate the difference.

Did you push?

Add "kv.split.load" telemetry counter. It is incremented each time a load- based split occurs. Release note: None

andy-kimball

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, @knz, and @nvanbenschoten)

pkg/storage/split_queue.go, line 72 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you push?

Whoops, got distracted by something else and neglected to push.

nvanbenschoten

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, and @knz)

andy-kimball · 2019-08-27T14:56:03Z

bors r+

39192: storage: Add load-based split telemetry r=andy-kimball a=andy-kimball Add "kv.split.load" telemetry counter. It is incremented each time a load- based split occurs. Release note: None Co-authored-by: Andrew Kimball <[email protected]>

craig · 2019-08-27T15:23:26Z

Build succeeded

GitHub CI (Cockroach)

andy-kimball requested review from danhhz, ajwerner, nvanbenschoten and a team July 31, 2019 16:41

andy-kimball requested a review from knz July 31, 2019 16:42

knz reviewed Jul 31, 2019

View reviewed changes

andy-kimball commented Jul 31, 2019

View reviewed changes

nvanbenschoten reviewed Aug 1, 2019

View reviewed changes

andy-kimball commented Aug 2, 2019

View reviewed changes

andy-kimball force-pushed the telemetry branch from a510cfe to d880e0a Compare August 12, 2019 16:00

andy-kimball commented Aug 26, 2019

View reviewed changes

nvanbenschoten reviewed Aug 26, 2019

View reviewed changes

storage: Add load-based split telemetry

92d968f

Add "kv.split.load" telemetry counter. It is incremented each time a load- based split occurs. Release note: None

andy-kimball force-pushed the telemetry branch from d880e0a to 92d968f Compare August 27, 2019 00:30

andy-kimball commented Aug 27, 2019

View reviewed changes

nvanbenschoten approved these changes Aug 27, 2019

View reviewed changes

knz approved these changes Aug 27, 2019

View reviewed changes

craig bot merged commit 92d968f into cockroachdb:master Aug 27, 2019

andy-kimball deleted the telemetry branch August 28, 2019 01:23

knz mentioned this pull request Nov 10, 2019

User-facing changes in 19.2 that were not picked up in release notes cockroachdb/docs#5819

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

storage: Add load-based split telemetry #39192

storage: Add load-based split telemetry #39192

Uh oh!

andy-kimball commented Jul 31, 2019

Uh oh!

cockroach-teamcity commented Jul 31, 2019

Uh oh!

knz left a comment

Uh oh!

andy-kimball left a comment

Uh oh!

nvanbenschoten left a comment

Uh oh!

andy-kimball left a comment

Uh oh!

andy-kimball left a comment

Uh oh!

andy-kimball left a comment

Uh oh!

nvanbenschoten left a comment

Uh oh!

andy-kimball commented Aug 26, 2019

Uh oh!

nvanbenschoten left a comment

Uh oh!

andy-kimball left a comment

Uh oh!

nvanbenschoten left a comment

Uh oh!

andy-kimball commented Aug 27, 2019

Uh oh!

craig bot commented Aug 27, 2019

Uh oh!

Uh oh!

storage: Add load-based split telemetry #39192

storage: Add load-based split telemetry #39192

Uh oh!

Conversation

andy-kimball commented Jul 31, 2019

Uh oh!

cockroach-teamcity commented Jul 31, 2019

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

andy-kimball left a comment

Choose a reason for hiding this comment

Uh oh!

nvanbenschoten left a comment

Choose a reason for hiding this comment

Uh oh!

andy-kimball left a comment

Choose a reason for hiding this comment

Uh oh!

andy-kimball left a comment

Choose a reason for hiding this comment

Uh oh!

andy-kimball left a comment

Choose a reason for hiding this comment

Uh oh!

nvanbenschoten left a comment

Choose a reason for hiding this comment

Uh oh!

andy-kimball commented Aug 26, 2019

Uh oh!

nvanbenschoten left a comment

Choose a reason for hiding this comment

Uh oh!

andy-kimball left a comment

Choose a reason for hiding this comment

Uh oh!

nvanbenschoten left a comment

Choose a reason for hiding this comment

Uh oh!

andy-kimball commented Aug 27, 2019

Uh oh!

craig bot commented Aug 27, 2019

Build succeeded

Uh oh!

Uh oh!