movr: Add stats collection to movr workload run #41138

Merged: 1 commit, Sep 30, 2019

Conversation

@rohany rohany commented Sep 26, 2019

This PR adds stats tracking for each kind of query in the movr workload
so that output is displayed from cockroach workload run. Additionally,
this refactors the movr workload to define the work as functions on a
worker struct. This should avoid a common gotcha of having different
workers share the same non-thread-safe histograms object.

Release justification: low-risk, nice-to-have feature

Release note: None

@rohany rohany requested a review from danhhz September 26, 2019 20:10
@cockroach-teamcity

This change is Reviewable

rohany commented Sep 26, 2019

cc @jseldess, no need to open an issue

@danhhz danhhz left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @rohany)


pkg/workload/movr/movr.go, line 553 at r1 (raw file):

		err := work()
		elapsed := timeutil.Since(start)
		hists.Get(key).Record(elapsed)

we definitely only want to update when err is nil. we've had issues in the past with it being misleading to mix successful and failing queries in the same histogram

if you'd like to measure the errors as well, i'd make them separate buckets (either one big errors bucket or something like key + "-error", though probably the former to limit histogram explosion)
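A minimal Go sketch of the split suggested here, assuming the timeutil helpers and the hists.Get(...).Record(...) API shown in the diff above; the "errors" bucket name is illustrative, not necessarily what the final code uses:

	start := timeutil.Now()
	err := work()
	elapsed := timeutil.Since(start)
	if err == nil {
		// Only successful queries land in the per-query histogram.
		hists.Get(key).Record(elapsed)
	} else {
		// One shared bucket for all failures limits histogram explosion.
		hists.Get("errors").Record(elapsed)
	}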


pkg/workload/movr/movr.go, line 626 at r1 (raw file):

	}

	hists := reg.GetHandle()

the handle returned by this is not threadsafe, you need one per worker. I think this happens to work now since there appears to be one worker, but let's avoid leaving this gotcha around in case someone goes to add more workers later
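A hedged sketch of the one-handle-per-worker pattern, assuming the reg.GetHandle() API above; the worker struct, the concurrency variable, and the run method are illustrative stand-ins:

	for i := 0; i < concurrency; i++ {
		// Each worker owns its own handle; handles are not threadsafe.
		w := &worker{db: db, hists: reg.GetHandle()}
		ql.WorkerFns = append(ql.WorkerFns, w.run)
	}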


pkg/workload/movr/movr.go, line 666 at r1 (raw file):

			return err
		} else if rng.Float64() < 0.1 {
			// Apply a promo code to an account.

aren't these more useful to track at the level of a logical "movr api call"? so there'd be one for "apply promo code" instead of breaking it down for each db call in it. see how tpcc works for what i'm suggesting
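Illustrative only: timing at the level of a logical movr API call might look like the sketch below, where applyPromoCode is a hypothetical helper that may issue several db calls internally:

	start := timeutil.Now()
	if err := w.applyPromoCode(ctx); err != nil {
		return err
	}
	// One histogram entry per logical operation, not per db call.
	w.hists.Get("applyPromoCode").Record(timeutil.Since(start))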

@rohany rohany left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/workload/movr/movr.go, line 553 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

we definitely only want to update when err is nil. we've had issues in the past with it being misleading to mix successful and failing queries in the same histogram

if you'd like to measure the errors as well, i'd make them separate buckets (either one big errors bucket or something like key + "-error", though probably the former to limit histogram explosion)

Ok, that makes sense.


pkg/workload/movr/movr.go, line 626 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

the handle returned by this is not threadsafe, you need one per worker. I think this happens to work now since there appears to be one worker, but let's avoid leaving this gotcha around in case someone goes to add more workers later

Yeah, i saw that. There is only one worker right now, so this is OK. However, I'm not sure how to change this to avoid the gotcha. Is leaving a comment denoting that this is the case the correct thing to do?


pkg/workload/movr/movr.go, line 666 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

aren't these more useful to track at the level of a logical "movr api call"? so there'd be one for "apply promo code" instead of breaking it down for each db call in it. see how tpcc works for what i'm suggesting

I can condense some of these queries into one timed execution, but I wanted to separate the getRandom* queries from the others because they are not part of the original movr application. They were added as a utility for me to easily generate random values, whereas the movr app we have makes a sort of local in-memory copy of the db and samples from it. So i didn't want to include those queries in the timing of a particular API call, to avoid producing times that differ a decent amount from the published movr app.

@danhhz danhhz left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rohany)


pkg/workload/movr/movr.go, line 626 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Yeah, i saw that. There is only one worker right now, so this is OK. However, I'm not sure how to change this to avoid the gotcha. Is leaving a comment denoting that this is the case the correct thing to do?

I think we need to bite the bullet on a small refactor of this code. Like I said on slack, this should all probably move to be closer to what tpcc is doing. I'm wary of making this code more complex and brittle and then calling it "low risk"


pkg/workload/movr/movr.go, line 666 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I can condense some of these queries into one timed execution, but I wanted to separate the getRandom* queries from the others because they are not part of the original movr application. They were added as a utility for me to easily generate random values, whereas the movr app we have makes a sort of local in-memory copy of the db and samples from it. So i didn't want to include those queries in the timing of a particular API call, to avoid producing times that differ a decent amount from the published movr app.

Hmm, i'm not particularly concerned about this matching the published movr app. Should I be? I'd rather optimize for this making sense to people kicking the tires on cockroachdb, which is likely to be the majority use of workload run movr

@rohany rohany force-pushed the movr-workload-stats branch from 12236db to 682765c on September 26, 2019 22:15
@rohany rohany left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/workload/movr/movr.go, line 626 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

I think we need to bite the bullet on a small refactor of this code. Like I said on slack, this should all probably move to be closer to what tpcc is doing. I'm wary of making this code more complex and brittle and then calling it "low risk"

Ok, i did the refactor. It does feel better now.


pkg/workload/movr/movr.go, line 666 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

Hmm, i'm not particularly concerned about this matching the published movr app. Should I be? I'd rather optimize for this making sense to people kicking the tires on cockroachdb, which is likely to be the majority use of workload run movr

I thought about it more and I agree -- if you want to see the exact same thing as the movr app, just run the docker image yourself. Otherwise, cockroach workload run movr doesn't need to be exact.

@danhhz danhhz left a comment

:lgtm: one last plea for not leaving this as a gotcha, but feel free to merge even if you don't

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @danhhz and @rohany)


pkg/workload/movr/movr.go, line 758 at r2 (raw file):

	}

	// Hists is not threadsafe! If this workload expands to returning multiple workers,

i think doing this is trivial now. if you make a new worker struct type that has a db *gosql.DB and a hists *histograms.Histograms, you can move all the work fns and movrQuerySimulation to be methods on that struct

if you really don't feel like doing this now, then let's move this comment to be above the ql.WorkerFns = append(ql.WorkerFns, movrQuerySimulation) line, which is much more likely to be seen by someone making the change you mention than it would be up here
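A minimal sketch of the worker-struct refactor described here, using the field types named above; the method body and context plumbing are illustrative:

	type worker struct {
		db    *gosql.DB
		hists *histograms.Histograms
	}

	// The work fns become methods, so each worker only ever touches
	// its own histograms handle.
	func (w *worker) movrQuerySimulation(ctx context.Context) error {
		// ... issue movr queries and record timings via w.hists ...
		return nil
	}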

@rohany rohany force-pushed the movr-workload-stats branch from 682765c to 460f9e7 on September 30, 2019 14:36

@rohany rohany left a comment

Thanks for your persistence -- it feels a lot better now. Can you take another quick look?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @danhhz and @rohany)

@danhhz danhhz left a comment

:lgtm_strong: \o/

Thanks for your patience on this. I feel good that we picked the cleanup off now.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @danhhz and @rohany)

rohany commented Sep 30, 2019

bors r=danhhz

craig bot pushed a commit that referenced this pull request Sep 30, 2019
40493: sql: Display inherited constraints in SHOW PARTITIONS  r=andreimatei a=rohany

SHOW PARTITIONS now displays the inherited zone configuration of the
partitions in a separate column. To accomplish this, the
crdb_internal.zones table now holds on to the inherited constraints of
each zone in a separate column. Additionally, the
crdb_internal.partitions table holds on to the zone_id and subzone_id of
the zone configuration the partition refers to. These IDs correspond to
the zone configuration at the lowest point in that partition's
"inheritance chain".

Release justification: Adds a low-risk, good-to-have UX feature.

Fixes #40349.

Release note (sql change):
* SHOW PARTITIONS now displays inherited zone configurations.
* Adds the zone_id and subzone_id columns to crdb_internal.partitions,
which link to the corresponding zone config in crdb_internal.zones
that applies to the partitions.
* Renames the config_yaml, config_sql, and config_proto columns in
crdb_internal.zones to raw_config_yaml, raw_config_sql, and
raw_config_proto.
* Adds the columns full_config_sql and full_config_yaml to the
crdb_internal.zones table, which display the full/inherited zone
configuration.

41138: movr: Add stats collection to movr workload run r=danhhz a=rohany

This PR adds stats tracking for each kind of query in the movr workload
so that output is displayed from cockroach workload run. Additionally,
this refactors the movr workload to define the work as functions on a
worker struct. This should avoid a common gotcha of having different
workers share the same non-thread-safe histograms object.

Release justification: low-risk, nice-to-have feature

Release note: None

41196: store,bulk: log when delaying AddSSTable, collect + log more timings in bulk-ingest r=dt a=dt

storage: log when AddSSTable requests are delayed

If the rate-limiting and back-pressure mechanisms kick in, they can dramatically delay requests in some cases.
However, it can currently be unclear that this is happening, and the system may simply appear slow.
Logging when requests are delayed by more than a second should help identify when this is the cause of slowness.

Release note: none.

Release justification: low-risk (logging only) change that could significantly help in diagnosing 'stuck' jobs based on logs (which are often all we have to go on).

bulk: track and log more timings

This tracks and logs time spent in the various stages of ingestion: sorting, splitting, and flushing.
This helps when trying to diagnose why a job is 'slow' or 'stuck'.

Release note: none.

Release justification: low-risk (logging only) changes that improve ability to diagnose problems.


Co-authored-by: Rohan Yadav <[email protected]>
Co-authored-by: Rohan Yadav <[email protected]>
Co-authored-by: David Taylor <[email protected]>

craig bot commented Sep 30, 2019

Build succeeded
