workload/tpcc: pre-compute random data for initial data load strings #38208

danhhz · 2019-06-17T14:54:26Z

This is explictly allowed by the spec only for initial data load in
4.3.2.1:

For the purpose of populating the initial database only, random numbers
can be generated by selecting entries in sequence from a set of at least
10,000 pregenerated random numbers. This technique cannot be used for
the field O_OL_CNT.

name                             old time/op    new time/op    delta
InitialData/tpcc/warehouses=1-8     368ms ± 9%     229ms ± 1%  -37.77%  (p=0.000 n=15+15)

name                             old speed      new speed      delta
InitialData/tpcc/warehouses=1-8   299MB/s ± 9%   481MB/s ± 1%  +60.50%  (p=0.000 n=15+15)

name                             old alloc/op   new alloc/op   delta
InitialData/tpcc/warehouses=1-8     193kB ± 0%     126kB ± 0%  -34.80%  (p=0.000 n=14+14)

name                             old allocs/op  new allocs/op  delta
InitialData/tpcc/warehouses=1-8       592 ± 0%       458 ± 0%  -22.59%  (p=0.000 n=15+15)

Release note: None

cockroach-teamcity · 2019-06-17T14:54:38Z

This change is

tbg

I'm missing some context here, but it from the review it seems that the resulting code is kind of vague about what's precomputed and what isn't. It'd be nice if that were somehow more obvious.

Reviewed 9 of 9 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)

pkg/workload/bench_test.go, line 54 at r1 (raw file):

		// PrecomputedRand. This makes the benchmark results an upper bound on
		// performance/allocations. TPCC, for example, special cases the default
		// seed to do some initialization only once which means it will be faster

Without the storing of the precomputation for seed=0, what would be the real-world effects? Wouldn't we precompute this only once per node per import? Trying to understand whether it's worth the cognitive complexity to make one seed special.

pkg/workload/tpcc/random.go, line 89 at r1 (raw file):

// randAString generates a random alphanumeric string of length between min and
// max inclusive. See 4.3.2.2.
func randAString(rng *tpccRand, a *bufalloc.ByteAllocator, min, max int) []byte {

Should these methods make it clear (in their name) that they're only allowed for initial data?

pkg/workload/workloadimpl/precomputedrand_test.go, line 81 at r1 (raw file):

	var randOffset int

	b.Run(`len=2`, func(b *testing.B) {

Nit: fmt.Sprintf("len=%d", shortLen) at which point why not unify the three subbenchmarks and range []int{shortLen, mediumLen, longLen} { ... }

danhhz

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

pkg/workload/bench_test.go, line 54 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Without the storing of the precomputation for seed=0, what would be the real-world effects? Wouldn't we precompute this only once per node per import? Trying to understand whether it's worth the cognitive complexity to make one seed special.

good point. tpcc generates a 10KiB string. i was thinking ahead to tpch which has a 300MiB one, but init once may not even be how we want to do that so I'll punt this till we hit it

pkg/workload/tpcc/random.go, line 89 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Should these methods make it clear (in their name) that they're only allowed for initial data?

Done

pkg/workload/workloadimpl/precomputedrand_test.go, line 81 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Nit: fmt.Sprintf("len=%d", shortLen) at which point why not unify the three subbenchmarks and range []int{shortLen, mediumLen, longLen} { ... }

Done

tbg

Reviewed 6 of 6 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained

danhhz · 2019-06-28T22:17:42Z

Putting the offsets in the sync.Pool objects wasn't deterministic (and the test caught it, yay). Should be deterministic again. It's a tiny bit slower than the non-deterministic version but still much faster than what's in master. For some reason I also seem to have saved some allocs

This is explictly allowed by the spec only for initial data load in 4.3.2.1: For the purpose of populating the initial database only, random numbers can be generated by selecting entries in sequence from a set of at least 10,000 pregenerated random numbers. This technique cannot be used for the field O_OL_CNT. name old time/op new time/op delta InitialData/tpcc/warehouses=1-8 368ms ± 9% 229ms ± 1% -37.77% (p=0.000 n=15+15) name old speed new speed delta InitialData/tpcc/warehouses=1-8 299MB/s ± 9% 481MB/s ± 1% +60.50% (p=0.000 n=15+15) name old alloc/op new alloc/op delta InitialData/tpcc/warehouses=1-8 193kB ± 0% 126kB ± 0% -34.80% (p=0.000 n=14+14) name old allocs/op new allocs/op delta InitialData/tpcc/warehouses=1-8 592 ± 0% 458 ± 0% -22.59% (p=0.000 n=15+15) Release note: None

danhhz · 2019-07-08T19:23:05Z

Thanks for the review!

bors r=tbg

38208: workload/tpcc: pre-compute random data for initial data load strings r=tbg a=danhhz This is explictly allowed by the spec only for initial data load in 4.3.2.1: For the purpose of populating the initial database only, random numbers can be generated by selecting entries in sequence from a set of at least 10,000 pregenerated random numbers. This technique cannot be used for the field O_OL_CNT. name old time/op new time/op delta InitialData/tpcc/warehouses=1-8 368ms ± 9% 229ms ± 1% -37.77% (p=0.000 n=15+15) name old speed new speed delta InitialData/tpcc/warehouses=1-8 299MB/s ± 9% 481MB/s ± 1% +60.50% (p=0.000 n=15+15) name old alloc/op new alloc/op delta InitialData/tpcc/warehouses=1-8 193kB ± 0% 126kB ± 0% -34.80% (p=0.000 n=14+14) name old allocs/op new allocs/op delta InitialData/tpcc/warehouses=1-8 592 ± 0% 458 ± 0% -22.59% (p=0.000 n=15+15) Release note: None Co-authored-by: Daniel Harrison <[email protected]>

craig · 2019-07-08T20:01:06Z

Build succeeded

GitHub CI (Cockroach)

danhhz requested a review from tbg June 17, 2019 14:54

tbg reviewed Jun 18, 2019

View reviewed changes

danhhz force-pushed the workload_rand_precomputed branch 3 times, most recently from c8a405a to 5a3129a Compare June 25, 2019 20:06

danhhz commented Jun 25, 2019

View reviewed changes

tbg approved these changes Jun 25, 2019

View reviewed changes

danhhz force-pushed the workload_rand_precomputed branch 2 times, most recently from fba9589 to 56df27f Compare June 28, 2019 22:11

danhhz requested a review from a team June 28, 2019 22:11

danhhz force-pushed the workload_rand_precomputed branch from 56df27f to c126efe Compare June 28, 2019 22:36

danhhz force-pushed the workload_rand_precomputed branch from c126efe to 18b4d44 Compare July 8, 2019 19:01

craig bot merged commit 18b4d44 into cockroachdb:master Jul 8, 2019

danhhz deleted the workload_rand_precomputed branch July 9, 2019 14:52

knz mentioned this pull request Nov 10, 2019

User-facing changes in 19.2 that were not picked up in release notes cockroachdb/docs#5819

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

workload/tpcc: pre-compute random data for initial data load strings #38208

workload/tpcc: pre-compute random data for initial data load strings #38208

Uh oh!

danhhz commented Jun 17, 2019 •

edited

Loading

Uh oh!

cockroach-teamcity commented Jun 17, 2019

Uh oh!

tbg left a comment

Uh oh!

danhhz left a comment

Uh oh!

tbg left a comment

Uh oh!

danhhz commented Jun 28, 2019

Uh oh!

danhhz commented Jul 8, 2019

Uh oh!

craig bot commented Jul 8, 2019

Uh oh!

Uh oh!

workload/tpcc: pre-compute random data for initial data load strings #38208

workload/tpcc: pre-compute random data for initial data load strings #38208

Uh oh!

Conversation

danhhz commented Jun 17, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Jun 17, 2019

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

danhhz left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

danhhz commented Jun 28, 2019

Uh oh!

danhhz commented Jul 8, 2019

Uh oh!

craig bot commented Jul 8, 2019

Build succeeded

Uh oh!

Uh oh!

danhhz commented Jun 17, 2019 •

edited

Loading