Skip to content

workload/tpcc: pre-compute random data for initial data load strings #38208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 8, 2019

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Jun 17, 2019

This is explictly allowed by the spec only for initial data load in
4.3.2.1:

For the purpose of populating the initial database only, random numbers
can be generated by selecting entries in sequence from a set of at least
10,000 pregenerated random numbers. This technique cannot be used for
the field O_OL_CNT.

name                             old time/op    new time/op    delta
InitialData/tpcc/warehouses=1-8     368ms ± 9%     229ms ± 1%  -37.77%  (p=0.000 n=15+15)

name                             old speed      new speed      delta
InitialData/tpcc/warehouses=1-8   299MB/s ± 9%   481MB/s ± 1%  +60.50%  (p=0.000 n=15+15)

name                             old alloc/op   new alloc/op   delta
InitialData/tpcc/warehouses=1-8     193kB ± 0%     126kB ± 0%  -34.80%  (p=0.000 n=14+14)

name                             old allocs/op  new allocs/op  delta
InitialData/tpcc/warehouses=1-8       592 ± 0%       458 ± 0%  -22.59%  (p=0.000 n=15+15)

Release note: None

@danhhz danhhz requested a review from tbg June 17, 2019 14:54
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing some context here, but it from the review it seems that the resulting code is kind of vague about what's precomputed and what isn't. It'd be nice if that were somehow more obvious.

Reviewed 9 of 9 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/workload/bench_test.go, line 54 at r1 (raw file):

		// PrecomputedRand. This makes the benchmark results an upper bound on
		// performance/allocations. TPCC, for example, special cases the default
		// seed to do some initialization only once which means it will be faster

Without the storing of the precomputation for seed=0, what would be the real-world effects? Wouldn't we precompute this only once per node per import? Trying to understand whether it's worth the cognitive complexity to make one seed special.


pkg/workload/tpcc/random.go, line 89 at r1 (raw file):

// randAString generates a random alphanumeric string of length between min and
// max inclusive. See 4.3.2.2.
func randAString(rng *tpccRand, a *bufalloc.ByteAllocator, min, max int) []byte {

Should these methods make it clear (in their name) that they're only allowed for initial data?


pkg/workload/workloadimpl/precomputedrand_test.go, line 81 at r1 (raw file):

	var randOffset int

	b.Run(`len=2`, func(b *testing.B) {

Nit: fmt.Sprintf("len=%d", shortLen) at which point why not unify the three subbenchmarks and range []int{shortLen, mediumLen, longLen} { ... }

@danhhz danhhz force-pushed the workload_rand_precomputed branch 3 times, most recently from c8a405a to 5a3129a Compare June 25, 2019 20:06
Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/workload/bench_test.go, line 54 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Without the storing of the precomputation for seed=0, what would be the real-world effects? Wouldn't we precompute this only once per node per import? Trying to understand whether it's worth the cognitive complexity to make one seed special.

good point. tpcc generates a 10KiB string. i was thinking ahead to tpch which has a 300MiB one, but init once may not even be how we want to do that so I'll punt this till we hit it


pkg/workload/tpcc/random.go, line 89 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Should these methods make it clear (in their name) that they're only allowed for initial data?

Done


pkg/workload/workloadimpl/precomputedrand_test.go, line 81 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Nit: fmt.Sprintf("len=%d", shortLen) at which point why not unify the three subbenchmarks and range []int{shortLen, mediumLen, longLen} { ... }

Done

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@danhhz danhhz force-pushed the workload_rand_precomputed branch 2 times, most recently from fba9589 to 56df27f Compare June 28, 2019 22:11
@danhhz danhhz requested a review from a team June 28, 2019 22:11
@danhhz
Copy link
Contributor Author

danhhz commented Jun 28, 2019

Putting the offsets in the sync.Pool objects wasn't deterministic (and the test caught it, yay). Should be deterministic again. It's a tiny bit slower than the non-deterministic version but still much faster than what's in master. For some reason I also seem to have saved some allocs

@danhhz danhhz force-pushed the workload_rand_precomputed branch from 56df27f to c126efe Compare June 28, 2019 22:36
This is explictly allowed by the spec only for initial data load in
4.3.2.1:

For the purpose of populating the initial database only, random numbers
can be generated by selecting entries in sequence from a set of at least
10,000 pregenerated random numbers. This technique cannot be used for
the field O_OL_CNT.

    name                             old time/op    new time/op    delta
    InitialData/tpcc/warehouses=1-8     368ms ± 9%     229ms ± 1%  -37.77%  (p=0.000 n=15+15)

    name                             old speed      new speed      delta
    InitialData/tpcc/warehouses=1-8   299MB/s ± 9%   481MB/s ± 1%  +60.50%  (p=0.000 n=15+15)

    name                             old alloc/op   new alloc/op   delta
    InitialData/tpcc/warehouses=1-8     193kB ± 0%     126kB ± 0%  -34.80%  (p=0.000 n=14+14)

    name                             old allocs/op  new allocs/op  delta
    InitialData/tpcc/warehouses=1-8       592 ± 0%       458 ± 0%  -22.59%  (p=0.000 n=15+15)

Release note: None
@danhhz danhhz force-pushed the workload_rand_precomputed branch from c126efe to 18b4d44 Compare July 8, 2019 19:01
@danhhz
Copy link
Contributor Author

danhhz commented Jul 8, 2019

Thanks for the review!

bors r=tbg

craig bot pushed a commit that referenced this pull request Jul 8, 2019
38208: workload/tpcc: pre-compute random data for initial data load strings r=tbg a=danhhz

This is explictly allowed by the spec only for initial data load in
4.3.2.1:

For the purpose of populating the initial database only, random numbers
can be generated by selecting entries in sequence from a set of at least
10,000 pregenerated random numbers. This technique cannot be used for
the field O_OL_CNT.

    name                             old time/op    new time/op    delta
    InitialData/tpcc/warehouses=1-8     368ms ± 9%     229ms ± 1%  -37.77%  (p=0.000 n=15+15)

    name                             old speed      new speed      delta
    InitialData/tpcc/warehouses=1-8   299MB/s ± 9%   481MB/s ± 1%  +60.50%  (p=0.000 n=15+15)

    name                             old alloc/op   new alloc/op   delta
    InitialData/tpcc/warehouses=1-8     193kB ± 0%     126kB ± 0%  -34.80%  (p=0.000 n=14+14)

    name                             old allocs/op  new allocs/op  delta
    InitialData/tpcc/warehouses=1-8       592 ± 0%       458 ± 0%  -22.59%  (p=0.000 n=15+15)



Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jul 8, 2019

Build succeeded

@craig craig bot merged commit 18b4d44 into cockroachdb:master Jul 8, 2019
@danhhz danhhz deleted the workload_rand_precomputed branch July 9, 2019 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants