sql: account for memory of results of window functions computations #38839

Merged (1 commit) on Jul 22, 2019

Conversation

yuzefovich
Member

@yuzefovich yuzefovich commented Jul 12, 2019

Previously, Datums that resulted from computing window functions were
accounted for in memory monitoring only as nils, which could lead to
OOM when the actual Datum's size was a lot bigger than the default one
(array_agg and concat_agg were most prone to this).
Now, the underlying memory is tracked correctly.

Fixes: #38818.

Release note: None

@yuzefovich yuzefovich requested review from jordanlewis, asubiotto and a team July 12, 2019 03:21
@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich
Member Author

Local benchmarks don't show any significant difference (they actually show a slight performance improvement, which I attribute to noise):

+ make bench PKG=./pkg/sql/distsqlrun BENCHTIMEOUT=5m BENCHES=BenchmarkWindower 'TESTFLAGS=-count 10 -benchmem'
name                                                                          old time/op    new time/op    delta
Windower/SUM()_OVER_()-12                                                       64.5ms ± 0%    63.6ms ± 2%  -1.45%  (p=0.005 n=6+10)
Windower/SUM()_OVER_(ORDER_BY)-12                                                126ms ± 2%     121ms ± 1%  -3.93%  (p=0.000 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/)-12                 172ms ± 2%     169ms ± 1%  -1.91%  (p=0.002 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/_ORDER_BY)-12        190ms ± 2%     183ms ± 1%  -3.39%  (p=0.000 n=10+9)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/)-12             82.0ms ± 2%    81.7ms ± 2%    ~     (p=0.631 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/_ORDER_BY)-12     104ms ± 1%     104ms ± 2%    ~     (p=0.853 n=10+10)

name                                                                          old speed      new speed      delta
Windower/SUM()_OVER_()-12                                                     37.2MB/s ± 0%  37.8MB/s ± 2%  +1.48%  (p=0.005 n=6+10)
Windower/SUM()_OVER_(ORDER_BY)-12                                             19.1MB/s ± 2%  19.8MB/s ± 1%  +4.08%  (p=0.000 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/)-12              14.0MB/s ± 2%  14.2MB/s ± 1%  +1.94%  (p=0.003 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/_ORDER_BY)-12     12.6MB/s ± 2%  13.1MB/s ± 1%  +3.50%  (p=0.000 n=10+9)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/)-12           29.3MB/s ± 2%  29.4MB/s ± 2%    ~     (p=0.617 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/_ORDER_BY)-12  23.1MB/s ± 1%  23.1MB/s ± 3%    ~     (p=0.837 n=10+10)

name                                                                          old alloc/op   new alloc/op   delta
Windower/SUM()_OVER_()-12                                                       25.8MB ± 0%    25.8MB ± 0%    ~     (p=0.226 n=9+8)
Windower/SUM()_OVER_(ORDER_BY)-12                                               33.8MB ± 0%    33.8MB ± 0%    ~     (p=0.623 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/)-12                93.7MB ± 0%    93.7MB ± 0%    ~     (p=0.684 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/_ORDER_BY)-12        118MB ± 0%     118MB ± 0%    ~     (p=0.247 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/)-12             15.9MB ± 0%    15.9MB ± 0%    ~     (p=0.345 n=10+9)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/_ORDER_BY)-12    22.4MB ± 0%    22.4MB ± 0%    ~     (p=0.436 n=10+10)

name                                                                          old allocs/op  new allocs/op  delta
Windower/SUM()_OVER_()-12                                                         300k ± 0%      300k ± 0%    ~     (all equal)
Windower/SUM()_OVER_(ORDER_BY)-12                                                 600k ± 0%      600k ± 0%    ~     (all equal)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/)-12                 1.20M ± 0%     1.20M ± 0%    ~     (p=0.724 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/_ORDER_BY)-12        1.40M ± 0%     1.40M ± 0%    ~     (p=0.271 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/)-12               302k ± 0%      302k ± 0%    ~     (all equal)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/_ORDER_BY)-12      502k ± 0%      502k ± 0%    ~     (p=0.294 n=10+8)

Member

@jordanlewis jordanlewis left a comment

:lgtm:

Can you add a test with a small memory account that proves that queries of this form will hit the limit when we expect them to?

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)

Member Author

@yuzefovich yuzefovich left a comment

I added a test, and while looking more closely at the code, I realized that we were, in fact, accounting for the results of computations, but only as nil Datums. So I think the memory monitoring is now correct: first, when we allocate a slice for windowValues, we use the default Datum size times the number of rows; then, once we know the size of each datum, we update the account accordingly. Jordan, could you please take a quick look at this to confirm my logic?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)


pkg/sql/distsqlrun/windower.go, line 555 at r1 (raw file):

		builtin.Reset(ctx)

		usage = datumSliceOverhead + sizeOfDatum*int64(partition.Len())

This is where we account for the slice with default datum size.

@yuzefovich
Member Author

I also noticed that there was a bug in the windower benchmarks: FilterColIdx was not properly set. As a result, the code "thought" that there was a filter on the zeroth column and compared it to DBoolTrue; since that column holds ints, the comparison failed as if the row had been filtered out, short-circuiting the execution.
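
To illustrate the failure mode, here is a minimal sketch. The `noFilter` sentinel, the `windowFn` struct, and `rowPasses` are hypothetical simplifications and not the real windower types; the point is only that an unset integer index defaults to zero, which reads as "filter on column 0".

```go
package main

import "fmt"

// noFilter is a hypothetical sentinel meaning "no FILTER clause".
const noFilter = -1

// windowFn is a simplified stand-in for a window function spec.
type windowFn struct {
	filterColIdx int
}

// rowPasses reports whether the row should be fed to the window function.
func rowPasses(fn windowFn, row []interface{}) bool {
	if fn.filterColIdx == noFilter {
		return true
	}
	// Only a boolean true passes the filter; any other value, including an
	// int, is treated as if the row were filtered out.
	pass, ok := row[fn.filterColIdx].(bool)
	return ok && pass
}

func main() {
	row := []interface{}{42, 7}

	// Zero value: filterColIdx is 0, so the int in column 0 is compared
	// against true and the row is skipped.
	fmt.Println(rowPasses(windowFn{}, row))

	// Explicitly marking "no filter" lets the row through.
	fmt.Println(rowPasses(windowFn{filterColIdx: noFilter}, row))
}
```

In the benchmark, skipping every row this way meant the window function did almost no work, which is why the earlier numbers looked faster than they should have.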

The memory accounting fix doesn't make any significant performance difference, but with the benchmark bug fixed, the true speed turns out to be about 50% lower than previously measured:

name                                                                          old time/op    new time/op    delta
Windower/SUM()_OVER_()-24                                                        120ms ± 4%     123ms ± 3%  +1.97%  (p=0.011 n=10+10)
Windower/SUM()_OVER_(ORDER_BY)-24                                                215ms ± 2%     219ms ± 2%  +1.66%  (p=0.004 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/)-24                 328ms ± 3%     337ms ± 1%  +2.81%  (p=0.006 n=10+8)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/_ORDER_BY)-24        350ms ± 3%     354ms ± 3%    ~     (p=0.089 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/)-24              147ms ± 1%     150ms ± 1%  +1.97%  (p=0.000 n=10+9)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/_ORDER_BY)-24     184ms ± 1%     185ms ± 1%  +0.91%  (p=0.009 n=10+10)

name                                                                          old speed      new speed      delta
Windower/SUM()_OVER_()-24                                                     20.0MB/s ± 4%  19.6MB/s ± 2%  -1.95%  (p=0.010 n=10+10)
Windower/SUM()_OVER_(ORDER_BY)-24                                             11.2MB/s ± 2%  11.0MB/s ± 2%  -1.62%  (p=0.004 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/)-24              7.32MB/s ± 3%  7.12MB/s ± 1%  -2.77%  (p=0.005 n=10+8)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/_ORDER_BY)-24     6.87MB/s ± 3%  6.79MB/s ± 3%    ~     (p=0.100 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/)-24           16.3MB/s ± 1%  16.0MB/s ± 1%  -1.94%  (p=0.000 n=10+9)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/_ORDER_BY)-24  13.1MB/s ± 1%  13.0MB/s ± 1%  -0.90%  (p=0.007 n=10+10)

name                                                                          old alloc/op   new alloc/op   delta
Windower/SUM()_OVER_()-24                                                       33.0MB ± 0%    33.0MB ± 0%    ~     (p=0.397 n=8+8)
Windower/SUM()_OVER_(ORDER_BY)-24                                               41.0MB ± 0%    41.0MB ± 0%    ~     (p=0.217 n=9+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/)-24                 101MB ± 0%     101MB ± 0%    ~     (p=0.796 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/_ORDER_BY)-24        125MB ± 0%     125MB ± 0%    ~     (p=0.243 n=10+9)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/)-24             23.1MB ± 0%    23.1MB ± 0%    ~     (p=0.579 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/_ORDER_BY)-24    29.6MB ± 0%    29.6MB ± 0%    ~     (p=0.075 n=10+10)

name                                                                          old allocs/op  new allocs/op  delta
Windower/SUM()_OVER_()-24                                                         600k ± 0%      600k ± 0%    ~     (all equal)
Windower/SUM()_OVER_(ORDER_BY)-24                                                 900k ± 0%      900k ± 0%    ~     (all equal)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/)-24                 1.50M ± 0%     1.50M ± 0%    ~     (p=0.782 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_SINGLE_ROW_PARTITIONS_*/_ORDER_BY)-24        1.70M ± 0%     1.70M ± 0%    ~     (p=0.251 n=10+9)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/)-24               601k ± 0%      601k ± 0%    ~     (p=1.000 n=10+10)
Windower/SUM()_OVER_(PARTITION_BY/*_MULTIPLE_ROWS_PARTITIONS_*/_ORDER_BY)-24      801k ± 0%      801k ± 0%  +0.00%  (p=0.043 n=9+10)

@yuzefovich yuzefovich force-pushed the wf-fix-memory branch 2 times, most recently from 0fdf3d2 to eac0516 Compare July 17, 2019 20:38
Member

@jordanlewis jordanlewis left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @yuzefovich)


pkg/sql/distsqlrun/windower_test.go, line 64 at r2 (raw file):

	}

	t.Run("", func(t *testing.T) {

nit: there is no need to have this. Subtests aren't a requirement - having no t.Run will mean a single top level test.

Member

@jordanlewis jordanlewis left a comment

:lgtm:

Reviewed 1 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)

Member Author

@yuzefovich yuzefovich left a comment

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @jordanlewis)


pkg/sql/distsqlrun/windower_test.go, line 64 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

nit: there is no need to have this. Subtests aren't a requirement - having no t.Run will mean a single top level test.

Thanks, fixed.

craig bot pushed a commit that referenced this pull request Jul 22, 2019
38839: sql: account for memory of results of window functions computations r=yuzefovich a=yuzefovich

Previously, Datums that were the result of computing of window
functions were accounted for in memory monitoring only as nils which
could lead to OOM when the actual Datum's size was a lot bigger than
default one (array_agg and concat_agg were most prone to this).
Now, the underlying memory is tracked correctly.

Fixes: #38818.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig
Contributor

craig bot commented Jul 22, 2019

Build succeeded


Successfully merging this pull request may close these issues.

sql: fatal error: runtime: out of memory on specifying the order of aggregations
3 participants