Skip to content

sql: speed up aggregates as window functions in some cases #38836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 12, 2019

Conversation

yuzefovich
Copy link
Member

In a case when an aggregate is computed as a window function over
a window frame such that all rows of the partitions are inside of
the window for each of the row, the result of aggregation is the
same for all rows. Previously, we would recompute it every time,
but now we're computing it only once.

Addresses: #38818.

Release note: None

@yuzefovich yuzefovich requested review from jordanlewis, justinj and a team July 11, 2019 23:37
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich requested a review from ridwanmsharif July 11, 2019 23:38
@yuzefovich
Copy link
Member Author

This should be an enormous speedup for aggregations with ORDER BY since those are planned as window functions internally with RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, right?

Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This is huge. :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @ridwanmsharif, and @yuzefovich)


pkg/sql/sem/tree/window_funcs.go, line 525 at r1 (raw file):

	precedingConfirmed := wfr.Frame.Bounds.StartBound.BoundType == UnboundedPreceding
	followingConfirmed := wfr.Frame.Bounds.EndBound.BoundType == UnboundedFollowing
	if wfr.Frame.Mode == ROWS || wfr.Frame.Mode == GROUPS {

this section feels a little overly clever to me :) not convinced this case is that important compared to the other one, but your call.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

I can't tell by the logic test but is the more complex case (groups and rows) also tested?

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @jordanlewis, @ridwanmsharif, and @yuzefovich)

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the newly added logic test checks all combinations (I've just confirmed it by step-by-step debugging).

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @justinj and @ridwanmsharif)


pkg/sql/sem/tree/window_funcs.go, line 525 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

this section feels a little overly clever to me :) not convinced this case is that important compared to the other one, but your call.

I agree that this case is less important, yet it can be hit even with small offsets (when partitions are small). I wouldn't say it is too clever - we're simply checking the boundary conditions, and if those are satisfied, then "internal" conditions will also be satisfied. Maybe my comments are confusing?

Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @ridwanmsharif and @yuzefovich)


pkg/sql/sem/tree/window_funcs.go, line 525 at r1 (raw file):

Previously, yuzefovich wrote…

I agree that this case is less important, yet it can be hit even with small offsets (when partitions are small). I wouldn't say it is too clever - we're simply checking the boundary conditions, and if those are satisfied, then "internal" conditions will also be satisfied. Maybe my comments are confusing?

No, the comments are fine, I just think it's proportionally a lot of stuff to understand for a case I'm not convinced is common. I'm not bothered too much either way.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

bors r+

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @justinj and @ridwanmsharif)


pkg/sql/sem/tree/window_funcs.go, line 525 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

No, the comments are fine, I just think it's proportionally a lot of stuff to understand for a case I'm not convinced is common. I'm not bothered too much either way.

Ok, I'll keep this case then.

@yuzefovich
Copy link
Member Author

bors r-

@craig
Copy link
Contributor

craig bot commented Jul 12, 2019

Canceled

In a case when an aggregate is computed as a window function over
a window frame such that all rows of the partitions are inside of
the window for each of the row, the result of aggregation is the
same for all rows. Previously, we would recompute it every time,
but now we're computing it only once.

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj and @ridwanmsharif)


pkg/sql/sem/tree/window_funcs.go, line 525 at r1 (raw file):

Previously, yuzefovich wrote…

Ok, I'll keep this case then.

I've just realized that my comments, indeed, were confusing.

@yuzefovich
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Jul 12, 2019
38836: sql: speed up aggregates as window functions in some cases r=yuzefovich a=yuzefovich

In a case when an aggregate is computed as a window function over
a window frame such that all rows of the partitions are inside of
the window for each of the row, the result of aggregation is the
same for all rows. Previously, we would recompute it every time,
but now we're computing it only once.

Addresses: #38818.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@awoods187
Copy link
Contributor

What kind of speed ups are we talking?

Also, nice rapid fixing!

@yuzefovich
Copy link
Member Author

So the speedup is going from quadratic behavior to linear.

For concrete numbers, the query SELECT array_agg(city ORDER BY revenue) FROM rides LIMIT 5 on movr dataset now finishes in 6-7s on my laptop, and I would estimate it would have taken about a million times more (the cardinality of rides table) without this fix (assuming we had enough RAM; in reality, with proper memory accounting, without this fix the query would not complete on pretty much any kind of cluster).

@craig
Copy link
Contributor

craig bot commented Jul 12, 2019

Build succeeded

@craig craig bot merged commit ca21fbf into cockroachdb:master Jul 12, 2019
@yuzefovich yuzefovich deleted the wf-agg branch July 12, 2019 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants