storage/rangefeed: use fine-grained locking around Processor, add metrics #36744


Merged

Conversation

nvanbenschoten (Contributor)

It is critical for correctness that operations like providing the processor
with logical ops or informing the processor of closed timestamp updates
be properly synchronized via the raftMu. This ensures that events published
on the processor's eventC accurately reflect the order of events in the Raft
log.

However, it is not critical that lifecycle events around starting a
Rangefeed processor, stopping a Rangefeed processor, and registering
with a Rangefeed processor be synchronized with Raft application. This
change exploits this to break up the locking around rangefeed.Processor.

Using more fine-grained locking opens up the opportunity to interact
with the rangefeed Processor without needing to synchronize with the
Range's raft application, which can be very expensive. The next commit
uses this improvement to add a new rangefeed_registrations metric to
RangeInfo.
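The split described above can be sketched in a few lines of Go. This is a simplified, hypothetical illustration of the two-mutex pattern, not the actual CockroachDB code: the `processor`, `replica`, `publishEvent`, and `register` names below are invented for the sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// processor stands in for rangefeed.Processor.
type processor struct {
	events []string // published in Raft log order
	regs   int
}

// replica sketches the split: raftMu orders event publication with the
// Raft log, while rangefeedMu guards only the processor's lifecycle.
type replica struct {
	raftMu      sync.Mutex // synchronizes with Raft application
	rangefeedMu sync.Mutex // guards start/stop/register and the proc pointer
	proc        *processor
}

// publishEvent holds raftMu so the order of events handed to the
// processor matches the Raft log; rangefeedMu briefly protects the
// proc pointer itself.
func (r *replica) publishEvent(ev string) {
	r.raftMu.Lock()
	defer r.raftMu.Unlock()
	r.rangefeedMu.Lock()
	defer r.rangefeedMu.Unlock()
	if r.proc != nil {
		r.proc.events = append(r.proc.events, ev)
	}
}

// register needs only rangefeedMu: a new consumer can attach without
// waiting behind potentially expensive Raft application.
func (r *replica) register() int {
	r.rangefeedMu.Lock()
	defer r.rangefeedMu.Unlock()
	if r.proc == nil {
		r.proc = &processor{}
	}
	r.proc.regs++
	return r.proc.regs
}

func main() {
	r := &replica{}
	fmt.Println(r.register()) // 1
	r.publishEvent("logical-op")
	fmt.Println(len(r.proc.events)) // 1
}
```

Because `register` never takes `raftMu`, a metrics read of the registration count can also use `rangefeedMu` alone, which is what makes a cheap per-Replica registration metric feasible.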

@nvanbenschoten nvanbenschoten requested review from danhhz and a team April 11, 2019 01:52


@danhhz danhhz left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @danhhz and @nvanbenschoten)


pkg/storage/replica_rangefeed.go, line 306 at r2 (raw file):

	// Create a new rangefeed.
	desc := r.Desc()

for my own understanding, was this broken before? it has RaftMuLocked in the name but this is the replica mu


pkg/storage/replica_rangefeed.go, line 418 at r2 (raw file):

		// can reconnect at a later time, at which point all new Raft commands
		// should have logical op logs.
		r.disconnectRangefeedWithErr(p, roachpb.NewError(roachpb.NewRangeFeedRetryError(

should this stay disconnectRangefeedWithReason? that would be grabbing the rangefeedMu a second time unnecessarily, but this doesn't really seem like a hot path and maybe the consistency is worth it

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/rangefeedLocking branch from 39e2202 to bce3d4a Compare April 29, 2019 15:42

@nvanbenschoten nvanbenschoten left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @danhhz)


pkg/storage/replica_rangefeed.go, line 306 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

for my own understanding, was this broken before? it has RaftMuLocked in the name but this is the replica mu

I think it worked out to be ok because we had raftMu locked and r.mu.state.Desc isn't currently ever mutated without both Replica.raftMu and Replica.mu locked (see Replica.setDesc), but it certainly looked suspicious.
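The invariant relied on here is a common one: a field that is only ever written while holding both mutexes may be safely read while holding either one, since any concurrent writer would block on the lock the reader holds. A minimal sketch (simplified, hypothetical names; `desc` stands in for `r.mu.state.Desc`):

```go
package main

import (
	"fmt"
	"sync"
)

type replica struct {
	raftMu sync.Mutex
	mu     sync.Mutex
	desc   string // written only with BOTH raftMu and mu held
}

// setDesc takes both locks, mirroring the contract described for
// Replica.setDesc.
func (r *replica) setDesc(d string) {
	r.raftMu.Lock()
	defer r.raftMu.Unlock()
	r.mu.Lock()
	defer r.mu.Unlock()
	r.desc = d
}

// descRaftMuLocked reads under raftMu alone. This is safe: a writer
// must also hold raftMu, so no write can race with this read.
func (r *replica) descRaftMuLocked() string {
	r.raftMu.Lock()
	defer r.raftMu.Unlock()
	return r.desc
}

func main() {
	r := &replica{}
	r.setDesc("range-1")
	fmt.Println(r.descRaftMuLocked()) // range-1
}
```

Once the read path stops holding `raftMu`, the read must move under `mu` (the other lock in the write set), which is why the original code looked suspicious.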


pkg/storage/replica_rangefeed.go, line 418 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

should this stay disconnectRangefeedWithReason? that would be grabbing the rangefeedMu a second time unnecessarily, but this doesn't really seem like a hot path and maybe the consistency is worth it

Done.

This commit uses the improved locking around rangefeed.Processor
to add new metrics about the number of RangeFeed registrations
on a specific Replica.

Release note: None
This commit adds handling to DistSender to retry on `StoreNotFound`
and `NodeUnavailable` errors. These errors are likely to be unique
to the replica that reported them, so no action is required before
the next retry.

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/rangefeedLocking branch from bce3d4a to f42487c Compare April 29, 2019 17:08
@nvanbenschoten

bors r+

craig bot pushed a commit that referenced this pull request Apr 29, 2019
36744: storage/rangefeed: use fine-grained locking around Processor, add metrics r=nvanbenschoten a=nvanbenschoten

It is critical for correctness that operations like providing the processor
with logical ops or informing the processor of closed timestamp updates
be properly synchronized via the raftMu. This ensures that events published
on the processor's eventC accurately reflect the order of events in the Raft
log.

However, it is not critical that lifecycle events around starting a
Rangefeed processor, stopping a Rangefeed processor, and registering
with a Rangefeed processor be synchronized with Raft application. This
change exploits this to break up the locking around rangefeed.Processor.

Using more fine-grained locking opens up the opportunity to interact
with the rangefeed Processor without needing to synchronize with the
Range's raft application, which can be very expensive. The next commit
uses this improvement to add a new `rangefeed_registrations` metric to
`RangeInfo`.

36999: roachtest: add schemachange/bulkingest test r=lucy-zhang a=lucy-zhang

Add a test to index the random `payload` column for the `bulkingest` workload,
to test the index backfiller for an index on randomly ordered values relative
to the primary key.

Release note: None

37169: opt: don't use delegateQuery for ShowZoneConfig r=RaduBerinde a=RaduBerinde

Part of a larger work item to remove `delegateQuery`, as it doesn't
work with the optimizer.

I could not use the new delegate infrastructure because it doesn't support hiding columns returned by the delegated query.

Release note: None

37182: roachtest: provision more cpu for schemachange tpcc tests r=vivekmenezes a=vivekmenezes

related to #36094, #36024, #36321

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Lucy Zhang <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Vivek Menezes <[email protected]>
craig bot commented Apr 29, 2019

Build succeeded

@craig craig bot merged commit f42487c into cockroachdb:master Apr 29, 2019
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/rangefeedLocking branch May 2, 2019 17:39