storage/rangefeed: use fine-grained locking around Processor, add metrics #36744
Conversation
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @danhhz and @nvanbenschoten)
pkg/storage/replica_rangefeed.go, line 306 at r2 (raw file):
```go
// Create a new rangefeed.
desc := r.Desc()
```
For my own understanding, was this broken before? It has `RaftMuLocked` in the name, but this is the replica mu.
pkg/storage/replica_rangefeed.go, line 418 at r2 (raw file):
```go
// can reconnect at a later time, at which point all new Raft commands
// should have logical op logs.
r.disconnectRangefeedWithErr(p, roachpb.NewError(roachpb.NewRangeFeedRetryError(
```
Should this stay `disconnectRangefeedWithReason`? That would grab the rangefeedMu a second time unnecessarily, but this doesn't really seem like a hot path, and maybe the consistency is worth it.
Force-pushed from 39e2202 to bce3d4a
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @danhhz)
pkg/storage/replica_rangefeed.go, line 306 at r2 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
For my own understanding, was this broken before? It has `RaftMuLocked` in the name, but this is the replica mu.
I think it worked out to be ok because we had `raftMu` locked and `r.mu.state.Desc` isn't currently ever mutated without both `Replica.raftMu` and `Replica.mu` locked (see `Replica.setDesc`), but it certainly looked suspicious.
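The invariant being described can be sketched as follows. This is a hypothetical Go illustration, not the actual CockroachDB code: the `replica` type, `setDesc`, and `descRaftMuLocked` are simplified stand-ins for the real structures. The point is that a field mutated only while holding both mutexes may be safely read while holding either one.

```go
package main

import (
	"fmt"
	"sync"
)

// replica is a simplified stand-in for storage.Replica.
type replica struct {
	raftMu sync.Mutex // coarse mutex, held for the duration of Raft application
	mu     sync.Mutex // fine-grained mutex protecting in-memory state
	desc   string     // stand-in for r.mu.state.Desc
}

// setDesc mutates desc. Writers must hold BOTH raftMu and mu.
func (r *replica) setDesc(d string) {
	r.raftMu.Lock()
	defer r.raftMu.Unlock()
	r.mu.Lock()
	defer r.mu.Unlock()
	r.desc = d
}

// descRaftMuLocked reads desc while holding only raftMu. This is safe
// because every writer must also hold raftMu, so no write can race here.
func (r *replica) descRaftMuLocked() string {
	r.raftMu.Lock()
	defer r.raftMu.Unlock()
	return r.desc
}

func main() {
	r := &replica{}
	r.setDesc("range [a, b)")
	fmt.Println(r.descRaftMuLocked()) // prints "range [a, b)"
}
```

This is why the original code "worked out to be ok": reading under raftMu alone was sufficient, even though the function name suggested a different lock was involved.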
pkg/storage/replica_rangefeed.go, line 418 at r2 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
Should this stay `disconnectRangefeedWithReason`? That would grab the rangefeedMu a second time unnecessarily, but this doesn't really seem like a hot path, and maybe the consistency is worth it.
Done.
It is critical for correctness that operations like providing the processor with logical ops or informing the processor of closed timestamp updates be properly synchronized via the raftMu. This ensures that events published on the processor's eventC accurately reflect the order of events in the Raft log.

However, it is not critical that lifecycle events around starting a Rangefeed processor, stopping a Rangefeed processor, and registering with a Rangefeed processor be synchronized with Raft application. This change exploits this to break up the locking around rangefeed.Processor. Using more fine-grained locking opens up the opportunity to interact with the rangefeed Processor without needing to synchronize with the Range's Raft application, which can be very expensive. The next commit demonstrates the benefit of this.

Release note: None
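The locking split described above can be sketched roughly as follows. This is a hypothetical illustration under assumed names (`rangefeedMu`, `register`, `publishEvent` are not the real API): lifecycle operations take only a dedicated fine-grained mutex, while event publication is ordered by raftMu and only briefly touches the fine-grained mutex to read the processor pointer.

```go
package main

import (
	"fmt"
	"sync"
)

type processor struct {
	registrations int
}

type replica struct {
	raftMu      sync.Mutex // coarse: synchronizes Raft log application
	rangefeedMu sync.Mutex // fine-grained: guards only the processor pointer
	proc        *processor
}

// register attaches a new rangefeed registration. It takes only
// rangefeedMu, so it never blocks behind expensive Raft application.
func (r *replica) register() {
	r.rangefeedMu.Lock()
	defer r.rangefeedMu.Unlock()
	if r.proc == nil {
		r.proc = &processor{}
	}
	r.proc.registrations++
}

// publishEvent forwards a logical op to the processor. Callers hold raftMu,
// so published events reflect Raft log order; only the processor pointer
// itself is read under rangefeedMu.
func (r *replica) publishEvent(ev string) {
	r.rangefeedMu.Lock()
	p := r.proc
	r.rangefeedMu.Unlock()
	if p != nil {
		fmt.Println("event:", ev)
	}
}

func main() {
	r := &replica{}
	r.register() // no raftMu needed for lifecycle operations

	r.raftMu.Lock() // event publication stays ordered by raftMu
	r.publishEvent("logical op")
	r.raftMu.Unlock()

	fmt.Println("registrations:", r.proc.registrations)
}
```

The design point is that registration can proceed even while a long Raft application holds raftMu, because the two code paths no longer share a lock for lifecycle state.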
This commit uses the improved locking around rangefeed.Processor to add new metrics about the number of RangeFeed registrations on a specific Replica.

Release note: None
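A registration-count metric of this kind is typically a gauge that moves up and down as registrations come and go. A minimal sketch, with hypothetical names (`metrics`, `RangefeedRegistrations` are illustrative, not the real CockroachDB metric types):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// metrics is a hypothetical per-replica metrics struct.
type metrics struct {
	RangefeedRegistrations int64 // gauge: currently active registrations
}

// register and unregister adjust the gauge atomically, since registrations
// can be added and removed from different goroutines.
func (m *metrics) register()   { atomic.AddInt64(&m.RangefeedRegistrations, 1) }
func (m *metrics) unregister() { atomic.AddInt64(&m.RangefeedRegistrations, -1) }

func main() {
	m := &metrics{}
	m.register()
	m.register()
	m.unregister()
	fmt.Println(atomic.LoadInt64(&m.RangefeedRegistrations)) // prints 1
}
```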
This commit adds handling to DistSender to retry on `StoreNotFound` and `NodeUnavailable` errors. These errors are likely to be unique to the replica that reported them, so no action is required before the next retry.

Release note: None
Force-pushed from bce3d4a to f42487c
bors r+
36744: storage/rangefeed: use fine-grained locking around Processor, add metrics r=nvanbenschoten a=nvanbenschoten

It is critical for correctness that operations like providing the processor with logical ops or informing the processor of closed timestamp updates be properly synchronized via the raftMu. This ensures that events published on the processor's eventC accurately reflect the order of events in the Raft log. However, it is not critical that lifecycle events around starting a Rangefeed processor, stopping a Rangefeed processor, and registering with a Rangefeed processor be synchronized with Raft application. This change exploits this to break up the locking around rangefeed.Processor. Using more fine-grained locking opens up the opportunity to interact with the rangefeed Processor without needing to synchronize with the Range's Raft application, which can be very expensive. The next commit uses this improvement to add a new `rangefeed_registrations` metric to `RangeInfo`.

36999: roachtest: add schemachange/bulkingest test r=lucy-zhang a=lucy-zhang

Add a test that indexes the random `payload` column for the `bulkingest` workload, to exercise the index backfiller on an index over values that are randomly ordered relative to the primary key. Release note: None

37169: opt: don't use delegateQuery for ShowZoneConfig r=RaduBerinde a=RaduBerinde

Part of a larger work item to remove `delegateQuery`, as it doesn't work with the optimizer. I could not use the new delegate infrastructure because it doesn't support hiding columns returned by the delegated query. Release note: None

37182: roachtest: provision more cpu for schemachange tpcc tests r=vivekmenezes a=vivekmenezes

Related to #36094 #36024 #36321. Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Lucy Zhang <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Vivek Menezes <[email protected]>
Build succeeded