[DNM] storage/concurrency: introduce concurrency control package, prototype SFU #43775
Conversation
Exciting improvements here. How stable are these results? I ask because in recent experimentation with …

The throughput numbers across the trials were varying by about ±2-3%. The latency percentiles were more stable across trials, which looks to be because they're quantized. I'm actually hopeful that the fairness improvements here will help reduce variance and make YCSB more stable.
// in the MVCC keyspace. This is performed transparently by a lockAwareIter.
//
// When the request completes, it exits any lockWaitQueues that it was at the
// head of and releases its latches. However, any locks that it inserted into
Only on success, right? If the request tries to write an intent but this somehow fails below Raft, the lock should go away? Or I guess it's fine to have an in-mem lock too many, then we'll just push more than we need to (hope that doesn't confuse deadlock detection somehow). Hmm, SFU will basically mean having in-mem locks that aren't backed by replicated locks, so it will all just have to work anyway.
> Only on success, right? If the request tries to write an intent but this somehow fails below Raft, the lock should go away?
Yes, documented.
> Hmm, SFU will basically mean having in-mem locks that aren't backed by replicated locks, so it will all just have to work anyway.
Yeah, we'll have to make it work. For now, though, I'm not going to worry about inserting locks on failure paths.
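To make the failure-path question concrete, here is a purely illustrative, self-contained sketch (none of these types come from the PR) of dropping in-memory locks when a proposal fails below Raft, so that later requests don't push a transaction that holds nothing:

```go
package main

import "fmt"

// lockTable is an illustrative stand-in, not the PR's lockTable.
type lockTable struct {
	held map[string]string // key -> holding txn
}

func (lt *lockTable) acquire(key, txn string) { lt.held[key] = txn }

// releaseAll drops every in-memory lock held by txn. Leaving one behind
// would only cost a spurious push, but cleaning up eagerly keeps the
// table honest.
func (lt *lockTable) releaseAll(txn string) {
	for k, holder := range lt.held {
		if holder == txn {
			delete(lt.held, k)
		}
	}
}

func main() {
	lt := &lockTable{held: map[string]string{}}
	lt.acquire("a", "txn1")
	// ... the write fails below Raft, so no replicated intent exists ...
	lt.releaseAll("txn1")
	fmt.Println(len(lt.held)) // 0
}
```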
// concurrency manager about the replicated write intent that was missing
// from its lock table. After doing so, it enqueues the request that hit the
// error in the lock's wait-queue (but does not wait) and releases the
// guard's latches. It returns an updated guard reflecting this change.
When these methods return a new `*Guard`, does it mean that the original `*Guard` must no longer be used?
Done.
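To make the settled convention concrete, here is a tiny self-contained illustration (the `Guard` type below is a stand-in, not the PR's actual struct): methods that return a new `*Guard` consume the one they are given, so callers should rebind and never touch the old pointer.

```go
package main

import "fmt"

// Guard is an illustrative stand-in for the concurrency package's *Guard.
type Guard struct{ seq int }

// handleError consumes the guard it is given and returns a replacement.
// Callers must discard the argument and use only the returned guard.
func handleError(g *Guard) *Guard {
	ng := &Guard{seq: g.seq + 1} // state migrates to the new guard
	*g = Guard{}                 // poison the old one so accidental reuse is obvious
	return ng
}

func main() {
	g := &Guard{seq: 1}
	g = handleError(g) // idiomatic: rebind, never keep the old pointer
	fmt.Println(g.seq) // 2
}
```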
HandleWriterIntentError(context.Context, *Guard, *roachpb.WriteIntentError) (*Guard, *Error)
// HandleTransactionPushError consumes a TransactionPushError by informing
// the concurrency manager about a transaction record that could not be
// pushes. After doing so, it releases the guard's latches. It returns an
pushed
Done.
// been updated. The argument indicates whether this manager's replica is
// the leaseholder going forward.
OnLeaseUpdated(bool)
// OnSplit informs the concurrency manager that its range has split of a new
off
Done.
// encountered.
ScanAndEnqueue(Request, *Guard) []lockWaitQueueGuard
// Dequeue removes the guard from its lock wait-queue.
Dequeue(lockWaitQueueGuard)
Is this called if a waiter gives up? In normal operation we'll only ever dequeue in FIFO order, right?
Yes, this can be called if a waiter gives up, so we'll need to handle out-of-order dequeueing. Under normal operation though, you're correct. Commented.
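A minimal sketch of what out-of-order dequeueing implies for the queue's data structure, assuming a plain doubly-linked list (the PR's actual wait-queue may differ): each guard keeps a handle to its own entry so a waiter that gives up can be unlinked in O(1) from anywhere in the queue.

```go
package main

import (
	"container/list"
	"fmt"
)

// waitQueue and waitQueueGuard are illustrative, not the PR's types.
type waitQueue struct{ l *list.List }

type waitQueueGuard struct{ elem *list.Element }

func (q *waitQueue) enqueue(name string) waitQueueGuard {
	return waitQueueGuard{elem: q.l.PushBack(name)}
}

// dequeue handles both FIFO exit and out-of-order abandonment in O(1).
func (q *waitQueue) dequeue(g waitQueueGuard) {
	q.l.Remove(g.elem)
}

func main() {
	q := &waitQueue{l: list.New()}
	a := q.enqueue("a")
	b := q.enqueue("b")
	q.enqueue("c")
	q.dequeue(b) // waiter b gives up while not at the front
	q.dequeue(a) // normal FIFO exit
	for e := q.l.Front(); e != nil; e = e.Next() {
		fmt.Println(e.Value) // c
	}
}
```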
// Dequeue removes the guard from its lock wait-queue.
Dequeue(lockWaitQueueGuard)
// SetBounds sets the key bounds of the lockTable.
SetBounds(start, end roachpb.RKey)
Curious why this is needed.
It's not needed right now, but it will be needed once we pull the replicated portion of the lockTable under here. I already had the wiring hooked up to the rest of the Storage package, which I think is valuable. I'll remove this for now though.
}
// Wait on each newly conflicting lock, if applicable. | ||
if m.lwq.MustWaitOnAny(g.wqgs) { |
Could this be `wqgs` (which I know isn't in scope here)? Just trying to confirm my understanding that at the beginning of each iter in the `for` loop, `g.wqgs` has been `WaitOn`'ed already and is only carried with the guard for .. other reasons.
Originally that would have worked, but not anymore, because #43740 introduced the notion of "breaking" reservations. Essentially this means that waiters can lose their place at the front of a wait-queue while not holding latches. It is only safe to conclude that a waiter is at the front of all wait-queues if it is holding latches.
The reason for this is pretty subtle, but @sumeerbhola does a good job documenting it in that PR. Because we don't hold range locks while waiting in per-key wait-queues, we run into situations like the following. It's possible that two ranged writes, txn1 and txn2, over keys [a, d) might sequence such that txn1 initially acquires latches first. It finds a lock on key `c` and begins waiting. A different txn3 might then get sequenced and perform a point write on key `a`. When txn2 is sequenced, it will enter the wait-queue for `a` (at the front) and for `c` (behind txn1). When txn1 re-acquires latches and re-scans the lockTable, it will find the lock on `a` and enter its wait-queue. We don't want it getting behind txn2 in that queue, or we'd cause a deadlock.

So, the solution Sumeer came up with to address this was to give each request a monotonically increasing sequence number when it first scans the lockTable. This creates a total order over requests. The order does not tell us exactly what order requests will finish being sequenced by the concurrency manager. However, it does tell us how to order conflicting transactions in a given wait-queue. In other words, the policy for entering a wait-queue is not that requests automatically get in the back. Instead, requests scan from the back to find where they fit in the sequence-ordered list of requests. This avoids the kinds of deadlocks discussed above.
However, to get it to work, we need to abandon the idea that once a request is at the front of a wait-queue, it stays there. Instead, requests must check all of their wait-queue entries on each iteration through the loop while holding latches. A request is only safe to proceed when it observes that it is at the front of all of its wait-queues during a single pass.

The correctness of this is subtle, but I think it's mentioned in #43740. A consequence of it is that as a req with a low seq num, we need to be very careful about never breaking the reservation of another request whose latch set might be compatible with ours, because latching is what provides the mutual exclusion to atomically determine that a request holds a reservation for (is at the front of) all wait-queues that it's a part of. The fact that non-locking (`lock.None`) reads are never going to hold reservations in wait-queues makes this easier to think through.
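To make the queueing policy concrete, here is a self-contained sketch of seqnum-ordered insertion (illustrative types, not the lockTable's): a request entering a wait-queue scans from the back until it finds its place in the total order, rather than always joining the end.

```go
package main

import "fmt"

// waiter is an illustrative stand-in. Each request gets a monotonically
// increasing seqnum on its first lockTable scan; when it later enters a
// wait-queue, it is ordered by that seqnum.
type waiter struct {
	name string
	seq  uint64
}

func insertInSeqOrder(q []waiter, w waiter) []waiter {
	i := len(q)
	for i > 0 && q[i-1].seq > w.seq {
		i-- // skip over later-sequenced requests
	}
	q = append(q, waiter{})
	copy(q[i+1:], q[i:]) // shift the tail right by one slot
	q[i] = w
	return q
}

func main() {
	// Key a's wait-queue already contains txn2 (sequenced after txn1).
	q := []waiter{{name: "txn2", seq: 2}}
	// txn1 re-scans after waiting on key c, finds the lock on a, and
	// enters the queue ahead of txn2, avoiding the deadlock above.
	q = insertInSeqOrder(q, waiter{name: "txn1", seq: 1})
	fmt.Println(q) // [{txn1 1} {txn2 2}]
}
```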
// want to wait on locks (like PutRequest and ScanRequest) and then there
// are those that don't want to wait on locks (like QueryIntentRequest,
// RefreshRequest, and ResolveIntentRequest). Should we define a flag for
// this?
Rather than trying to keep these "internal" request types side-by-side with the "public" KV API (Scan, Put, ...) we should establish a boundary between them. I feel that the flags have not been helping that, and they also tend to obfuscate understanding.
I've also grown a little disillusioned about flags for something like this because they're defined so far away from where they're used. I'm curious what you're thinking in terms of a boundary.
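For illustration only, one shape such a boundary could take (none of these types are from the PR): each request declares its lock-wait behavior through a small interface defined next to the concurrency manager, instead of a flag defined far from its point of use.

```go
package main

import "fmt"

// lockWaitPolicy is a hypothetical boundary interface, declared where
// the concurrency manager consumes it.
type lockWaitPolicy interface {
	WaitsOnLocks() bool
}

type putRequest struct{}
type resolveIntentRequest struct{}

func (putRequest) WaitsOnLocks() bool           { return true }
func (resolveIntentRequest) WaitsOnLocks() bool { return false }

func main() {
	for _, r := range []lockWaitPolicy{putRequest{}, resolveIntentRequest{}} {
		fmt.Printf("%T waits on locks: %v\n", r, r.WaitsOnLocks())
	}
}
```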
switch arg.Method() {
case roachpb.HeartbeatTxn:
case roachpb.Refresh:
case roachpb.RefreshRange:
I would've expected to see ResolveIntent{,Range} here.
Somewhat surprisingly, ResolveIntent{,Range} is not `roachpb.IsTransactional`. Neither of the requests has the `isTxn` flag.
// Wait on each new txnWaitQueue entry.
// TODO: should we do this here?
for _, wqg := range g.wqgs[orig:] {
    if err := m.lwq.WaitOn(ctx, g.req, wqg); err != nil {
This is the case in which I find out about a lock only from the replicated keyspace, right? And there's the other case in which we discover locks directly from the lock table. I would've expected that we'd wait in `SequenceReq` in all cases, i.e. `HandleWriterIntentError` would be called and afterwards the caller would pass the resulting guard to `SequenceReq`. (This would also explain why my assumption in that method about only having to wait on new wqgs is wrong).
Yes, that's exactly how things are supposed to work. I think I just forgot to address this TODO and delete the code. Done.
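The flow settled on here might look like the following self-contained sketch (stand-in types; the real loop lives in the Replica send path): `SequenceReq` does all of the waiting, while the error handler only records the discovered intent and drops latches before the request loops back around.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// guard is an illustrative stand-in for the concurrency package's *Guard.
type guard struct{ discoveredIntents int }

var errWriteIntent = errors.New("conflicting intent discovered")

func sequenceReq(ctx context.Context, g *guard) *guard {
	// In the real manager this may block on latches, lock wait-queues,
	// and transaction pushes; here it returns once "sequenced".
	return g
}

func evaluate(g *guard) error {
	if g.discoveredIntents == 0 {
		return errWriteIntent // first pass hits a replicated intent
	}
	return nil
}

func handleWriteIntentError(g *guard) *guard {
	// Inform the lockTable of the intent, enqueue without waiting, and
	// release latches; the next sequenceReq call does the waiting.
	return &guard{discoveredIntents: g.discoveredIntents + 1}
}

func main() {
	ctx := context.Background()
	g := &guard{}
	for {
		g = sequenceReq(ctx, g)
		if err := evaluate(g); errors.Is(err, errWriteIntent) {
			g = handleWriteIntentError(g)
			continue // re-sequence; no ad-hoc waiting here
		}
		fmt.Println("evaluated after", g.discoveredIntents, "re-sequencing")
		return
	}
}
```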
This commit introduces a new `concurrency/lock` package that provides type definitions for locking-related concepts used by concurrency control in the key-value layer. The package initially provides a lock `Strength` enumeration consisting of `Shared`, `Upgrade`, and `Exclusive` and a lock `Durability` enumeration consisting of `Unreplicated` and `Replicated`. These definitions are currently unused but will be referenced soon by cockroachdb#43740 and cockroachdb#43775. The changes around SELECT FOR UPDATE in the optimizer (cockroachdb#44015) are also approaching the point where they'll soon need the type definitions to interface with the KV API. Release note: None
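The `Strength` and `Durability` enumerations described above are small enough to sketch inline. A minimal sketch under the assumption that they are plain Go constants; the real definitions live in `pkg/storage/concurrency/lock` and may be generated rather than hand-written:

```go
// Package lock sketch: type definitions for locking-related concepts.
package lock

// Strength determines how a lock conflicts with other locks. lock.None
// is referenced elsewhere in this PR for non-locking reads.
type Strength int

const (
	None Strength = iota
	Shared    // multiple readers may hold the lock together
	Upgrade   // read now with the intention to write later (SFU)
	Exclusive // a single writer excludes all other lockers
)

// Durability determines how resilient a lock is to failures.
type Durability int

const (
	Unreplicated Durability = iota // in-memory only
	Replicated                     // backed by a replicated write intent
)
```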
43862: storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC r=tbg a=ajwerner

This PR comes in 6 commits. See the commit messages for more details. The meat of the change is in: `storage: rework RunGC so it no longer buffers keys and values in memory`.

1) Fix the spanset Iterator such that SeekLT can seek to the end of the range.
2) Expose reverse iteration in rditer.ReplicaDataIterator.
3) Rearrange some code in gc_queue.go to put the RunGC() helpers below that function.
4) Remove lockableGCInfo because there was no need for it.
5) Move RunGC into a `gc` subpackage.
6) Rework RunGC to paginate versions of keys with reverse iteration.

Fixes #42531.

44054: storage/concurrency/lock: define lock Strength and Durability modes r=nvanbenschoten a=nvanbenschoten

This commit introduces a new `concurrency/lock` package that provides type definitions for locking-related concepts used by concurrency control in the key-value layer. The package initially provides a lock `Strength` enumeration consisting of `Shared`, `Upgrade`, and `Exclusive` and a lock `Durability` enumeration consisting of `Unreplicated` and `Replicated`. These definitions are currently unused but will be referenced soon by #43740 and #43775. The changes around SELECT FOR UPDATE in the optimizer (#44015) are also approaching the point where they'll soon need the type definitions to interface with the KV API.

Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
…cher Relates to cockroachdb#40205. This commit is a follow-up to cockroachdb#44015. It propagates the row-level locking modes from the optbuilder all the way down to the row.Fetcher. This requires passing the locking information through the execbuilder and then through DistSQL. The commit leaves off at the point where `roachpb.ScanRequest`s are constructed. I couldn't find a good way to test this. It's possible I missed something and would love any tips, but I think we may need to wait until the plumbing into the KV API is complete before performing end-to-end tests. The next step will be to hook the locking modes into the KV API. Release note: None
…Result This allows the handling of added, updated, and resolved intents to become more reactive. It mirrors our handling of UpdatedTxns and their interaction with the TxnWaitQueue.
…totype SFU Informs cockroachdb#41720. This commit creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Any reader of this commit should start at `concurrency_control.go` and move out from there. The new package has a few primary objectives: 1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented and understood in isolation. 2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed after intent resolution as soon as possible. 3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (cockroachdb#6583). 4. provide stronger guarantees around fairness when transactions conflict, in order to reduce tail latencies under contended scenarios. 5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in cockroachdb#41720. WARNING: this is still a WIP. Notably, the lockTableImpl is mocked out with a working but incomplete implementation. See cockroachdb#43740 for a more complete strawman. Release note: None
This commit hacks upgrade locking into ScanRequests and uses it in YCSB.
Informs cockroachdb#41720.

The commit creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Interested readers should start at `concurrency_control.go` and move out from there.

The new package has a few primary objectives:

1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented, understood, and tested in isolation.
2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed immediately after intent resolution.
3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (cockroachdb#6583).
4. provide stronger guarantees around fairness when transactions conflict, to reduce tail latencies under contended scenarios.
5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in cockroachdb#41720.

This commit pulls over a lot of already reviewed code from cockroachdb#43775. The big differences are that it updates the lockTable interface to match the one introduced in cockroachdb#43740 and it addresses the remaining TODOs to document the rest of the concurrency control mechanisms in CockroachDB. At this point, a reader of this file should get a good idea of how KV transactions interact in CRDB... now we just need to make the system actually work this way.
44787: storage/concurrency: define concurrency control interfaces r=nvanbenschoten a=nvanbenschoten

Informs #41720.

The commit creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Interested readers should start at `concurrency_control.go` and move out from there.

The new package has a few primary objectives:

1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented, understood, and tested in isolation.
2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed immediately after intent resolution.
3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (#6583).
4. provide stronger guarantees around fairness when transactions conflict, to reduce tail latencies under contended scenarios.
5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in #41720.

This commit pulls over a lot of already reviewed code from #43775. The big differences are that it updates the lockTable interface to match the one introduced in #43740 and it addresses the remaining TODOs to document the rest of the concurrency control mechanisms in CockroachDB. At this point, a reader of this file should get a good idea of how KV transactions interact in CRDB... now we just need to make the system actually work this way.

Co-authored-by: Nathan VanBenschoten <[email protected]>
…ments

This commit adds support for implicitly applying FOR UPDATE row-level locking modes to the initial row scan of UPDATE statements. Conceptually, if we picture an UPDATE statement as the composition of a SELECT statement and an INSERT statement (with loosened semantics around existing rows) then this change performs the following transformation:

```
UPDATE t = SELECT FROM t + INSERT INTO t
=>
UPDATE t = SELECT FROM t FOR UPDATE + INSERT INTO t
```

The transformation is conditional on the UPDATE expression tree matching a pattern. Specifically, the FOR UPDATE locking mode is only used during the initial row scan when all row filters have been pushed into the ScanExpr. If the statement includes any filters that cannot be pushed into the scan then no row-level locking mode is applied. The rationale here is that FOR UPDATE locking is not necessary for correctness due to serializable isolation, so it is strictly a performance optimization for contended writes. Therefore, it is not worth risking the transformation being a pessimization, so it is only applied when doing so does not risk creating artificial contention.

The change introduces a new `enable_implicit_select_for_update` session variable that controls whether this transformation is applied to mutation statements. It also introduces a `sql.defaults.implicit_select_for_update.enabled` cluster setting to serve as the default value for the session variable.

The locking mode is still ignored by the key-value layer, but that will change in the next few days. Once that happens, we'll be able to gather performance numbers (past what's already been posted in cockroachdb#43775) about the performance impact this change has on uncontended and contended workloads.

Release note (sql change): UPDATE statements now acquire locks using the FOR UPDATE locking mode during their initial row scan, which improves performance for contended workloads. This behavior is configurable using the `enable_implicit_select_for_update` session variable and the `sql.defaults.implicit_select_for_update.enabled` cluster setting.
45062: storage/concurrency: implement concurrency Manager r=nvanbenschoten a=nvanbenschoten

Informs #41720. Informs #44976. Initially drawn out in #43775.

This PR implements the concurrency.Manager interface, which is the core structure that ties together the new concurrency package. The concurrency manager is a structure that sequences incoming requests and provides isolation between requests that intend to perform conflicting operations. During sequencing, conflicts are discovered and any found are resolved through a combination of passive queuing and active pushing. Once a request has been sequenced, it is free to evaluate without concerns of conflicting with other in-flight requests due to the isolation provided by the manager. This isolation is guaranteed for the lifetime of the request but terminates once the request completes.

The manager accomplishes this by piecing together the following components in its request sequencing path:
- `latchManager`
- `lockTable`
- `lockTableWaiter`
- `txnWaitQueue`

The largest part of this change is introducing the datadriven testing framework to deterministically test the concurrency manager. This proved difficult for two reasons:

1. the concurrency manager composes several components to perform its work (latchManager, lockTable, lockTableWaiter, txnWaitQueue). It was difficult to get consistent observability into each of these components in such a way that tests could be run against a set of concurrent requests interacting with them all.
2. the concurrency manager exposes a primarily blocking interface. Requests call `Sequence()` and wait for sequencing to complete. This may block in a number of different places - while waiting on latches, while waiting on locks, and while waiting on other transactions. The most important part of these tests is to assert _where_ a given request blocks based on the current state of the concurrency manager and then assert _how_ the request reacts to a state transition by another request.

To address the first problem, the testing harness uses the context-carried tracing infrastructure to track the path of a request. We already had log events scattered throughout these various components, so this did not require digging testing hooks into each of them. Instead, the harness attaches a trace recording span to each request and watches as events are added to the span. It then uses these events as the output of the test.

To address the second problem, the testing harness introduces a monitor object which manages a collection of "monitored" goroutines. The monitor watches as these goroutines run and keeps track of their goroutine state as is reported by a goroutine dump. During each step of the datadriven test, the monitor allows all goroutines to proceed until they have either terminated or stalled due to cross-goroutine synchronization dependencies. For instance, it waits for all goroutines to stall while receiving from channels. We can be sure that the goroutine dump provides a consistent snapshot of all goroutine states and statuses because `runtime.Stack(all=true)` stops the world when called. This means that when all monitored goroutines are simultaneously stalled, we have a deadlock that can only be resolved by proceeding forward with the test and releasing latches, resolving locks, or committing transactions.

This structure worked surprisingly well and has held up to long periods of stressrace.

Co-authored-by: Nathan VanBenschoten <[email protected]>
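To illustrate the dump-based stall detection described above, here is a hedged, self-contained sketch (assumed mechanics, not the harness's actual code) of how `runtime.Stack(all=true)` can be used to decide that every monitored goroutine has stalled:

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// allGoroutinesBlocked reports whether every goroutine whose stack
// mentions the given marker has stopped running. runtime.Stack with
// all=true stops the world, so the statuses it reports form a
// consistent snapshot of every goroutine.
func allGoroutinesBlocked(marker string) bool {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // stop-the-world, all goroutines
	for _, g := range strings.Split(string(buf[:n]), "\n\n") {
		if !strings.Contains(g, marker) {
			continue // not one of the monitored goroutines
		}
		// Headers look like "goroutine 7 [chan receive]:"; anything
		// still "[running]" or "[runnable]" has not stalled yet.
		if strings.Contains(g, "[running]") || strings.Contains(g, "[runnable]") {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allGoroutinesBlocked("monitored"))
}
```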
45159: sql/opt/exec: add implicit SELECT FOR UPDATE support for UPDATE statements r=nvanbenschoten a=nvanbenschoten

This commit adds support for implicitly applying FOR UPDATE row-level locking modes to the initial row scan of UPDATE statements. Conceptually, if we picture an UPDATE statement as the composition of a SELECT statement and an INSERT statement (with loosened semantics around existing rows) then this change performs the following transformation:

```
UPDATE t = SELECT FROM t + INSERT INTO t
=>
UPDATE t = SELECT FROM t FOR UPDATE + INSERT INTO t
```

The transformation is conditional on the UPDATE expression tree matching a pattern. Specifically, the FOR UPDATE locking mode is only used during the initial row scan when all row filters have been pushed into the ScanExpr. If the statement includes any filters that cannot be pushed into the scan then no row-level locking mode is applied. The rationale here is that FOR UPDATE locking is not necessary for correctness due to serializable isolation, so it is strictly a performance optimization for contended writes. Therefore, it is not worth risking the transformation being a pessimization, so it is only applied when doing so does not risk creating artificial contention.

The change introduces a new `enable_implicit_select_for_update` session variable that controls whether this transformation is applied to mutation statements. It also introduces a `sql.defaults.implicit_select_for_update.enabled` cluster setting to serve as the default value for the session variable.

The locking mode is still ignored by the key-value layer, but that will change in the next few days. Once that happens, we'll be able to gather performance numbers (past what's already been posted in #43775) about the performance impact this change has on uncontended and contended workloads.

Release note (sql change): UPDATE statements now acquire locks using the FOR UPDATE locking mode during their initial row scan, which improves performance for contended workloads. This behavior is configurable using the `enable_implicit_select_for_update` session variable and the `sql.defaults.implicit_select_for_update.enabled` cluster setting.

45459: sqlsmith: silence sqlsmith issue filing after query timeout r=yuzefovich a=rohany

Release note: None

45462: storage/engine: micro-optimize MVCCKeyCompare r=petermattis a=petermattis

`MVCCKeyCompare` is the single largest CPU usage function in a `tpccbench` test. So give it a bit of micro-optimization love. While we're here, move this function next to the rest of the Pebble code as that is the main user.

```
name                old time/op  new time/op  delta
MVCCKeyCompare-16   15.8ns ± 4%  9.1ns ± 1%   -42.57%  (p=0.000 n=10+10)
```

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Rohan Yadav <[email protected]> Co-authored-by: Peter Mattis <[email protected]>
Closes cockroachdb#40205. Informs cockroachdb#41720.

This change teaches the KV client and the KV API about unreplicated locks. It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which allows their users to select the locking strength that they would like the scan to use. This locking strength defaults to None, which corresponds to the current behavior. However, some users will want to acquire locks on each row scanned, which is now possible by setting the locking strength to a stronger level. For now, only the Exclusive strength is supported.

The change then revisits SQL's row-level locking support, which is supported all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit (e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new key-locking functionality in the KV API to hook up row-level locking, completing the integration of SELECT FOR UPDATE with the KV layer and, in particular, the new lock-table structure.

cockroachdb#43775 described the three main benefits of this change:
- higher throughput under contention
- lower latency and improved fairness under contention
- a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that the results continue to hold, and in some cases they have improved. I intend to update this PR with a more complete analysis of its impact on those three areas.

Release note (sql change): SELECT FOR UPDATE now hooks into a new leaseholder-only locking mechanism. This allows the feature to be used to improve performance of transactions that read, modify, and write to contended rows. Similarly, UPDATE statements now use this new mechanism by default, meaning that their performance under contention is improved. This is only enabled for UPDATE statements that can push their filter all the way into their key-value scan. To determine whether an UPDATE statement is implicitly using SELECT FOR UPDATE locking, look for a "locking strength" field in the EXPLAIN output for the statement.
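Based on the commit message's description of the API, a locking scan might be constructed as follows; the `KeyLocking` field name and `lock.Exclusive` constant are taken from the text above, but treat the exact shape as a sketch rather than a verbatim excerpt from the PR:

```go
package example

import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/storage/concurrency/lock"
)

// lockingScan builds a scan that asks the leaseholder to acquire an
// unreplicated exclusive lock on each row it returns.
func lockingScan(start, end roachpb.Key) *roachpb.ScanRequest {
	return &roachpb.ScanRequest{
		RequestHeader: roachpb.RequestHeader{Key: start, EndKey: end},
		KeyLocking:    lock.Exclusive, // lock.None preserves the old behavior
	}
}
```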
Informs #41720.
The PR creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Interested readers should start at `concurrency_control.go` and move out from there.

The new package has a few primary objectives:

1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented and understood in isolation.
2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed after intent resolution as soon as possible.
3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (#6583).
4. provide stronger guarantees around fairness when transactions conflict, in order to reduce tail latencies under contended scenarios.
5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in #41720.
Select For Update Prototype Results
The last commit in the PR contains a hacked version of upgrade locking hooked into the new concurrency package. This allows us to run several experiments to explore its effectiveness in providing its anticipated benefits.
Below is a collection of tests using YCSB. All experiments were run on a 3 VM `m5d.4xlarge` cluster with local SSDs. The tests used a hacked version of YCSB which contains an additional workload `"U"`, which is similar to workload `"A"` except it performs UPDATEs 100% of the time. A new request distribution called `"single"` was also added, which causes all requests to operate on a single row, though they are still spread evenly across all 10 column families in that row.

NOTE that these numbers compare master against a completely unoptimized implementation of an in-memory lock table. The implementation uses a single top-level mutex and makes no attempt to avoid memory allocations or optimize for common cases. There is some low-hanging fruit here that would improve its performance measurably.
Throughput Under Contention
Improving the throughput of heavily contended workloads is a recurring goal. The combination of the concurrency package and SFU locking (implicit for UPDATE statements) promises to move towards this goal for heavily contended UPDATE workloads because it avoids thrashing and other wasted work.
Here we compare master against this prototype with three different YCSB variants:

- `U / single / 64`
- `U / zipf / 64`
- `A / zipf / 64`
Each variant was tested with each binary over a collection of 3 two-minute trials and the throughput recorded in each trial was averaged together.
We see in the `U / single / 64` workload that SFU significantly improves the throughput of highly contended workloads, in this case by 49%. This is likely because of more efficient queuing and fewer transaction retries. The improvement goes away with less contention and only writes (`U / zipf / 64`), but comes back to some degree (5%) when reads are added into the mix (`A / zipf / 64`). My best theory for this is that upgrade locking improves fairness between readers and writers (see below), which helps prevent starvation of writers.

We'd also expect this to have a larger impact on bigger machines that can tolerate more contention in general.
Fairness Under Contention (i.e. Tail Latencies)
An issue identified early in the SFU work was that the current state of contention handling in Cockroach has serious fairness issues. There is very little queueing so transactions typically race to lay down intents as they repeatedly hit WriteTooOld errors or run into each other's intents. The concurrency package was meant to improve this situation, and SFU improves on it further by avoiding thrashing between reading and writing during an UPDATE operation.
We can see the effect of this by comparing the tail latencies when running YCSB against master and this prototype:
In this comparison, we see that tail latencies currently explode as contention increases (`U / single / 64` is the most contended variant). However, the use of SFU avoids this explosion in tail latencies, maintaining a much tighter spread in transaction latencies across the workload variants. This is a clear demonstration of the effectiveness of this new approach.

Transaction Retries
The third goal of the SELECT FOR UPDATE work is to provide users with the ability to avoid transaction retries in certain scenarios where they're performing a read and then an update to the same row. This is what UPDATE statements do internally, and it turns out that today, UPDATE statements on their own can cause transaction retries (see the "Transaction retries in TPC-C" Google group thread) if they read a value that changes before they write to it. We can again explore this situation using YCSB.
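To make the retry scenario concrete, here is a hedged Go sketch using `database/sql` against a hypothetical `kv` table: without `FOR UPDATE`, a concurrent writer can change the row between the read and the write and force a serializable restart; with it, the lock is acquired at read time.

```go
package example

import (
	"context"
	"database/sql"
)

// readModifyWrite reads a row and writes back an incremented value.
// The FOR UPDATE clause locks the row when it is read, so no other
// writer can slip in between the SELECT and the UPDATE.
func readModifyWrite(ctx context.Context, tx *sql.Tx, key string) error {
	var v int
	if err := tx.QueryRowContext(ctx,
		`SELECT v FROM kv WHERE k = $1 FOR UPDATE`, key).Scan(&v); err != nil {
		return err
	}
	_, err := tx.ExecContext(ctx,
		`UPDATE kv SET v = $1 WHERE k = $2`, v+1, key)
	return err
}
```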
In this experiment, we ran YCSB-A (`A / zipf / 64`) without any modifications and monitored the number of transaction retries. Because YCSB runs single-statement transactions, all of these would be retries on the SQL gateway, but these statements could just as easily be run in explicit transactions, in which case these retries would make it back to the client. For practical reasons, I started running with the new SFU prototype and then switched back to master half-way through the test (around 5:25). Take note of the `KV Transaction Restart` graph.

We see essentially no transaction restarts when using the SFU-enhanced prototype. However, when we switch back to master, we see restarts jump up to around 140 per second across three classes of retry reasons. This confirms that in cases like these, SELECT FOR UPDATE will be an effective tool in reducing or eliminating transaction retries.