[DNM] storage/concurrency: introduce concurrency control package, prototype SFU #43775
Conversation
Exciting improvements here. How stable are these results? I ask because in recent experimentation with …

The throughput numbers across the trials were varying by about ±2-3%. The latency percentiles were more stable across trials, which looks to be because they're quantized. I'm actually hopeful that the fairness improvements here will help reduce variance and make YCSB more stable.
// in the MVCC keyspace. This is performed transparently by a lockAwareIter.
//
// When the request completes, it exits any lockWaitQueues that it was at the
// head of and releases its latches. However, any locks that it inserted into
Only on success, right? If the request tries to write an intent but this somehow fails below Raft, the lock should go away? Or I guess it's fine to have an in-mem lock too many, then we'll just push more than we need to (hope that doesn't confuse deadlock detection somehow). Hmm, SFU will basically mean having in-mem locks that aren't backed by replicated locks, so it will all just have to work anyway.
> Only on success, right? If the request tries to write an intent but this somehow fails below Raft, the lock should go away?
Yes, documented.
> Hmm, SFU will basically mean having in-mem locks that aren't backed by replicated locks, so it will all just have to work anyway.
Yeah, we'll have to make it work. For now, though, I'm not going to worry about inserting locks on failure paths.
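To make the failure-path question concrete, here is a purely illustrative, self-contained sketch (none of these types come from the PR) of dropping in-memory locks when a proposal fails below Raft, so that later requests don't push a transaction that holds nothing:

```go
package main

import "fmt"

// lockTable is an illustrative stand-in, not the PR's lockTable.
type lockTable struct {
	held map[string]string // key -> holding txn
}

func (lt *lockTable) acquire(key, txn string) { lt.held[key] = txn }

// releaseAll drops every in-memory lock held by txn. Leaving one behind
// would only cost a spurious push, but cleaning up eagerly keeps the
// table honest.
func (lt *lockTable) releaseAll(txn string) {
	for k, holder := range lt.held {
		if holder == txn {
			delete(lt.held, k)
		}
	}
}

func main() {
	lt := &lockTable{held: map[string]string{}}
	lt.acquire("a", "txn1")
	// ... the write fails below Raft, so no replicated intent exists ...
	lt.releaseAll("txn1")
	fmt.Println(len(lt.held)) // 0
}
```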
// concurrency manager about the replicated write intent that was missing
// from its lock table. After doing so, it enqueues the request that hit the
// error in the lock's wait-queue (but does not wait) and releases the
// guard's latches. It returns an updated guard reflecting this change.
When these methods return a new `*Guard`, does it mean that the original `*Guard` must no longer be used?
Done.
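To make the settled convention concrete, here is a tiny self-contained illustration (the `Guard` type below is a stand-in, not the PR's actual struct): methods that return a new `*Guard` consume the one they are given, so callers should rebind and never touch the old pointer.

```go
package main

import "fmt"

// Guard is an illustrative stand-in for the concurrency package's *Guard.
type Guard struct{ seq int }

// handleError consumes the guard it is given and returns a replacement.
// Callers must discard the argument and use only the returned guard.
func handleError(g *Guard) *Guard {
	ng := &Guard{seq: g.seq + 1} // state migrates to the new guard
	*g = Guard{}                 // poison the old one so accidental reuse is obvious
	return ng
}

func main() {
	g := &Guard{seq: 1}
	g = handleError(g) // idiomatic: rebind, never keep the old pointer
	fmt.Println(g.seq) // 2
}
```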
HandleWriterIntentError(context.Context, *Guard, *roachpb.WriteIntentError) (*Guard, *Error)
// HandleTransactionPushError consumes a TransactionPushError by informing
// the concurrency manager about a transaction record that could not be
// pushes. After doing so, it releases the guard's latches. It returns an
pushed
Done.
// been updated. The argument indicates whether this manager's replica is
// the leaseholder going forward.
OnLeaseUpdated(bool)
// OnSplit informs the concurrency manager that its range has split of a new
off
Done.
// encountered.
ScanAndEnqueue(Request, *Guard) []lockWaitQueueGuard
// Dequeue removes the guard from its lock wait-queue.
Dequeue(lockWaitQueueGuard)
Is this called if a waiter gives up? In normal operation we'll only ever dequeue in FIFO order, right?
Yes, this can be called if a waiter gives up, so we'll need to handle out-of-order dequeueing. Under normal operation though, you're correct. Commented.
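A minimal sketch of what out-of-order dequeueing implies for the queue's data structure, assuming a plain doubly-linked list (the PR's actual wait-queue may differ): each guard keeps a handle to its own entry so a waiter that gives up can be unlinked in O(1) from anywhere in the queue.

```go
package main

import (
	"container/list"
	"fmt"
)

// waitQueue and waitQueueGuard are illustrative, not the PR's types.
type waitQueue struct{ l *list.List }

type waitQueueGuard struct{ elem *list.Element }

func (q *waitQueue) enqueue(name string) waitQueueGuard {
	return waitQueueGuard{elem: q.l.PushBack(name)}
}

// dequeue handles both FIFO exit and out-of-order abandonment in O(1).
func (q *waitQueue) dequeue(g waitQueueGuard) {
	q.l.Remove(g.elem)
}

func main() {
	q := &waitQueue{l: list.New()}
	a := q.enqueue("a")
	b := q.enqueue("b")
	q.enqueue("c")
	q.dequeue(b) // waiter b gives up while not at the front
	q.dequeue(a) // normal FIFO exit
	for e := q.l.Front(); e != nil; e = e.Next() {
		fmt.Println(e.Value) // c
	}
}
```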
// Dequeue removes the guard from its lock wait-queue.
Dequeue(lockWaitQueueGuard)
// SetBounds sets the key bounds of the lockTable.
SetBounds(start, end roachpb.RKey)
Curious why this is needed.
It's not needed right now, but it will be needed once we pull the replicated portion of the lockTable under here. I already had the wiring hooked up to the rest of the Storage package, which I think is valuable. I'll remove this for now though.
}
// Wait on each newly conflicting lock, if applicable. | ||
if m.lwq.MustWaitOnAny(g.wqgs) { |
Could this be `wqgs` (which I know isn't in scope here)? Just trying to confirm my understanding that at the beginning of each iter in the `for` loop, `g.wqgs` has been `WaitOn`'ed already and is only carried with the guard for .. other reasons.
Originally that would have worked, but not anymore, because #43740 introduced the notion of "breaking" reservations. Essentially this means that waiters can lose their place at the front of a wait-queue while not holding latches. It is only safe to conclude that a waiter is at the front of all wait-queues if it is holding latches.
The reason for this is pretty subtle, but @sumeerbhola does a good job documenting it in that PR. Because we don't hold range locks while waiting in per-key wait-queues, we run into situations like the following. It's possible that two ranged writes, txn1 and txn2, over keys [a, d) might sequence such that txn1 initially acquires latches first. It finds a lock on key `c` and begins waiting. A different txn3 might then get sequenced and perform a point write on key `a`. When txn2 is sequenced, it will enter the wait-queue for `a` (at the front) and for `c` (behind txn1). When txn1 re-acquires latches and re-scans the lockTable, it will find the lock on `a` and enter its wait-queue. We don't want it getting behind txn2 in that queue, or we'd cause a deadlock.

So, the solution Sumeer came up with to address this was to give each request a monotonically increasing sequence number when it first scans the lockTable. This creates a total order over requests. The order does not tell us exactly what order requests will finish being sequenced by the concurrency manager. However, it does tell us how to order conflicting transactions in a given wait-queue. In other words, the policy for entering a wait-queue is not that requests automatically get in the back. Instead, requests scan from the back to find where they fit in the sequence-ordered list of requests. This avoids the kinds of deadlocks discussed above.
However, to get it to work, we need to abandon the idea that once a request is at the front of a wait-queue, it stays there. Instead, requests must check all of their wait-queue entries on each iteration through the loop while holding latches. A request is only safe to proceed when it observes that it is at the front of all of its wait-queues during a single pass.

The correctness of this is subtle, but I think it's mentioned in #43740. A consequence of it is that as a req with a low seq num, we need to be very careful about never breaking the reservation of another request whose latch set might be compatible with ours, because latching is what provides the mutual exclusion to atomically determine that a request holds a reservation for (is at the front of) all wait-queues that it's a part of. The fact that non-locking (`lock.None`) reads are never going to hold reservations in wait-queues makes this easier to think through.
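To make the queueing policy concrete, here is a self-contained sketch of seqnum-ordered insertion (illustrative types, not the lockTable's): a request entering a wait-queue scans from the back until it finds its place in the total order, rather than always joining the end.

```go
package main

import "fmt"

// waiter is an illustrative stand-in. Each request gets a monotonically
// increasing seqnum on its first lockTable scan; when it later enters a
// wait-queue, it is ordered by that seqnum.
type waiter struct {
	name string
	seq  uint64
}

func insertInSeqOrder(q []waiter, w waiter) []waiter {
	i := len(q)
	for i > 0 && q[i-1].seq > w.seq {
		i-- // skip over later-sequenced requests
	}
	q = append(q, waiter{})
	copy(q[i+1:], q[i:]) // shift the tail right by one slot
	q[i] = w
	return q
}

func main() {
	// Key a's wait-queue already contains txn2 (sequenced after txn1).
	q := []waiter{{name: "txn2", seq: 2}}
	// txn1 re-scans after waiting on key c, finds the lock on a, and
	// enters the queue ahead of txn2, avoiding the deadlock above.
	q = insertInSeqOrder(q, waiter{name: "txn1", seq: 1})
	fmt.Println(q) // [{txn1 1} {txn2 2}]
}
```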
// want to wait on locks (like PutRequest and ScanRequest) and then there
// are those that don't want to wait on locks (like QueryIntentRequest,
// RefreshRequest, and ResolveIntentRequest). Should we define a flag for
// this?
Rather than trying to keep these "internal" request types side-by-side with the "public" KV API (Scan, Put, ...) we should establish a boundary between them. I feel that the flags have not been helping that, and they also tend to obfuscate understanding.
I've also grown a little disillusioned about flags for something like this because they're defined so far away from where they're used. I'm curious what you're thinking in terms of a boundary.
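For illustration only, one shape such a boundary could take (none of these types are from the PR): each request declares its lock-wait behavior through a small interface defined next to the concurrency manager, instead of a flag defined far from its point of use.

```go
package main

import "fmt"

// lockWaitPolicy is a hypothetical boundary interface, declared where
// the concurrency manager consumes it.
type lockWaitPolicy interface {
	WaitsOnLocks() bool
}

type putRequest struct{}
type resolveIntentRequest struct{}

func (putRequest) WaitsOnLocks() bool           { return true }
func (resolveIntentRequest) WaitsOnLocks() bool { return false }

func main() {
	for _, r := range []lockWaitPolicy{putRequest{}, resolveIntentRequest{}} {
		fmt.Printf("%T waits on locks: %v\n", r, r.WaitsOnLocks())
	}
}
```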
switch arg.Method() {
case roachpb.HeartbeatTxn:
case roachpb.Refresh:
case roachpb.RefreshRange:
I would've expected to see ResolveIntent{,Range} here.
Somewhat surprisingly, ResolveIntent{,Range} is not `roachpb.IsTransactional`. Neither of the requests has the `isTxn` flag.
// Wait on each new txnWaitQueue entry.
// TODO: should we do this here?
for _, wqg := range g.wqgs[orig:] {
    if err := m.lwq.WaitOn(ctx, g.req, wqg); err != nil {
This is the case in which I find out about a lock only from the replicated keyspace, right? And there's the other case in which we discover locks directly from the lock table. I would've expected that we'd wait in `SequenceReq` in all cases, i.e. `HandleWriterIntentError` would be called and afterwards the caller would pass the resulting guard to `SequenceReq`. (This would also explain why my assumption in that method about only having to wait on new wqgs is wrong).
Yes, that's exactly how things are supposed to work. I think I just forgot to address this TODO and delete the code. Done.
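The flow settled on here might look like the following self-contained sketch (stand-in types; the real loop lives in the Replica send path): `SequenceReq` does all of the waiting, while the error handler only records the discovered intent and drops latches before the request loops back around.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// guard is an illustrative stand-in for the concurrency package's *Guard.
type guard struct{ discoveredIntents int }

var errWriteIntent = errors.New("conflicting intent discovered")

func sequenceReq(ctx context.Context, g *guard) *guard {
	// In the real manager this may block on latches, lock wait-queues,
	// and transaction pushes; here it returns once "sequenced".
	return g
}

func evaluate(g *guard) error {
	if g.discoveredIntents == 0 {
		return errWriteIntent // first pass hits a replicated intent
	}
	return nil
}

func handleWriteIntentError(g *guard) *guard {
	// Inform the lockTable of the intent, enqueue without waiting, and
	// release latches; the next sequenceReq call does the waiting.
	return &guard{discoveredIntents: g.discoveredIntents + 1}
}

func main() {
	ctx := context.Background()
	g := &guard{}
	for {
		g = sequenceReq(ctx, g)
		if err := evaluate(g); errors.Is(err, errWriteIntent) {
			g = handleWriteIntentError(g)
			continue // re-sequence; no ad-hoc waiting here
		}
		fmt.Println("evaluated after", g.discoveredIntents, "re-sequencing")
		return
	}
}
```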
This commit introduces a new `concurrency/lock` package that provides type definitions for locking-related concepts used by concurrency control in the key-value layer. The package initially provides a lock `Strength` enumeration consisting of `Shared`, `Upgrade`, and `Exclusive` and a lock `Durability` enumeration consisting of `Unreplicated` and `Replicated`. These definitions are currently unused but will be referenced soon by cockroachdb#43740 and cockroachdb#43775. The changes around SELECT FOR UPDATE in the optimizer (cockroachdb#44015) are also approaching the point where they'll soon need the type definitions to interface with the KV API. Release note: None
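The `Strength` and `Durability` enumerations described above are small enough to sketch inline. A minimal sketch under the assumption that they are plain Go constants; the real definitions live in `pkg/storage/concurrency/lock` and may be generated rather than hand-written:

```go
// Package lock sketch: type definitions for locking-related concepts.
package lock

// Strength determines how a lock conflicts with other locks. lock.None
// is referenced elsewhere in this PR for non-locking reads.
type Strength int

const (
	None Strength = iota
	Shared    // multiple readers may hold the lock together
	Upgrade   // read now with the intention to write later (SFU)
	Exclusive // a single writer excludes all other lockers
)

// Durability determines how resilient a lock is to failures.
type Durability int

const (
	Unreplicated Durability = iota // in-memory only
	Replicated                     // backed by a replicated write intent
)
```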
43862: storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC r=tbg a=ajwerner

This PR comes in 6 commits. See the commit messages for more details. The meat of the change is in: `storage: rework RunGC so it no longer buffers keys and values in memory`.

1) Fix the spanset Iterator such that SeekLT can seek to the end of the range.
2) Expose reverse iteration in rditer.ReplicaDataIterator.
3) Rearrange some code in gc_queue.go to put the RunGC() helpers below that function.
4) Remove lockableGCInfo because there was no need for it.
5) Move RunGC into a `gc` subpackage.
6) Rework RunGC to paginate versions of keys with reverse iteration.

Fixes #42531.

44054: storage/concurrency/lock: define lock Strength and Durability modes r=nvanbenschoten a=nvanbenschoten

This commit introduces a new `concurrency/lock` package that provides type definitions for locking-related concepts used by concurrency control in the key-value layer. The package initially provides a lock `Strength` enumeration consisting of `Shared`, `Upgrade`, and `Exclusive` and a lock `Durability` enumeration consisting of `Unreplicated` and `Replicated`. These definitions are currently unused but will be referenced soon by #43740 and #43775. The changes around SELECT FOR UPDATE in the optimizer (#44015) are also approaching the point where they'll soon need the type definitions to interface with the KV API.

Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
…cher Relates to cockroachdb#40205. This commit is a follow-up to cockroachdb#44015. It propagates the row-level locking modes from the optbuilder all the way down to the row.Fetcher. This requires passing the locking information through the execbuilder and then through DistSQL. The commit leaves off at the point where `roachpb.ScanRequest`s are constructed. I couldn't find a good way to test this. It's possible I missed something and would love any tips, but I think we may need to wait until the plumbing into the KV API is complete before performing end-to-end tests. The next step will be to hook the locking modes into the KV API. Release note: None
…Result This allows the handling of added, updated, and resolved intents to become more reactive. It mirrors our handling of UpdatedTxns and their interaction with the TxnWaitQueue.
…totype SFU Informs cockroachdb#41720. This commit creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Any reader of this commit should start at `concurrency_control.go` and move out from there. The new package has a few primary objectives: 1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented and understood in isolation. 2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed after intent resolution as soon as possible. 3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (cockroachdb#6583). 4. provide stronger guarantees around fairness when transactions conflict, in order to reduce tail latencies under contended scenarios. 5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in cockroachdb#41720. WARNING: this is still a WIP. Notably, the lockTableImpl is mocked out with a working but incomplete implementation. See cockroachdb#43740 for a more complete strawman. Release note: None
This commit hacks upgrade locking into ScanRequests and uses it in YCSB.
Informs cockroachdb#41720.

The commit creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Interested readers should start at `concurrency_control.go` and move out from there.

The new package has a few primary objectives:

1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented, understood, and tested in isolation.
2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed immediately after intent resolution.
3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (cockroachdb#6583).
4. provide stronger guarantees around fairness when transactions conflict, to reduce tail latencies under contended scenarios.
5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in cockroachdb#41720.

This commit pulls over a lot of already reviewed code from cockroachdb#43775. The big differences are that it updates the lockTable interface to match the one introduced in cockroachdb#43740 and it addresses the remaining TODOs to document the rest of the concurrency control mechanisms in CockroachDB. At this point, a reader of this file should get a good idea of how KV transactions interact in CRDB... now we just need to make the system actually work this way.
44787: storage/concurrency: define concurrency control interfaces r=nvanbenschoten a=nvanbenschoten

Informs #41720.

The commit creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Interested readers should start at `concurrency_control.go` and move out from there.

The new package has a few primary objectives:

1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented, understood, and tested in isolation.
2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed immediately after intent resolution.
3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (#6583).
4. provide stronger guarantees around fairness when transactions conflict, to reduce tail latencies under contended scenarios.
5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in #41720.

This commit pulls over a lot of already reviewed code from #43775. The big differences are that it updates the lockTable interface to match the one introduced in #43740 and it addresses the remaining TODOs to document the rest of the concurrency control mechanisms in CockroachDB. At this point, a reader of this file should get a good idea of how KV transactions interact in CRDB... now we just need to make the system actually work this way.

Co-authored-by: Nathan VanBenschoten <[email protected]>
…ments

This commit adds support for implicitly applying FOR UPDATE row-level locking modes to the initial row scan of UPDATE statements. Conceptually, if we picture an UPDATE statement as the composition of a SELECT statement and an INSERT statement (with loosened semantics around existing rows) then this change performs the following transformation:

```
UPDATE t = SELECT FROM t + INSERT INTO t
=>
UPDATE t = SELECT FROM t FOR UPDATE + INSERT INTO t
```

The transformation is conditional on the UPDATE expression tree matching a pattern. Specifically, the FOR UPDATE locking mode is only used during the initial row scan when all row filters have been pushed into the ScanExpr. If the statement includes any filters that cannot be pushed into the scan then no row-level locking mode is applied. The rationale here is that FOR UPDATE locking is not necessary for correctness due to serializable isolation, so it is strictly a performance optimization for contended writes. Therefore, it is not worth risking the transformation being a pessimization, so it is only applied when doing so does not risk creating artificial contention.

The change introduces a new `enable_implicit_select_for_update` session variable that controls whether this transformation is applied to mutation statements. It also introduces a `sql.defaults.implicit_select_for_update.enabled` cluster setting to serve as the default value for the session variable.

The locking mode is still ignored by the key-value layer, but that will change in the next few days. Once that happens, we'll be able to gather performance numbers (past what's already been posted in cockroachdb#43775) about the performance impact this change has on uncontended and contended workloads.

Release note (sql change): UPDATE statements now acquire locks using the FOR UPDATE locking mode during their initial row scan, which improves performance for contended workloads. This behavior is configurable using the `enable_implicit_select_for_update` session variable and the `sql.defaults.implicit_select_for_update.enabled` cluster setting.
45062: storage/concurrency: implement concurrency Manager r=nvanbenschoten a=nvanbenschoten

Informs #41720. Informs #44976. Initially drawn out in #43775.

This PR implements the concurrency.Manager interface, which is the core structure that ties together the new concurrency package. The concurrency manager is a structure that sequences incoming requests and provides isolation between requests that intend to perform conflicting operations. During sequencing, conflicts are discovered and any found are resolved through a combination of passive queuing and active pushing. Once a request has been sequenced, it is free to evaluate without concerns of conflicting with other in-flight requests due to the isolation provided by the manager. This isolation is guaranteed for the lifetime of the request but terminates once the request completes.

The manager accomplishes this by piecing together the following components in its request sequencing path:
- `latchManager`
- `lockTable`
- `lockTableWaiter`
- `txnWaitQueue`

The largest part of this change is introducing the datadriven testing framework to deterministically test the concurrency manager. This proved difficult for two reasons:

1. the concurrency manager composes several components to perform its work (latchManager, lockTable, lockTableWaiter, txnWaitQueue). It was difficult to get consistent observability into each of these components in such a way that tests could be run against a set of concurrent requests interacting with them all.
2. the concurrency manager exposes a primarily blocking interface. Requests call `Sequence()` and wait for sequencing to complete. This may block in a number of different places - while waiting on latches, while waiting on locks, and while waiting on other transactions. The most important part of these tests is to assert _where_ a given request blocks based on the current state of the concurrency manager and then assert _how_ the request reacts to a state transition by another request.

To address the first problem, the testing harness uses the context-carried tracing infrastructure to track the path of a request. We already had log events scattered throughout these various components, so this did not require digging testing hooks into each of them. Instead, the harness attaches a trace recording span to each request and watches as events are added to the span. It then uses these events as the output of the test.

To address the second problem, the testing harness introduces a monitor object which manages a collection of "monitored" goroutines. The monitor watches as these goroutines run and keeps track of their goroutine state as is reported by a goroutine dump. During each step of the datadriven test, the monitor allows all goroutines to proceed until they have either terminated or stalled due to cross-goroutine synchronization dependencies. For instance, it waits for all goroutines to stall while receiving from channels. We can be sure that the goroutine dump provides a consistent snapshot of all goroutine states and statuses because `runtime.Stack(all=true)` stops the world when called. This means that when all monitored goroutines are simultaneously stalled, we have a deadlock that can only be resolved by proceeding forward with the test and releasing latches, resolving locks, or committing transactions.

This structure worked surprisingly well and has held up to long periods of stressrace.

Co-authored-by: Nathan VanBenschoten <[email protected]>
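To illustrate the dump-based stall detection described above, here is a hedged, self-contained sketch (assumed mechanics, not the harness's actual code) of how `runtime.Stack(all=true)` can be used to decide that every monitored goroutine has stalled:

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// allGoroutinesBlocked reports whether every goroutine whose stack
// mentions the given marker has stopped running. runtime.Stack with
// all=true stops the world, so the statuses it reports form a
// consistent snapshot of every goroutine.
func allGoroutinesBlocked(marker string) bool {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // stop-the-world, all goroutines
	for _, g := range strings.Split(string(buf[:n]), "\n\n") {
		if !strings.Contains(g, marker) {
			continue // not one of the monitored goroutines
		}
		// Headers look like "goroutine 7 [chan receive]:"; anything
		// still "[running]" or "[runnable]" has not stalled yet.
		if strings.Contains(g, "[running]") || strings.Contains(g, "[runnable]") {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allGoroutinesBlocked("monitored"))
}
```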
45159: sql/opt/exec: add implicit SELECT FOR UPDATE support for UPDATE statements r=nvanbenschoten a=nvanbenschoten

This commit adds support for implicitly applying FOR UPDATE row-level locking modes to the initial row scan of UPDATE statements. Conceptually, if we picture an UPDATE statement as the composition of a SELECT statement and an INSERT statement (with loosened semantics around existing rows) then this change performs the following transformation:

```
UPDATE t = SELECT FROM t + INSERT INTO t
=>
UPDATE t = SELECT FROM t FOR UPDATE + INSERT INTO t
```

The transformation is conditional on the UPDATE expression tree matching a pattern. Specifically, the FOR UPDATE locking mode is only used during the initial row scan when all row filters have been pushed into the ScanExpr. If the statement includes any filters that cannot be pushed into the scan then no row-level locking mode is applied. The rationale here is that FOR UPDATE locking is not necessary for correctness due to serializable isolation, so it is strictly a performance optimization for contended writes. Therefore, it is not worth risking the transformation being a pessimization, so it is only applied when doing so does not risk creating artificial contention.

The change introduces a new `enable_implicit_select_for_update` session variable that controls whether this transformation is applied to mutation statements. It also introduces a `sql.defaults.implicit_select_for_update.enabled` cluster setting to serve as the default value for the session variable.

The locking mode is still ignored by the key-value layer, but that will change in the next few days. Once that happens, we'll be able to gather performance numbers (past what's already been posted in #43775) about the performance impact this change has on uncontended and contended workloads.

Release note (sql change): UPDATE statements now acquire locks using the FOR UPDATE locking mode during their initial row scan, which improves performance for contended workloads. This behavior is configurable using the `enable_implicit_select_for_update` session variable and the `sql.defaults.implicit_select_for_update.enabled` cluster setting.

45459: sqlsmith: silence sqlsmith issue filing after query timeout r=yuzefovich a=rohany

Release note: None

45462: storage/engine: micro-optimize MVCCKeyCompare r=petermattis a=petermattis

`MVCCKeyCompare` is the single largest CPU usage function in a `tpccbench` test. So give it a bit of micro-optimization love. While we're here, move this function next to the rest of the Pebble code as that is the main user.

```
name                old time/op  new time/op  delta
MVCCKeyCompare-16   15.8ns ± 4%  9.1ns ± 1%   -42.57%  (p=0.000 n=10+10)
```

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Rohan Yadav <[email protected]> Co-authored-by: Peter Mattis <[email protected]>
Closes cockroachdb#40205. Informs cockroachdb#41720.

This change teaches the KV client and the KV API about unreplicated locks. It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which allows their users to select the locking strength that they would like the scan to use. This locking strength defaults to None, which corresponds to the current behavior. However, some users will want to acquire locks on each row scanned, which is now possible by setting the locking strength to a stronger level. For now, only the Exclusive strength is supported.

The change then revisits SQL's row-level locking support, which is supported all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit (e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new key-locking functionality in the KV API to hook up row-level locking, completing the integration of SELECT FOR UPDATE with the KV layer and, in particular, the new lock-table structure.

cockroachdb#43775 described the three main benefits of this change:
- higher throughput under contention
- lower latency and improved fairness under contention
- a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that the results continue to hold, and in some cases they have improved. I intend to update this PR with a more complete analysis of its impact on those three areas.

Release note (sql change): SELECT FOR UPDATE now hooks into a new leaseholder-only locking mechanism. This allows the feature to be used to improve performance of transactions that read, modify, and write to contended rows. Similarly, UPDATE statements now use this new mechanism by default, meaning that their performance under contention is improved. This is only enabled for UPDATE statements that can push their filter all the way into their key-value scan. To determine whether an UPDATE statement is implicitly using SELECT FOR UPDATE locking, look for a "locking strength" field in the EXPLAIN output for the statement.
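Based on the commit message's description of the API, a locking scan might be constructed as follows; the `KeyLocking` field name and `lock.Exclusive` constant are taken from the text above, but treat the exact shape as a sketch rather than a verbatim excerpt from the PR:

```go
package example

import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/storage/concurrency/lock"
)

// lockingScan builds a scan that asks the leaseholder to acquire an
// unreplicated exclusive lock on each row it returns.
func lockingScan(start, end roachpb.Key) *roachpb.ScanRequest {
	return &roachpb.ScanRequest{
		RequestHeader: roachpb.RequestHeader{Key: start, EndKey: end},
		KeyLocking:    lock.Exclusive, // lock.None preserves the old behavior
	}
}
```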
Informs #41720.
The PR creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Interested readers should start at `concurrency_control.go` and move out from there.

The new package has a few primary objectives:

1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented and understood in isolation.
2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed after intent resolution as soon as possible.
3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (#6583).
4. provide stronger guarantees around fairness when transactions conflict, in order to reduce tail latencies under contended scenarios.
5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in #41720.
Select For Update Prototype Results
The last commit in the PR contains a hacked version of upgrade locking hooked into the new concurrency package. This allows us to run several experiments to explore its effectiveness in providing its anticipated benefits.
Below is a collection of tests using YCSB. All experiments were run on a 3 VM `m5d.4xlarge` cluster with local SSDs. The tests used a hacked version of YCSB which contains an additional workload `"U"`, which is similar to workload `"A"` except it performs UPDATEs 100% of the time. A new request distribution called `"single"` was also added, which causes all requests to operate on a single row, though they are still spread evenly across all 10 column families in that row.

NOTE that these numbers compare master against a completely unoptimized implementation of an in-memory lock table. The implementation uses a single top-level mutex and makes no attempt to avoid memory allocations or optimize for common cases. There is some low-hanging fruit here that would improve its performance measurably.
Throughput Under Contention
Improving the throughput of heavily contended workloads is a recurring goal. The combination of the concurrency package and SFU locking (implicit for UPDATE statements) promises to move towards this goal for heavily contended UPDATE workloads because it avoids thrashing and other wasted work.
Here we compare master against this prototype with three different YCSB variants:

- `U / single / 64`
- `U / zipf / 64`
- `A / zipf / 64`
Each variant was tested with each binary over a collection of 3 two-minute trials and the throughput recorded in each trial was averaged together.
We see in the `U / single / 64` workload that SFU significantly improves the throughput of highly contended workloads, in this case by 49%. This is likely because of more efficient queuing and fewer transaction retries. The improvement goes away with less contention and only writes (`U / zipf / 64`), but comes back to some degree (5%) when reads are added into the mix (`A / zipf / 64`). My best theory for this is that upgrade locking improves fairness between readers and writers (see below), which helps prevent starvation of writers.

We'd also expect this to have a larger impact on bigger machines that can tolerate more contention in general.
Fairness Under Contention (i.e. Tail Latencies)
An issue identified early in the SFU work was that the current state of contention handling in Cockroach has serious fairness issues. There is very little queueing so transactions typically race to lay down intents as they repeatedly hit WriteTooOld errors or run into each other's intents. The concurrency package was meant to improve this situation, and SFU improves on it further by avoiding thrashing between reading and writing during an UPDATE operation.
We can see the effect of this by comparing the tail latencies when running YCSB against master and this prototype:
In this comparison, we see that tail latencies currently explode as contention increases (`U / single / 64` is the most contended variant). However, the use of SFU avoids this explosion in tail latencies, maintaining a much tighter spread in transaction latencies across the workload variants. This is a clear demonstration of the effectiveness of this new approach.

Transaction Retries
The third goal of the SELECT FOR UPDATE work is to provide users with the ability to avoid transaction retries in certain scenarios where they're performing a read and then an update to the same row. This is what UPDATE statements do internally, and it turns out that today, UPDATE statements on their own can cause transaction retries (see the "Transaction retries in TPC-C" Google group thread) if they read a value that changes before they write to it. We can again explore this situation using YCSB.
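To make the retry scenario concrete, here is a hedged Go sketch using `database/sql` against a hypothetical `kv` table: without `FOR UPDATE`, a concurrent writer can change the row between the read and the write and force a serializable restart; with it, the lock is acquired at read time.

```go
package example

import (
	"context"
	"database/sql"
)

// readModifyWrite reads a row and writes back an incremented value.
// The FOR UPDATE clause locks the row when it is read, so no other
// writer can slip in between the SELECT and the UPDATE.
func readModifyWrite(ctx context.Context, tx *sql.Tx, key string) error {
	var v int
	if err := tx.QueryRowContext(ctx,
		`SELECT v FROM kv WHERE k = $1 FOR UPDATE`, key).Scan(&v); err != nil {
		return err
	}
	_, err := tx.ExecContext(ctx,
		`UPDATE kv SET v = $1 WHERE k = $2`, v+1, key)
	return err
}
```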
In this experiment, we ran YCSB-A (`A / zipf / 64`) without any modifications and monitored the number of transaction retries. Because YCSB runs single-statement transactions, all of these would be retries on the SQL gateway, but these statements could just as easily be run in explicit transactions, in which case these retries would make it back to the client. For practical reasons, I started running with the new SFU prototype and then switched back to master half-way through the test (around 5:25). Take note of the `KV Transaction Restart` graph.

We see essentially no transaction restarts when using the SFU-enhanced prototype. However, when we switch back to master, we see restarts jump up to around 140 per second across three classes of retry reasons. This confirms that in cases like these, SELECT FOR UPDATE will be an effective tool in reducing or eliminating transaction retries.