sql+storage: clarify+extend support for large single cells #15771
Currently, the relevant metric is the size of a row, whether that row is divided into multiple cells (or column families) or not. The limit is some function of the maximum range size, although I don't know what that function should be (it depends in part on things like how often the data gets overwritten). The maximum range size is configurable, and at least for now, increasing the range size will proportionally increase the allowable row size. We can guarantee that the "minimum maximum" will only increase and never decrease with future versions. |
@jseldess what is the page where this is documented? I could have a quick look |
@knz, I can't find any docs on this aside from this known limitation. Do we need anything else? |
Hmm I think we do. We need to remove the docs-done label here. I'll think about it, keeping in my backlog. |
OK, sorry for removing that prematurely. I believe Ben's comment about row size being the important metric tied into the way that known limitation was expressed, but it's been a while. |
@knz, any guidance on what we still need here in terms of docs? |
Jesse, what we need is a new tuning page called "Production limits" where we explain how minimum and maximum values relate to each other. Peter and I just briefly chatted about this and we have a sketch for a first version of this page. The story goes as follows.

Introduction

The limits in CockroachDB are all based around the notion of a range. For example, the maximum amount of data that can be stored in a CockroachDB cluster is directly related to the maximum number of ranges and the maximum size of a range, and the maximum size of a row in a table is limited by the size of a single range, as described below. To frame the various limits that typically matter to application developers and architects, one must consider the range limits first. The other limits are then derived from them.

Range limits

By default, the maximum size of a range is 64MiB. To manage a range, CockroachDB needs:
The maximum number of ranges per node is thus limited by disk space, RAM size and CPU time available when idle (no queries). Given that disk space is rather cheap, the practical number of ranges per node is constrained by RAM size and CPU time. For example, in our test clusters with 8 cores per node, 16GB of RAM, and unlimited disk space, we observe a practical limit of about 50,000 ranges per node (3.2TB per node). Our experiments suggest the bottleneck is currently CPU time. A higher capacity per node can thus be reached by increasing CPU core counts and RAM size as needed.

To estimate the maximum number of ranges across an entire cluster, one must consider the replication factor and the maximum number of nodes. The replication factor is configurable per zone, and is set to 3 by default. The maximum number of nodes is theoretically large. Currently Cockroach Labs continuously tests clusters of 10 nodes, and regularly tests clusters of 30 to 130 nodes. We aim to support more nodes eventually (thousands).

For example, with 10 nodes, a replication factor of 3 and a maximum of 50,000 ranges per node, there can be about 170,000 ranges in the cluster (about 10TB). This limit can be increased further by increasing resources per node (as described above) and increasing the number of nodes.

CockroachDB also supports changing the range size (e.g. to increase it) but this is currently not fully tested. Different zones can use different range sizes.

Derived SQL schema limits
SQL min/max row sizes

The minimum/maximum row size is determined by the types of the columns in the table. The documentation for each data type further details the data width of values of that type. Further limits apply:
Hence, with the default configuration of 64MB, the maximum row size is about 20MB. cc @petermattis @bdarnell for verification. |
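The capacity arithmetic in the range-limits sketch above can be reproduced in a few lines. This is a minimal Python sketch, not CockroachDB code: the function names are made up for illustration, and it assumes every range is full and that each range occupies one replica slot on as many nodes as the replication factor.

```python
MIB = 2**20  # bytes in one mebibyte


def node_capacity_bytes(ranges_per_node: int, range_size_bytes: int = 64 * MIB) -> int:
    """Upper bound on the data one node holds if every one of its ranges is full."""
    return ranges_per_node * range_size_bytes


def cluster_ranges(nodes: int, ranges_per_node: int, replication_factor: int = 3) -> int:
    """Distinct ranges in the cluster, given that each range is stored
    replication_factor times across the nodes."""
    return nodes * ranges_per_node // replication_factor


if __name__ == "__main__":
    per_node = node_capacity_bytes(50_000)
    ranges = cluster_ranges(nodes=10, ranges_per_node=50_000)
    logical = ranges * 64 * MIB  # unreplicated data across the cluster
    print(f"per node: {per_node / 1e12:.2f} TB")
    print(f"cluster:  {ranges} ranges, {logical / 1e12:.2f} TB of logical data")
```

With binary MiB the results come out at roughly 3.4TB per node and 11TB for the cluster, slightly above the rounded figures quoted above, which correspond to decimal megabytes (pass `64 * 10**6` as the range size to reproduce 3.2TB).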
Each range requires a few kilobytes of RAM, not megabytes.
FYI, the bottlenecks will likely change in future versions. In particular, I expect the per-range CPU to decrease.
The RocksDB FAQ indicates that the maximum recommended key size is 8MB. We haven't tested with anything that large and we'd likely have problems with keys sneaking into various data structures such as the timestamp cache, command queue and being duplicated in proposals. |
This was a significant issue with bigtable - various things had an effective memory footprint that grew with your average key size.
This is equal to the maximum number of ranges, since range IDs are int64s. There's also a much lower limit: we gossip all table descriptors and store them in a system table that is not allowed to split, so the set of all table descriptors must be able to fit in a range. This is the limiting factor for most sql schema objects. In version 1.1, we have disabled splits in the meta2 range. This limits the number of ranges that can exist in the cluster (the range descriptors must fit in one range). We plan to lift this limit in 1.2. |
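The "must fit in one range" constraint above (for table descriptors, and for range descriptors while meta2 cannot split) amounts to a simple division. The sketch below is illustrative only: the function name and the per-descriptor sizes are hypothetical, since the thread gives no figure for descriptor size, and it ignores key overhead and old versions of the data.

```python
MIB = 2**20  # bytes in one mebibyte


def max_entries_in_one_range(avg_entry_bytes: int, range_size_bytes: int = 64 * MIB) -> int:
    """How many entries of a given average size fit in a single unsplittable range."""
    if avg_entry_bytes <= 0:
        raise ValueError("avg_entry_bytes must be positive")
    return range_size_bytes // avg_entry_bytes


if __name__ == "__main__":
    # Hypothetical average descriptor sizes, chosen only to show the trend.
    for size in (128, 512, 2048):
        print(f"{size:>5} bytes per descriptor -> {max_entries_in_one_range(size)} entries")
```

The point is the inverse relationship: whatever the real descriptor size is, doubling it halves the number of objects that fit, and raising the maximum range size raises the limit proportionally.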
Also note that I expect to increase the maximum range size in the future, which will result in increasing all of these limits. The current blocker to doing so is streaming snapshot application. |
Let's continue this discussion over in cockroachdb/docs#1908 - I addressed your comments in separate commits on top of the original text, PTAL. |
Closing to follow up in #15771 |
@knz, your last comment refers to this issue. Did you mean to refer to another issue? |
Yes sorry I meant cockroachdb/docs#1908. |
(Issue filed to reflect the known limitations from cockroachdb/docs#1381 (comment) )
Due to #15350 and #15770 (and perhaps other causes), one cannot have a single cell (= single column value in a row) larger than a range. IIRC from previous discussions with @bdarnell, the practical limit is much smaller than that, perhaps rangesize / 2 or lower.
However these limits are not documented.
The following questions must be answered in docs:
(This issue is more limited in scope than the next step suggested by #243, also the answers here must be given for any SQL column type, not only blobs)