-
Notifications
You must be signed in to change notification settings - Fork 3.9k
storage: CPut bytes instead of a proto when updating RangeDescriptors #38302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Still needs some cleanup, comments, and tests but getting a CI run in. @tbg feel free to peek at the approach if you'd like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach LGTM, left some highish-level comments/suggestions. Thanks for getting this out quickly.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @danhhz)
pkg/storage/replica_command.go, line 1097 at r1 (raw file):
func fetchDBRangeDescriptor( ctx context.Context, txn *client.Txn, desc *roachpb.RangeDescriptor,
expDesc
? Also why return a descriptor if it's only returned if it's equal to the one passed in? We're not going to have Equal
pull any stunts, do we?
I wonder what a better name for this method is. conditionalGetDescValueFromDB
? Or any other suggestion you have that reflects the fact that we're passing in a descriptor and only returning the bytes from db if we were able to match them up.
pkg/storage/replica_command.go, line 1150 at r1 (raw file):
var ov interface{} if oldValue != nil { oldValue.ClearChecksum()
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @tbg)
pkg/storage/replica_command.go, line 935 at r1 (raw file):
} if oldDesc.RangeID != 0 && !oldDesc.Equal(desc) {
Is this safe to get rid of? I can't imagine what the new equivalent of it would be
pkg/storage/replica_command.go, line 1097 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
expDesc
? Also why return a descriptor if it's only returned if it's equal to the one passed in? We're not going to haveEqual
pull any stunts, do we?I wonder what a better name for this method is.
conditionalGetDescValueFromDB
? Or any other suggestion you have that reflects the fact that we're passing in a descriptor and only returning the bytes from db if we were able to match them up.
Done
pkg/storage/replica_command.go, line 1150 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Why is this needed?
Added a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @danhhz)
pkg/storage/replica_command.go, line 935 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
Is this safe to get rid of? I can't imagine what the new equivalent of it would be
Yeah it's fine, this was introduced debugging some really nasty bug, luckily I've since forgotten what exactly it was but it was fixed and without this ever firing I believe.
pkg/storage/replica_command_test.go, line 36 at r2 (raw file):
ctx := context.Background() s, _, kvDB := serverutils.StartServer(t, base.TestServerArgs{})
Is there an option to disable the split and merge queue? Better to set them or this test has reasons to be flaky.
In cockroachdb#38147, a new non-nullable field was added to the ReplicaDescriptor proto that is serialized inside RangeDescriptors. RangeDescriptors are updated using CPut to detect races. This means that if a RangeDescriptor had been written by an old version of cockroach and we then attempt to update it, the CPut will fail because the encoded version of it is different (non-nullable proto2 fields are always included). A similar issue was introduced in cockroachdb#38004 which made the StickyBit field on RangeDescriptor non-nullable. We could keep the fields as nullable, but this has allocation costs (and is also more annoying to work with). Worse, the proto spec is pretty explicit about not relying on serialization of a given message to always produce exactly the same bytes: From https://developers.google.com/protocol-buffers/docs/encoding#implications Do not assume the byte output of a serialized message is stable. So instead, we've decided to stop CPut-ing protos and to change the interface of CPut to take the expected value as bytes. To CPut a proto, the encoded value will be read from kv, passed around with the decoded struct, and used in the eventual CPut. This work is upcoming, but the above two PRs are real breakages, so this commit fixes those first. Neither of these PRs made the last alpha so no release note. Touches cockroachdb#38308 Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
bors r=tbg
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @tbg)
pkg/storage/replica_command_test.go, line 36 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Is there an option to disable the split and merge queue? Better to set them or this test has reasons to be flaky.
I don't think we need to disable the split queue, since nothing is large or getting traffic, but did them both just to be safe.
38302: storage: CPut bytes instead of a proto when updating RangeDescriptors r=tbg a=danhhz In #38147, a new non-nullable field was added to the ReplicaDescriptor proto that is serialized inside RangeDescriptors. RangeDescriptors are updated using CPut to detect races. This means that if a RangeDescriptor had been written by an old version of cockroach and we then attempt to update it, the CPut will fail because the encoded version of it is different (non-nullable proto2 fields are always included). A similar issue was introduced in #38004 which made the StickyBit field on RangeDescriptor non-nullable. We could keep the fields as nullable, but this has allocation costs (and is also more annoying to work with). Worse, the proto spec is pretty explicit about not relying on serialization of a given message to always produce exactly the same bytes: From https://developers.google.com/protocol-buffers/docs/encoding#implications Do not assume the byte output of a serialized message is stable. So instead, we've decided to stop CPut-ing protos and to change the interface of CPut to take the expected value as bytes. To CPut a proto, the encoded value will be read from kv, passed around with the decoded struct, and used in the eventual CPut. This work is upcoming, but the above two PRs are real breakages, so this commit fixes those first. Neither of these PRs made the last alpha so no release note. Touches #38308 Closes #38183 Release note: None Co-authored-by: Daniel Harrison <[email protected]>
Build succeeded |
38211: changefeedccl: remove deprecated poller r=tbg a=danhhz We switched the default to push-based rangefeeds in 19.1. This removes the old pull-based poller fallback entirely. Details of the removal: - The relevant code is removed - Several poller-related hacks are removed - The changefeed.run.push.enabled telemetry metric is removed - The changefeed.push.enabled cluster setting is removed - The poller subtest is removed from each changefeedccl test - The cdc/poller roachtest is skipped on 19.2+ - TestValidations is removed, it's redundant with the much better quality TestChangefeedNemeses Note that the table history still does some polling, but switching this to RangeFeed will cause an unacceptable increase in the commit-to-emit latency of rows. This bit of polling will be removed as part of #36289. This commit also leaves the structure of the changefeed code mostly unchanged. There is an opportunity for cleanup here, but this also will wait for after #36289. Closes #36914 Release note: None 38418: storage: fix null pointer dereference in AdminMerge r=jeffrey-xiao a=jeffrey-xiao Fixes #38427. I noticed that `TestRepartitioning` was failing under stress on master with the following error: ``` panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x20894f6] ... /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5 github.com/cockroachdb/cockroach/pkg/storage.(*Replica).AdminMerge.func1(0xc00441e750, 0xc002a83e60, 0xc00020a880) /home/cockroach/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_command.go:536 +0x226 ... ``` The logic for retrieving the RHS descriptor before and after #38302 is not the same in the case where `rightDesc` does not exist. This PR changes it so the behavior is consistent. It seems possible that the following branch can evaluate true, but I might be missing something here. ```go // Verify that the two ranges are mergeable. if !bytes.Equal(origLeftDesc.EndKey, rightDesc.StartKey) { // Should never happen, but just in case. return errors.Errorf("ranges are not adjacent; %s != %s", origLeftDesc.EndKey, rightDesc.StartKey) } ``` 38435: sql: Fixing interleave check for loose index scan r=rohany a=rohany Fixing the interleave check for the loose index scan while interleave support is in progress. Co-authored-by: Daniel Harrison <[email protected]> Co-authored-by: Jeffrey Xiao <[email protected]> Co-authored-by: Rohan Yadav <[email protected]>
On a mixed version cluster, only having cockroachdb#38302 does not guarantee backwards compatibility for non-nullable fields in RangeDescriptor. Suppose the replicas of a range are of mixed version and the current leaseholder of the range is on the newest version. If the leaseholder splits range range and writes to the non-nullable fields in RangeDescriptor and then the leaseholder transfers to a node on an older version without cockroachdb#38302, then all CPuts would fail on that node. We must guarantee that cockroachdb#38302 is on all nodes before making the fields in RangeDescriptor non-nullable. Release note: None
38465: roachpb: Change RangeDescriptor.StickyBit and ReplicaDescriptor.Type to nullable r=jeffrey-xiao a=jeffrey-xiao On a mixed version cluster, only having #38302 does not guarantee backwards compatibility for non-nullable fields in RangeDescriptor. Suppose the replicas of a range are of mixed version and the current leaseholder of the range is on newest version. If the leaseholder splits that range and writes to the non-nullable fields in RangeDescriptor and the leaseholder transfers to a node on an older version without #38302, then all CPuts would fail on that node. We must guarantee that #38302 is on all nodes before making the fields in RangeDescriptor non-nullable. Fixes #36983. Fixes #38471. Release note: None Co-authored-by: Jeffrey Xiao <[email protected]>
38465: roachpb: Change RangeDescriptor.StickyBit and ReplicaDescriptor.Type to nullable r=jeffrey-xiao a=jeffrey-xiao On a mixed version cluster, only having #38302 does not guarantee backwards compatibility for non-nullable fields in RangeDescriptor. Suppose the replicas of a range are of mixed version and the current leaseholder of the range is on newest version. If the leaseholder splits that range and writes to the non-nullable fields in RangeDescriptor and the leaseholder transfers to a node on an older version without #38302, then all CPuts would fail on that node. We must guarantee that #38302 is on all nodes before making the fields in RangeDescriptor non-nullable. Fixes #36983. Fixes #38471. Release note: None Co-authored-by: Jeffrey Xiao <[email protected]>
As seen in cockroachdb#38004 and cockroachdb#38147, this first came up in the context of CPut with a proto expected value. If the serialization didn't exactly match, the CPut would get a condition failed error, even if the two protos were logically equivalent (missing field vs field with default value). We were hitting this in a mixed version cluster when trying to add a non-nullable field to RangeDescriptor, which is updated via CPut to detect update races. The new fields ended up having to be nullable, which has allocation consequences as well as ergonomic ones. In cockroachdb#38302, we changed the RangeDescriptor CPut to use a *roachpb.Value exactly as it was read from KV for the expected value, but no other callsites were updated. To prevent future issues, this commit changes the CPut signature to accept a *roachpb.Value for the expected value instead of an interface{}. The old method signature is left as CPutDeprecated for now. The CPut usages that are trivial to switch are switched and ones that require any thought are left for a followup PR. Release note: None
39714: client: force CPut callers to use a *roachpb.Value expected value r=tbg a=danhhz As seen in #38004 and #38147, this first came up in the context of CPut with a proto expected value. If the serialization didn't exactly match, the CPut would get a condition failed error, even if the two protos were logically equivalent (missing field vs field with default value). We were hitting this in a mixed version cluster when trying to add a non-nullable field to RangeDescriptor, which is updated via CPut to detect update races. The new fields ended up having to be nullable, which has allocation consequences as well as ergonomic ones. In #38302, we changed the RangeDescriptor CPut to use a *roachpb.Value exactly as it was read from KV for the expected value, but no other callsites were updated. To prevent future issues, this commit changes the CPut signature to accept a *roachpb.Value for the expected value instead of an interface{}. The old method signature is left as CPutDeprecated for now. The CPut usages that are trivial to switch are switched and ones that require any thought are left for a followup PR. Release note: None Co-authored-by: Daniel Harrison <[email protected]>
The "below raft protos" testing was preventing the RangeDesc encoding to change because we used to CPut RangeDescs with unmarshalled/marshalled expected bytes - which required a stable encoding. We've since fixed that in cockroachdb#38302, so the testing was un-necessary. Release note: None
This field used to be nullable because the proto's encoding of old protos without the field had to be stable when unmarshalled/marshalled. Since cockroachdb#38302, that's no longer needed, and so the field can now be as nature intended. Release note: None
The "below raft protos" testing was preventing the RangeDesc encoding to change because we used to CPut RangeDescs with unmarshalled/marshalled expected bytes - which required a stable encoding. We've since fixed that in cockroachdb#38302, so the testing was un-necessary. Release note: None
This field used to be nullable because the proto's encoding of old protos without the field had to be stable when unmarshalled/marshalled. Since cockroachdb#38302, that's no longer needed, and so the field can now be as nature intended. Release note: None
In #38147, a new non-nullable field was added to the ReplicaDescriptor
proto that is serialized inside RangeDescriptors. RangeDescriptors are
updated using CPut to detect races. This means that if a RangeDescriptor
had been written by an old version of cockroach and we then attempt to
update it, the CPut will fail because the encoded version of it is
different (non-nullable proto2 fields are always included).
A similar issue was introduced in #38004 which made the StickyBit field
on RangeDescriptor non-nullable.
We could keep the fields as nullable, but this has allocation costs (and
is also more annoying to work with).
Worse, the proto spec is pretty explicit about not relying on
serialization of a given message to always produce exactly the same
bytes:
So instead, we've decided to stop CPut-ing protos and to change the
interface of CPut to take the expected value as bytes. To CPut a proto,
the encoded value will be read from kv, passed around with the decoded
struct, and used in the eventual CPut. This work is upcoming, but the
above two PRs are real breakages, so this commit fixes those first.
Neither of these PRs made the last alpha so no release note.
Touches #38308
Closes #38183
Release note: None