Skip to content

storage: CPut bytes instead of a proto when updating RangeDescriptors #38302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 20, 2019

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Jun 19, 2019

In #38147, a new non-nullable field was added to the ReplicaDescriptor
proto that is serialized inside RangeDescriptors. RangeDescriptors are
updated using CPut to detect races. This means that if a RangeDescriptor
had been written by an old version of cockroach and we then attempt to
update it, the CPut will fail because the encoded version of it is
different (non-nullable proto2 fields are always included).

A similar issue was introduced in #38004 which made the StickyBit field
on RangeDescriptor non-nullable.

We could keep the fields as nullable, but this has allocation costs (and
is also more annoying to work with).

Worse, the proto spec is pretty explicit about not relying on
serialization of a given message to always produce exactly the same
bytes:

From https://developers.google.com/protocol-buffers/docs/encoding#implications
Do not assume the byte output of a serialized message is stable.

So instead, we've decided to stop CPut-ing protos and to change the
interface of CPut to take the expected value as bytes. To CPut a proto,
the encoded value will be read from kv, passed around with the decoded
struct, and used in the eventual CPut. This work is upcoming, but the
above two PRs are real breakages, so this commit fixes those first.

Neither of these PRs made the last alpha so no release note.

Touches #38308
Closes #38183

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@danhhz
Copy link
Contributor Author

danhhz commented Jun 19, 2019

Still needs some cleanup, comments, and tests but getting a CI run in. @tbg feel free to peek at the approach if you'd like

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach LGTM, left some highish-level comments/suggestions. Thanks for getting this out quickly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/storage/replica_command.go, line 1097 at r1 (raw file):

func fetchDBRangeDescriptor(
	ctx context.Context, txn *client.Txn, desc *roachpb.RangeDescriptor,

expDesc? Also why return a descriptor if it's only returned if it's equal to the one passed in? We're not going to have Equal pull any stunts, do we?

I wonder what a better name for this method is. conditionalGetDescValueFromDB? Or any other suggestion you have that reflects the fact that we're passing in a descriptor and only returning the bytes from db if we were able to match them up.


pkg/storage/replica_command.go, line 1150 at r1 (raw file):

	var ov interface{}
	if oldValue != nil {
		oldValue.ClearChecksum()

Why is this needed?

@danhhz danhhz marked this pull request as ready for review June 19, 2019 22:59
@danhhz danhhz requested a review from a team June 19, 2019 22:59
Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/replica_command.go, line 935 at r1 (raw file):

		}

		if oldDesc.RangeID != 0 && !oldDesc.Equal(desc) {

Is this safe to get rid of? I can't imagine what the new equivalent of it would be


pkg/storage/replica_command.go, line 1097 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

expDesc? Also why return a descriptor if it's only returned if it's equal to the one passed in? We're not going to have Equal pull any stunts, do we?

I wonder what a better name for this method is. conditionalGetDescValueFromDB? Or any other suggestion you have that reflects the fact that we're passing in a descriptor and only returning the bytes from db if we were able to match them up.

Done


pkg/storage/replica_command.go, line 1150 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Why is this needed?

Added a comment

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/storage/replica_command.go, line 935 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

Is this safe to get rid of? I can't imagine what the new equivalent of it would be

Yeah it's fine, this was introduced debugging some really nasty bug, luckily I've since forgotten what exactly it was but it was fixed and without this ever firing I believe.


pkg/storage/replica_command_test.go, line 36 at r2 (raw file):

	ctx := context.Background()
	s, _, kvDB := serverutils.StartServer(t, base.TestServerArgs{})

Is there an option to disable the split and merge queue? Better to set them or this test has reasons to be flaky.

In cockroachdb#38147, a new non-nullable field was added to the ReplicaDescriptor
proto that is serialized inside RangeDescriptors. RangeDescriptors are
updated using CPut to detect races. This means that if a RangeDescriptor
had been written by an old version of cockroach and we then attempt to
update it, the CPut will fail because the encoded version of it is
different (non-nullable proto2 fields are always included).

A similar issue was introduced in cockroachdb#38004 which made the StickyBit field
on RangeDescriptor non-nullable.

We could keep the fields as nullable, but this has allocation costs (and
is also more annoying to work with).

Worse, the proto spec is pretty explicit about not relying on
serialization of a given message to always produce exactly the same
bytes:

    From https://developers.google.com/protocol-buffers/docs/encoding#implications
    Do not assume the byte output of a serialized message is stable.

So instead, we've decided to stop CPut-ing protos and to change the
interface of CPut to take the expected value as bytes. To CPut a proto,
the encoded value will be read from kv, passed around with the decoded
struct, and used in the eventual CPut. This work is upcoming, but the
above two PRs are real breakages, so this commit fixes those first.

Neither of these PRs made the last alpha so no release note.

Touches cockroachdb#38308

Release note: None
Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

bors r=tbg

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @tbg)


pkg/storage/replica_command_test.go, line 36 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Is there an option to disable the split and merge queue? Better to set them or this test has reasons to be flaky.

I don't think we need to disable the split queue, since nothing is large or getting traffic, but did them both just to be safe.

craig bot pushed a commit that referenced this pull request Jun 20, 2019
38302: storage: CPut bytes instead of a proto when updating RangeDescriptors r=tbg a=danhhz

In #38147, a new non-nullable field was added to the ReplicaDescriptor
proto that is serialized inside RangeDescriptors. RangeDescriptors are
updated using CPut to detect races. This means that if a RangeDescriptor
had been written by an old version of cockroach and we then attempt to
update it, the CPut will fail because the encoded version of it is
different (non-nullable proto2 fields are always included).

A similar issue was introduced in #38004 which made the StickyBit field
on RangeDescriptor non-nullable.

We could keep the fields as nullable, but this has allocation costs (and
is also more annoying to work with).

Worse, the proto spec is pretty explicit about not relying on
serialization of a given message to always produce exactly the same
bytes:

    From https://developers.google.com/protocol-buffers/docs/encoding#implications
    Do not assume the byte output of a serialized message is stable.

So instead, we've decided to stop CPut-ing protos and to change the
interface of CPut to take the expected value as bytes. To CPut a proto,
the encoded value will be read from kv, passed around with the decoded
struct, and used in the eventual CPut. This work is upcoming, but the
above two PRs are real breakages, so this commit fixes those first.

Neither of these PRs made the last alpha so no release note.

Touches #38308
Closes #38183

Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jun 20, 2019

Build succeeded

@craig craig bot merged commit ea720e3 into cockroachdb:master Jun 20, 2019
craig bot pushed a commit that referenced this pull request Jun 26, 2019
38211: changefeedccl: remove deprecated poller r=tbg a=danhhz

We switched the default to push-based rangefeeds in 19.1. This removes
the old pull-based poller fallback entirely.

Details of the removal:
- The relevant code is removed
- Several poller-related hacks are removed
- The changefeed.run.push.enabled telemetry metric is removed
- The changefeed.push.enabled cluster setting is removed
- The poller subtest is removed from each changefeedccl test
- The cdc/poller roachtest is skipped on 19.2+
- TestValidations is removed, it's redundant with the much better
  quality TestChangefeedNemeses

Note that the table history still does some polling, but switching this
to RangeFeed will cause an unacceptable increase in the commit-to-emit
latency of rows. This bit of polling will be removed as part of #36289.
This commit also leaves the structure of the changefeed code mostly
unchanged. There is an opportunity for cleanup here, but this also will
wait for after #36289.

Closes #36914

Release note: None

38418: storage: fix null pointer dereference in AdminMerge r=jeffrey-xiao a=jeffrey-xiao

Fixes #38427.

I noticed that `TestRepartitioning` was failing under stress on master with the following error:

```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x20894f6]

...

        /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).AdminMerge.func1(0xc00441e750, 0xc002a83e60, 0xc00020a880)
        /home/cockroach/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_command.go:536 +0x226

...
```

The logic for retrieving the RHS descriptor before and after #38302 is not the same in the case where `rightDesc` does not exist. This PR changes it so the behavior is consistent.

It seems possible that the following branch can evaluate true, but I might be missing something here.
```go
// Verify that the two ranges are mergeable.
if !bytes.Equal(origLeftDesc.EndKey, rightDesc.StartKey) {
  // Should never happen, but just in case.
  return errors.Errorf("ranges are not adjacent; %s != %s", origLeftDesc.EndKey, rightDesc.StartKey)
}
```


38435: sql: Fixing interleave check for loose index scan r=rohany a=rohany

Fixing the interleave check for the loose index scan while interleave support is in progress.

Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Jeffrey Xiao <[email protected]>
Co-authored-by: Rohan Yadav <[email protected]>
@danhhz danhhz deleted the cput_desc branch June 26, 2019 18:10
jeffrey-xiao added a commit to jeffrey-xiao/cockroach that referenced this pull request Jun 26, 2019
On a mixed version cluster, only having cockroachdb#38302 does not guarantee
backwards compatibility for non-nullable fields in RangeDescriptor.
Suppose the replicas of a range are of mixed version and the current
leaseholder of the range is on the newest version. If the leaseholder
splits range range and writes to the non-nullable fields in
RangeDescriptor and then the leaseholder transfers to a node on an older
version without cockroachdb#38302, then all CPuts would fail on that node.

We must guarantee that cockroachdb#38302 is on all nodes before making the fields
in RangeDescriptor non-nullable.

Release note: None
craig bot pushed a commit that referenced this pull request Jun 26, 2019
38465: roachpb: Change RangeDescriptor.StickyBit and ReplicaDescriptor.Type to nullable r=jeffrey-xiao a=jeffrey-xiao

On a mixed version cluster, only having #38302 does not guarantee backwards compatibility for non-nullable fields in RangeDescriptor. Suppose the replicas of a range are of mixed version and the current
leaseholder of the range is on newest version. If the leaseholder splits that range and writes to the non-nullable fields in RangeDescriptor and the leaseholder transfers to a node on an older version without #38302, then all CPuts would fail on that node.

We must guarantee that #38302 is on all nodes before making the fields in RangeDescriptor non-nullable.

Fixes #36983.
Fixes #38471.

Release note: None

Co-authored-by: Jeffrey Xiao <[email protected]>
craig bot pushed a commit that referenced this pull request Jun 27, 2019
38465: roachpb: Change RangeDescriptor.StickyBit and ReplicaDescriptor.Type to nullable r=jeffrey-xiao a=jeffrey-xiao

On a mixed version cluster, only having #38302 does not guarantee backwards compatibility for non-nullable fields in RangeDescriptor. Suppose the replicas of a range are of mixed version and the current
leaseholder of the range is on newest version. If the leaseholder splits that range and writes to the non-nullable fields in RangeDescriptor and the leaseholder transfers to a node on an older version without #38302, then all CPuts would fail on that node.

We must guarantee that #38302 is on all nodes before making the fields in RangeDescriptor non-nullable.

Fixes #36983.
Fixes #38471.

Release note: None

Co-authored-by: Jeffrey Xiao <[email protected]>
danhhz added a commit to danhhz/cockroach that referenced this pull request Aug 16, 2019
As seen in cockroachdb#38004 and cockroachdb#38147, this first came up in the context of CPut
with a proto expected value. If the serialization didn't exactly match,
the CPut would get a condition failed error, even if the two protos were
logically equivalent (missing field vs field with default value). We
were hitting this in a mixed version cluster when trying to add a
non-nullable field to RangeDescriptor, which is updated via CPut to
detect update races. The new fields ended up having to be nullable,
which has allocation consequences as well as ergonomic ones.

In cockroachdb#38302, we changed the RangeDescriptor CPut to use a *roachpb.Value
exactly as it was read from KV for the expected value, but no other
callsites were updated. To prevent future issues, this commit changes
the CPut signature to accept a *roachpb.Value for the expected value
instead of an interface{}.

The old method signature is left as CPutDeprecated for now. The CPut
usages that are trivial to switch are switched and ones that require any
thought are left for a followup PR.

Release note: None
craig bot pushed a commit that referenced this pull request Aug 20, 2019
39714: client: force CPut callers to use a *roachpb.Value expected value r=tbg a=danhhz

As seen in #38004 and #38147, this first came up in the context of CPut
with a proto expected value. If the serialization didn't exactly match,
the CPut would get a condition failed error, even if the two protos were
logically equivalent (missing field vs field with default value). We
were hitting this in a mixed version cluster when trying to add a
non-nullable field to RangeDescriptor, which is updated via CPut to
detect update races. The new fields ended up having to be nullable,
which has allocation consequences as well as ergonomic ones.

In #38302, we changed the RangeDescriptor CPut to use a *roachpb.Value
exactly as it was read from KV for the expected value, but no other
callsites were updated. To prevent future issues, this commit changes
the CPut signature to accept a *roachpb.Value for the expected value
instead of an interface{}.

The old method signature is left as CPutDeprecated for now. The CPut
usages that are trivial to switch are switched and ones that require any
thought are left for a followup PR.

Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
andreimatei added a commit to andreimatei/cockroach that referenced this pull request May 14, 2020
The "below raft protos" testing was preventing the RangeDesc encoding to
change because we used to CPut RangeDescs with unmarshalled/marshalled
expected bytes - which required a stable encoding. We've since fixed
that in cockroachdb#38302, so the testing was un-necessary.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request May 15, 2020
This field used to be nullable because the proto's encoding of old
protos without the field had to be stable when unmarshalled/marshalled.
Since cockroachdb#38302, that's no longer needed, and so the field can now be as
nature intended.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request May 15, 2020
The "below raft protos" testing was preventing the RangeDesc encoding to
change because we used to CPut RangeDescs with unmarshalled/marshalled
expected bytes - which required a stable encoding. We've since fixed
that in cockroachdb#38302, so the testing was un-necessary.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request May 15, 2020
This field used to be nullable because the proto's encoding of old
protos without the field had to be stable when unmarshalled/marshalled.
Since cockroachdb#38302, that's no longer needed, and so the field can now be as
nature intended.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: endless spam from split queue retries
3 participants