Skip to content

storage: fix potential panic tracing raft requests #37107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ajwerner
Copy link
Contributor

Before this PR, when using vmodule=raft >= 2, it was possible for concurrent
proposal cancelation to lead to span being closed before logging an event.
This PR adds a contract to ProposalData.ctx that after a proposal has been
submitted to raft it can only be modified while holding the Replica.raftMu.
This is acceptable because cancelation is not performance sensitive.

The test reliably reproduced the panic when run under stress within a minute
and has been run for 10 minutes after the change without a failure.

@ajwerner ajwerner requested review from nvanbenschoten and a team April 25, 2019 00:50
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/raft-mu-for-proposal-ctx-in-try-abandon branch from f7e6061 to 40aa71b Compare April 25, 2019 01:28
Before this PR, when using vmodule=raft >= 2, it was possible for concurrent
proposal cancelation to lead to span being closed before logging an event.
This PR adds a contract to ProposalData.ctx that after a proposal has been
submitted to raft it can only be modified while holding the Replica.raftMu.
This is acceptable because cancelation is not performance sensitive.

The test reliably reproduced the panic when run under stress within a minute
and has been run for 10 minutes after the change without a failure.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/raft-mu-for-proposal-ctx-in-try-abandon branch from 40aa71b to 4da0e1a Compare April 25, 2019 03:05
@ajwerner
Copy link
Contributor Author

ajwerner commented May 6, 2019

@nvanbenschoten gentle nudge. I also would like to backport this to 19.1. It's only possible with vmodule set to raft=2 or higher but it still seems worth fixing.

Copy link
Contributor

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: sorry for letting this sit.

Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@ajwerner
Copy link
Contributor Author

ajwerner commented May 7, 2019

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request May 7, 2019
37107: storage: fix potential panic tracing raft requests r=nvanbenschoten a=ajwerner

Before this PR, when using vmodule=raft >= 2, it was possible for concurrent
proposal cancelation to lead to span being closed before logging an event.
This PR adds a contract to ProposalData.ctx that after a proposal has been
submitted to raft it can only be modified while holding the Replica.raftMu.
This is acceptable because cancelation is not performance sensitive.

The test reliably reproduced the panic when run under stress within a minute
and has been run for 10 minutes after the change without a failure.

Co-authored-by: Andrew Werner <[email protected]>
@craig
Copy link
Contributor

craig bot commented May 7, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants