-
Notifications
You must be signed in to change notification settings - Fork 3.9k
distsqlrun: Ignore breaker when outbox dials node #40691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Let me know if this approach makes sense, and the best way to test a change like this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @rohany, and @tbg)
pkg/rpc/nodedialer/nodedialer.go, line 95 at r1 (raw file):
// DialNoBreaker ignores the breaker if there is an error dialing. This function // should only be used when there is good reason to believe that the node is reachable. func (n *Dialer) DialNoBreaker(
it seems like this is implemented the same as the above function apart from the breaker stuff. to keep the code DRY, would it make sense to have Dial
and DialNoBreaker
call out to the same helper function? the only line we would want to skip in the latter case is breaker.Fail(err)
i think, so the helper function could just accept a flag for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @rafiss, and @tbg)
pkg/rpc/nodedialer/nodedialer.go, line 95 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
it seems like this is implemented the same as the above function apart from the breaker stuff. to keep the code DRY, would it make sense to have
Dial
andDialNoBreaker
call out to the same helper function? the only line we would want to skip in the latter case isbreaker.Fail(err)
i think, so the helper function could just accept a flag for that.
theres some other breaker related stuff happening in unexported node.dial
that we want to avoid too. However, it turned out to be cleaner there than I thought than to just handle a nil breaker.
f354516
to
888452c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from my perspective, might be good to add a test
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @rafiss, @rohany, and @tbg)
pkg/rpc/nodedialer/nodedialer.go, line 193 at r2 (raw file):
// as a stop-gap before the reconciliation occurs. if breaker != nil { breaker.Success()
Maybe make these nil checks part of the Fail
and Success
methods?
pkg/sql/distsqlrun/outbox.go, line 226 at r2 (raw file):
// a critical part of query execution: if this step doesn't work, the // receiving side might end up hanging or timing out. // TODO(asubiotto): We should retry a failed Dial. This rests on the
I think we can remove this TODO, if we attempt to connect to a node that was healthy at plan time and it fails even while ignoring the breaker, that's probably a good enough reason to exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @rafiss, and @tbg)
pkg/rpc/nodedialer/nodedialer.go, line 193 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Maybe make these nil checks part of the
Fail
andSuccess
methods?
I think it is more explicit this way that we are ignoring the breaker.
pkg/sql/distsqlrun/outbox.go, line 226 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I think we can remove this TODO, if we attempt to connect to a node that was healthy at plan time and it fails even while ignoring the breaker, that's probably a good enough reason to exit.
Done.
615b2bd
to
db7caab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm too, and i agree about adding a test if you can
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @rafiss, and @tbg)
db7caab
to
5ed2cf0
Compare
PTAL -- i added a test for DialNoBreaker. I was originally confused about testing this because I thought we would need to recreate the situation in the issue. |
Jk, the test passed locally but not under stress. Let me try again. |
5ed2cf0
to
2d39d61
Compare
Alright, thanks to some help from @ajwerner (tysm) I was able to test this. |
Fixes cockroachdb#38602. Release note: None
2d39d61
to
66683d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test looks great!
Reviewed 3 of 3 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @rafiss, and @tbg)
bors r+ |
Build failed |
I think this is a test flake. Retrying. bors r+ |
Build failed |
The flake should be resolved by #40757 |
run it back -- some other PR's got in bors r+ |
40691: distsqlrun: Ignore breaker when outbox dials node r=rohany a=rohany Fixes #38602. Release justification: Fixes an outstanding bug. Release note: None Co-authored-by: Rohan Yadav <[email protected]>
Build succeeded |
Fixes #38602.
Release justification: Fixes an outstanding bug.
Release note: None