roachtest: tpcc/headroom/n4cpu16 failed #37163
Looks like towards the end of the test the QPS for some types of transactions dropped to 0. We also failed to download the debug.zip. |
No I haven't, but I wouldn't be surprised if this is related to what we're seeing in #37199. |
This comment has been minimized.
The failures over the past 3 days are because of #37590. |
This comment has been minimized.
Previous three issues addressed by #37701. |
This comment has been minimized.
Latest failure addressed by #37726. |
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1306272&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308285&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308281&tab=buildLog
|
This comment has been minimized.
Minimized comments above addressed by #38022. |
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1324173&tab=buildLog
|
This was missed during cockroachdb#37726. Closes cockroachdb#37488. Touches cockroachdb#37163. Release note: None
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1330352&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1335643&tab=buildLog
|
The last two failures (both on release-19.1) show the
and then nothing for 6 minutes, and then
And then tons of ctx canceled and network errors. The other nodes have network errors starting around 06:03. |
From the 6:01 goroutine dump at n1
There's also a heap profile (on n1) at 6:03 that shows a lot of memory tied up in
This line
is actually pretty interesting because it's supposed to only ever print times that are close to ~10s:
cockroach/pkg/storage/replica_write.go Lines 204 to 206 in 7e2ceae
cockroach/pkg/storage/replica_write.go Lines 176 to 179 in 7e2ceae
The fact that it took more than 30x that does suggest that something horrible happened to the machine. The CPU seems to be working pretty hard in the minutes leading up to the failure
"1513.2% utime" basically means the 16 cpus are maxing out. But still, I think this is already part of the downfall here, and I can't imagine overloading a machine to the point where lots of goroutines don't get scheduled for 300s. There's nothing in dmesg, but I found this in sysctl:
I tried googling for that message but with zero success. The kernel here is
|
^- anyway, this udev message doesn't seem like something we could trigger with a bug in crdb. Maybe the vm was "live" migrated? |
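As a rough sanity check on the "1513.2% utime" figure quoted above: if that percentage is summed across cores (100% = one fully busy core, which is an assumption about how the number is reported), it works out to about 15.1 of the 16 CPUs spent in user time alone, roughly 95% of the machine. A minimal sketch of that arithmetic; the helper name `cores_busy` is made up for illustration and is not part of crdb or roachtest:

```python
def cores_busy(utime_percent, num_cpus):
    """Convert a summed-across-cores CPU percentage into
    (equivalent busy cores, fraction of the whole machine in use)."""
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    # Assumption: 100% corresponds to one fully busy core.
    busy = utime_percent / 100.0
    return busy, busy / num_cpus


# Values taken from the comment above: 1513.2% utime on a 16-CPU node.
busy, fraction = cores_busy(1513.2, 16)
print(f"{busy:.2f} of 16 cores busy ({fraction:.1%} of the machine)")
```

This only accounts for user time; any system time would come on top of it, so the node would have had very little headroom left in those minutes.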
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1344398&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1371441&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1396096&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1399000&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1400942&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1402541&tab=buildLog
|
Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1404886&tab=buildLog
|
These recent failures have been identified as #39103. |
SHA: https://github.com/cockroachdb/cockroach/commits/a53852a8f6c02ca5573a22abc03c790326ef69ba
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1262103&tab=buildLog