roachtest: gossip/chaos/nodes=9 failed #37118

Closed
cockroach-teamcity opened this issue Apr 25, 2019 · 7 comments
Labels
C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/99306ec3e9fcbba01c05431cbf496e8b5b8954b4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=gossip/chaos/nodes=9 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1260033&tab=buildLog

The test failed on master:
	cluster.go:1349,gossip.go:117,gossip.go:125,test.go:1245: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start teamcity-1260033-gossip-chaos-nodes-9:9 returned:
		stderr:
		
		stdout:
		teamcity-1260033-gossip-chaos-nodes-9: starting........................................................................................................................
		0: exit status 255
		~ ./cockroach version
		
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.getCockroachVersion
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:95
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func7
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:289
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1441
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1333: 
		I190425 07:42:47.437015 1 cluster_synced.go:1523  command failed
		: exit status 1
	cluster.go:1016,context.go:89,cluster.go:1005,asm_amd64.s:522,panic.go:397,test.go:790,test.go:796,cluster.go:1349,gossip.go:117,gossip.go:125,test.go:1245: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1260033-gossip-chaos-nodes-9 --oneshot --ignore-empty-nodes: exit status 1 1: 5186
		5: 5066
		2: 5435
		6: 5048
		7: 6837
		3: 4717
		4: 4759
		9: dead
		8: 5048
		Error:  9: dead
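
For triage, the per-node lines in the `roachprod monitor --oneshot` output above (`<node>: <pid>` or `<node>: dead`) can be summarized mechanically. The following is a minimal sketch, assuming the output keeps the shape shown in this report; `parse_monitor_output` is a hypothetical helper written for illustration and is not part of roachprod or roachtest.

```python
import re

# Hypothetical helper, not part of roachprod. It assumes each node status
# appears at the end of a line as "<node>: <pid>" or "<node>: dead", which is
# the shape seen in the report above (including the first entry, which is
# fused onto the end of the "exit status 1" line).
NODE_STATUS = re.compile(r"\b(\d+):\s+(dead|\d+)\s*$")


def parse_monitor_output(text):
    """Return (alive, dead): alive maps node id -> pid, dead is a sorted list of node ids."""
    alive = {}
    dead = set()  # a set, because the trailing "Error:" line repeats the dead node
    for line in text.splitlines():
        match = NODE_STATUS.search(line)
        if match is None:
            continue
        node, state = int(match.group(1)), match.group(2)
        if state == "dead":
            dead.add(node)
        else:
            alive[node] = int(state)
    return alive, sorted(dead)


# Excerpt copied from the failure output above.
sample = (
    "--oneshot --ignore-empty-nodes: exit status 1 1: 5186\n"
    "\t\t5: 5066\n"
    "\t\t2: 5435\n"
    "\t\t9: dead\n"
    "\t\tError:  9: dead\n"
)
alive, dead = parse_monitor_output(sample)
print("alive:", alive)
print("dead:", dead)
```

Matching at the end of the line rather than the start is a deliberate choice here, since the first node entry shares a line with the command's exit status.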

@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Apr 25, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Apr 25, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/dff4132a80e62c6c5ad603ff6c608b09419d4e3e

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1264632&tab=buildLog

The test failed on branch=master, cloud=gce:
	gossip.go:67,gossip.go:104,gossip.go:116,gossip.go:125,test.go:1253: gossip did not stabilize in 20.2s
	cluster.go:1038,context.go:89,cluster.go:1027,asm_amd64.s:522,panic.go:397,test.go:790,test.go:780,gossip.go:67,gossip.go:104,gossip.go:116,gossip.go:125,test.go:1253: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1264632-gossip-chaos-nodes-9 --oneshot --ignore-empty-nodes: exit status 1 8: 4481
		3: 4373
		6: 3871
		4: 3844
		5: 4569
		7: 5137
		9: 4743
		2: 4748
		1: dead
		Error:  1: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/efb45869b242137e5c178b10c646c3ed025fff36

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1266041&tab=buildLog

The test failed on branch=master, cloud=gce:
	gossip.go:67,gossip.go:104,gossip.go:116,gossip.go:125,test.go:1253: gossip did not stabilize in 20.8s
	cluster.go:1038,context.go:89,cluster.go:1027,asm_amd64.s:522,panic.go:397,test.go:790,test.go:780,gossip.go:67,gossip.go:104,gossip.go:116,gossip.go:125,test.go:1253: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1266041-gossip-chaos-nodes-9 --oneshot --ignore-empty-nodes: exit status 1 9: 4696
		8: 4181
		7: 5259
		5: 5039
		1: 4448
		2: dead
		6: 5153
		3: 3857
		4: 3918
		Error:  2: dead

@andreimatei
Contributor

gossip.go:67,gossip.go:104,gossip.go:116,gossip.go:125,test.go:1253: gossip did not stabilize in 20.8s
@petermattis what do we do?

@petermattis
Collaborator

petermattis commented Apr 30, 2019 via email

@tbg
Member

tbg commented May 1, 2019

The last failure did not include #37204, so it's possible that #37204 fixed the problem. I'd wait and see: at the past rate, we should see another failure within a week if the problem persists.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/84dc682eca4b11e6abaf390fc8883f32afe81fb4

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1283539&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cluster.go:1400,gossip.go:117,gossip.go:125,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start teamcity-1283539-gossip-chaos-nodes-9:1 returned:
		stderr:
		
		stdout:
		D=1 && GOTRACEBACK=crash COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING=1 COCKROACH_ENABLE_RPC_COMPRESSION=false ./cockroach start --insecure --store=path=/mnt/data1/cockroach --log-dir=${HOME}/logs --background --cache=25% --max-sql-memory=25% --port=26257 --http-port=26258 --locality=cloud=gce,region=us-east1,zone=us-east1-b >> ${HOME}/logs/cockroach.stdout.log 2>> ${HOME}/logs/cockroach.stderr.log || (x=$?; cat ${HOME}/logs/cockroach.stderr.log; exit $x)
		Connection to 35.237.144.245 closed by remote host.
		
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func7
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:400
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1449
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1333: 
		I190510 07:42:21.879428 1 cluster_synced.go:1531  command failed
		: exit status 1
	cluster.go:1038,context.go:89,cluster.go:1027,asm_amd64.s:522,panic.go:397,test.go:788,test.go:794,cluster.go:1400,gossip.go:117,gossip.go:125,test.go:1251: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1283539-gossip-chaos-nodes-9 --oneshot --ignore-empty-nodes: exit status 1 7: 6545
		5: 5279
		6: 5271
		2: 4786
		9: 4553
		3: 4493
		1: dead
		8: 6402
		4: 5803
		Error:  1: dead

@tbg
Member

tbg commented Jun 4, 2019

Some SSH flakes (cc #36929), but we didn't see the stabilization failure again, so closing.

@tbg tbg closed this as completed Jun 4, 2019