Skip to content

panic when a Context is cancelled #614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
blinsay opened this issue May 22, 2017 · 20 comments
Closed

panic when a Context is cancelled #614

blinsay opened this issue May 22, 2017 · 20 comments

Comments

@blinsay
Copy link

blinsay commented May 22, 2017

I've been using SELECT FOR UPDATE in a transaction and wanted to put a timeout on the transaction so that callers wouldn't get blocked. I've been seeing a runtime error: slice bounds out of range panic in rows.Next that smells like some invalid data being returned.

2017/05/22 15:27:49 http: panic serving [::1]:57542: runtime error: slice bounds out of range
goroutine 52 [running]:
net/http.(*conn).serve.func1(0xc4201c2960)
	/usr/local/go/src/net/http/server.go:1721 +0xd0
panic(0x17537a0, 0x1c2f970)
	/usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/path/to/project/vendor/github.com/lib/pq.(*conn).errRecover(0xc4200d6b00, 0xc42013f188)
	/home/go/src/github.com/path/to/project/vendor/github.com/lib/pq/error.go:482 +0x505
panic(0x17537a0, 0x1c2f970)
	/usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/path/to/project/vendor/github.com/lib/pq.(*rows).Next(0xc42054a0a0, 0xc42054a280, 0xa, 0xa, 0x0, 0x0)
	/home/go/src/github.com/path/to/project/vendor/github.com/lib/pq/conn.go:1403 +0x5e9
database/sql.(*Rows).nextLocked(0xc42054c100, 0xc42013f1e0)
	/usr/local/go/src/database/sql/sql.go:2149 +0x6c
database/sql.(*Rows).Next.func1()
	/usr/local/go/src/database/sql/sql.go:2134 +0x3c
database/sql.withLock(0x1c04f80, 0xc42054c130, 0xc42013f220)
	/usr/local/go/src/database/sql/sql.go:2545 +0x65
database/sql.(*Rows).Next(0xc42054c100, 0x18a39e0)
	/usr/local/go/src/database/sql/sql.go:2135 +0x83

The panic starts here and only occurs if I get the context to cancel at exaaactly the right time. Adding print statements or changing my context timeout changes how reliably I can reproduce it.

I'm running against Postgres 9.6.2 and have lib/pq at 2704adc.

@blinsay
Copy link
Author

blinsay commented May 22, 2017

I jammed a fmt.Println right above the line where the panic starts in conn.go and see l = 3410, len(rs.rb) = 222.

@felixge
Copy link

felixge commented Jun 8, 2017

EDIT: please ignore this comment, see comment below.

We're seeing a similar issue. Our code basically looks like this:

tx, err := db.BeginTx(ctx, nil)
if err != nil {
  return err
}
defer tx.Rollback()

rows, err := tx.QueryContext(ctx, sql, args...)
// do stuff with rows ...

And we see this panic:

n=8 len=512
n=1 len=512
n=-4 len=512
n=0 len=512
panic: runtime error: slice bounds out of range [recovered]
	panic: runtime error: slice bounds out of range

goroutine 29 [running]:
example/vendor/github.com/lib/pq.(*conn).errRecover(0xc420013600, 0xc4201cfef8)
	/Users/felixge/code/go/src/example/vendor/github.com/lib/pq/error.go:482 +0x505
panic(0x13c4720, 0x161df90)
	/usr/local/Cellar/go/1.8.3/libexec/src/runtime/panic.go:489 +0x2cf
example/vendor/github.com/lib/pq.(*conn).recvMessage(0xc420013600, 0xc4202a0960, 0x13a72c0, 0x12c9101, 0x1644d58)
	/Users/felixge/code/go/src/example/vendor/github.com/lib/pq/conn.go:950 +0x386
example/vendor/github.com/lib/pq.(*conn).recv1Buf(0xc420013600, 0xc4202a0960, 0x0)
	/Users/felixge/code/go/src/example/vendor/github.com/lib/pq/conn.go:990 +0x39
example/vendor/github.com/lib/pq.(*conn).recv1(0xc420013600, 0xc420197e38, 0x9)
	/Users/felixge/code/go/src/example/vendor/github.com/lib/pq/conn.go:1011 +0x85
example/vendor/github.com/lib/pq.(*conn).simpleExec(0xc420013600, 0x1439ffb, 0x8, 0xc420197ef8, 0xc420019098, 0xc420197ec8, 0x101251d, 0xc420019098, 0x15f8008)
	/Users/felixge/code/go/src/example/vendor/github.com/lib/pq/conn.go:609 +0x1d4
example/vendor/github.com/lib/pq.(*conn).Rollback(0xc420013600, 0x0, 0x0)
	/Users/felixge/code/go/src/example/vendor/github.com/lib/pq/conn.go:584 +0xfa
database/sql.(*Tx).rollback.func1()
	/usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:1526 +0x3c
database/sql.withLock(0x15f7fe0, 0xc42026a070, 0xc420197f80)
	/usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:2545 +0x65
database/sql.(*Tx).rollback(0xc420438000, 0xc420166001, 0x0, 0x105a2e1)
	/usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:1527 +0x96
database/sql.(*Tx).awaitDone(0xc420438000)
	/usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:1446 +0x6a
created by database/sql.(*DB).begin
	/usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:1383 +0x274
exit status 2
FAIL	example	1.800s

Specifically the n=-4 len=512 line is important. If you're wondering where this is coming from, it's a debug statement I put into lib/pq to see what's going on:

diff --git a/vendor/github.com/lib/pq/conn.go b/vendor/github.com/lib/pq/conn.go
index 3b322bc..398eca1 100644
--- a/vendor/github.com/lib/pq/conn.go
+++ b/vendor/github.com/lib/pq/conn.go
@@ -946,6 +946,7 @@ func (cn *conn) recvMessage(r *readBuf) (byte, error) {
        n := int(binary.BigEndian.Uint32(x[1:])) - 4
        var y []byte
        if n <= len(cn.scratch) {
+               fmt.Printf("n=%d len=%d\n", n, len(cn.scratch))
                y = cn.scratch[:n]
        } else {
                y = make([]byte, n)

When I prefix my example code above with ctx = context.Background() everything is working as expected.

Hopefully this information will be useful!

If I find myself with a bit more time, I'll also try to contribute a patch, but I'm not sure if this will happen at any point soon.

@felixge
Copy link

felixge commented Jun 8, 2017

Sorry, the above was a false alarm. I noticed I forgot the following in one place 😢

defer rows.Close()

@blinsay
Copy link
Author

blinsay commented Jul 11, 2017

i ran into this again. the stack trace seems to alternate between the one I posted about (pointing at conn.go:1403) and the one that @felixge posted.

testing.tRunner.func1(0xc42044e270)
	/usr/local/Cellar/go/1.8.3/libexec/src/testing/testing.go:622 +0x29d
panic(0x14b3360, 0x1786e50)
	/usr/local/Cellar/go/1.8.3/libexec/src/runtime/panic.go:489 +0x2cf
project/vendor/github.com/lib/pq.(*conn).errRecover(0xc42025c580, 0xc4202e3a68)
	/path/to/project/vendor/github.com/lib/pq/error.go:482 +0x505
panic(0x14b3360, 0x1786e50)
	/usr/local/Cellar/go/1.8.3/libexec/src/runtime/panic.go:489 +0x2cf
bufio.(*Reader).Read(0xc4202028a0, 0xc42025c5a0, 0x5, 0x200, 0x1355e54, 0x17604a0, 0xc4202e3878)
	/usr/local/Cellar/go/1.8.3/libexec/src/bufio/bufio.go:224 +0x4c1
io.ReadAtLeast(0x175bb20, 0xc4202028a0, 0xc42025c5a0, 0x5, 0x200, 0x5, 0x0, 0xc420127ea6, 0x152f388)
	/usr/local/Cellar/go/1.8.3/libexec/src/io/io.go:307 +0xa9
io.ReadFull(0x175bb20, 0xc4202028a0, 0xc42025c5a0, 0x5, 0x200, 0x3, 0x3, 0x3)
	/usr/local/Cellar/go/1.8.3/libexec/src/io/io.go:325 +0x58
project/vendor/github.com/lib/pq.(*conn).recvMessage(0xc42025c580, 0xc42027ce20, 0x5, 0x152f388, 0x5)
	/path/to/project/vendor/github.com/lib/pq/conn.go:935 +0x13e
project/vendor/github.com/lib/pq.(*conn).recv1Buf(0xc42025c580, 0xc42027ce20, 0x3)
	/path/to/project/vendor/github.com/lib/pq/conn.go:985 +0x39
project/vendor/github.com/lib/pq.(*rows).Next(0xc42027cdc0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/path/to/project/vendor/github.com/lib/pq/conn.go:1372 +0xdd
project/vendor/github.com/lib/pq.(*rows).Close(0xc42027cdc0, 0x0, 0x0)
	/path/to/project/vendor/github.com/lib/pq/conn.go:1334 +0x69
database/sql.(*Rows).close(0xc420200180, 0x0, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:2434 +0xf3
database/sql.(*Rows).Close(0xc420200180, 0xc4202e3de8, 0xa)
	/usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:2418 +0x3d
database/sql.(*Row).Scan(0xc4201b50e0, 0xc4202e3de8, 0xa, 0xa, 0x0, 0x0)
	/usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:2497 +0x18b%

the calling code looks something like this.

func DoStuff(arg string) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	tx, err := p.db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}

	var thing MyStruct
	if err := tx.Stmt(p.selectForUpdate).QueryRow(sessionID).Scan(&thing.Foo, &thing.Bar, ...., &thing.Baz); err != nil {
           if err != sql.ErrTxDone {
             if rollbackErr := db.Rollback(); rollbackErr != nil {
                log.Printf("warning: rollback failed! %s", rollbackErr)
             }
           }
           return err
        }

   // more here ....
}

by tuning the timeout, i can get the errors to reliably happen in the first QueryRow().Scan call.

i spent a while trying to replicate on a very simple test case using pg_sleep and varying the timeout, but no dice. maybe the number of parameters is related? there are 10 being scanned here.

@maddyblue
Copy link
Collaborator

I attempted to reproduce your bug but was not able to. Here's what I tried:

func TestIssue614(t *testing.T) {
	db := openTestConn(t)
	defer db.Close()

	stmt, err := db.Prepare("select pg_sleep($1)")
	if err != nil {
		t.Fatal(err)
	}

	for timeout := time.Duration(0); timeout < time.Second*4; timeout += time.Second {
		for sleep := 0; sleep < 4; sleep++ {
			t.Run(fmt.Sprintf("%v-%v", timeout, sleep), func(t *testing.T) {
				ctx, cancel := context.WithTimeout(context.Background(), timeout)
				defer cancel()

				tx, err := db.BeginTx(ctx, nil)
				if err != nil {
					t.Fatal(err)
				}
				defer tx.Rollback()

				var i interface{}
				if err := tx.Stmt(stmt).QueryRow(sleep).Scan(&i); err != nil {
					t.Fatal(err)
				}
			})
		}
	}
}

Some of these error (as expected), but none of them panic'd. Until we can get a reproducible test case, it's hard to know if this is your bug (race condition, multiple go routines using the same connection, not closing something) or a lib/pq bug.

@blinsay
Copy link
Author

blinsay commented Jul 11, 2017

@mjibson thanks for responding!

Some of these error (as expected), but none of them panic'd.

I've got a similar attempt at a repro here and also couldn't get a panic to occur.

Until we can get a reproducible test case, it's hard to know if this is your bug (race condition, multiple go routines using the same connection, not closing something) or a lib/pq bug

totally understood.

the panic-ing code is using prepared statements on a sql.DB obtained through database/sql.Open. I think all of that is intended to be safe for use from multiple goroutines. the panic is happening on the first call to QueryRow().Scan so there isn't much room for me to make a close-related error.

this feels real subtle.

@blinsay
Copy link
Author

blinsay commented Jul 11, 2017

i've got a new and different panic trace. it points into bufio so whatever is going on definitely feels like a race to me.

panic: runtime error: slice bounds out of range [recovered]
	panic: runtime error: slice bounds out of range [recovered]
	panic: runtime error: slice bounds out of range

goroutine 100 [running]:
testing.tRunner.func1(0xc4203ca000)
	/usr/local/Cellar/go/1.8.3/libexec/src/testing/testing.go:622 +0x29d
panic(0x14b3260, 0x1786e50)
	/usr/local/Cellar/go/1.8.3/libexec/src/runtime/panic.go:489 +0x2cf
project/vendor/github.com/lib/pq.(*conn).errRecover(0xc420366000, 0xc420385a58)
	/path/to/project/vendor/github.com/lib/pq/error.go:482 +0x505
panic(0x14b3260, 0x1786e50)
	/usr/local/Cellar/go/1.8.3/libexec/src/runtime/panic.go:489 +0x2cf
bufio.(*Reader).Read(0xc420346540, 0xc420366021, 0x4, 0x1ff, 0x1, 0x0, 0x0)
	/usr/local/Cellar/go/1.8.3/libexec/src/bufio/bufio.go:224 +0x4c1
io.ReadAtLeast(0x175bb20, 0xc420346540, 0xc420366020, 0x5, 0x200, 0x5, 0x0, 0xc4200184fc, 0x152f288)
	/usr/local/Cellar/go/1.8.3/libexec/src/io/io.go:307 +0xa9
io.ReadFull(0x175bb20, 0xc420346540, 0xc420366020, 0x5, 0x200, 0x1, 0x1, 0x1)
	/usr/local/Cellar/go/1.8.3/libexec/src/io/io.go:325 +0x58
project/vendor/github.com/lib/pq.(*conn).recvMessage(0xc420366000, 0xc4202f8420, 0x5, 0x152f288, 0x5)
	/path/to/project/vendor/github.com/lib/pq/conn.go:935 +0x13e
project/vendor/github.com/lib/pq.(*conn).recv1Buf(0xc420366000, 0xc4202f8420, 0x1)
	/path/to/project/vendor/github.com/lib/pq/conn.go:985 +0x39%

@blinsay
Copy link
Author

blinsay commented Jul 11, 2017

@mjibson go test -race to the rescue! this feels like a pretty actionable piece of data.

I would guess neither of us could repro with a simple example because we didn't explicitly use a Tx. I'm going to sit down tomorrow and see if I can make it happen.

WARNING: DATA RACE
Write at 0x00c4200c0d20 by goroutine 30:
  project/vendor/github.com/lib/pq.(*rows).Next()
      /path/to/project/vendor/github.com/lib/pq/conn.go:1385 +0xf00
  database/sql.(*Rows).nextLocked()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:2149 +0xd1
  database/sql.(*Rows).Next.func1()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:2134 +0x57
  database/sql.withLock()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:2545 +0x76
  database/sql.(*Rows).Next()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:2135 +0xe8
  project/vendor/github.com/jmoiron/sqlx.(*Row).Scan()
      /path/to/project/vendor/github.com/jmoiron/sqlx/sqlx.go:205 +0x1b2
  project/vendor/github.com/jmoiron/sqlx.(*Row).scanAny()
      /path/to/project/vendor/github.com/jmoiron/sqlx/sqlx.go:775 +0x58f
  project/vendor/github.com/jmoiron/sqlx.Get()
      /path/to/project/vendor/github.com/jmoiron/sqlx/sqlx.go:676 +0xa7
  project/vendor/github.com/jmoiron/sqlx.(*Stmt).Get()
      /path/to/project/vendor/github.com/jmoiron/sqlx/sqlx.go:513 +0xdc
  project/api/db.TestSomething()
      /path/to/project/api/db/something_test.go:311 +0x348
  testing.tRunner()
      /usr/local/Cellar/go/1.8.3/libexec/src/testing/testing.go:657 +0x107

Previous write at 0x00c4200c0d20 by goroutine 40:
  project/vendor/github.com/lib/pq.(*conn).simpleExec()
      /path/to/project/vendor/github.com/lib/pq/conn.go:610 +0x5a5
  project/vendor/github.com/lib/pq.(*conn).Rollback()
      /path/to/project/vendor/github.com/lib/pq/conn.go:580 +0x135
  database/sql.(*Tx).rollback.func1()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:1526 +0x6e
  database/sql.withLock()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:2545 +0x76
  database/sql.(*Tx).rollback()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:1527 +0x10a
  database/sql.(*Tx).awaitDone()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:1446 +0x8d

Goroutine 30 (running) created at:
  testing.(*T).Run()
      /usr/local/Cellar/go/1.8.3/libexec/src/testing/testing.go:697 +0x543
  project/api/db.TestSomething()
      /path/to/project/api/db/remote_session_repo_test.go:30 +0x75
  testing.tRunner()
      /usr/local/Cellar/go/1.8.3/libexec/src/testing/testing.go:657 +0x107

Goroutine 40 (running) created at:
  database/sql.(*DB).begin()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:1383 +0x38f
  database/sql.(*DB).BeginTx()
      /usr/local/Cellar/go/1.8.3/libexec/src/database/sql/sql.go:1342 +0x89
  project/vendor/github.com/jmoiron/sqlx.(*DB).BeginTxx()
      /path/to/project/vendor/github.com/jmoiron/sqlx/sqlx_context.go:204 +0x77
  project/api/db.(*something).Something()
      /path/to/project/api/db/something.go:217 +0x182
  project/api/db.TestSomething()
      /path/to/project/api/db/something_test.go:311 +0x348
  testing.tRunner()
      /usr/local/Cellar/go/1.8.3/libexec/src/testing/testing.go:657 +0x107%

(I doctored the trace a little bit to not have employer info posted publicly. If that's an issue, let me know).

@blinsay
Copy link
Author

blinsay commented Jul 12, 2017

@mjibson i had time to sit down and fix my repro. I can reliably reproduce the panic running this with go run -race repro.go -reps 5 -timeout-ms 100 -jitter-ms 5. it panics > 50% of the time and the race detector complains 100% of the time.

https://gist.github.com/blinsay/044f39176eeab7f3020d1b527df5bdd7

the race detector seems to be pointing to db.BeginTx in the stdlib where tx.awaitDone() calls tx.rollback(true) in a new goroutine. rollback eventually calls driver.Rollback here. that ends up in cn.simpleExec which doesn't seem to be safe to call from multiple goroutines.

@chrispassas
Copy link

I am also experiencing this bug.

I'm using BeginTx, in my case I it panics every time. Even using recover() it still crashes my process.

@johto
Copy link
Contributor

johto commented Aug 29, 2017

the race detector seems to be pointing to db.BeginTx in the stdlib where tx.awaitDone() calls tx.rollback(true) in a new goroutine. rollback eventually calls driver.Rollback here. that ends up in cn.simpleExec which doesn't seem to be safe to call from multiple goroutines.

I'm not 100% sure, but this looks like a bug in the standard library to me. Rows.Next() seems to think it can get away with only holding Rows.closemu, and Tx.rollback() thinks holding a lock on driverConn prevents concurrent access. Seems like someone has some explaining to do.

@blinsay
Copy link
Author

blinsay commented Aug 29, 2017

Seems like someone has some explaining to do.

@johto the docs don't seem to specify anything about whether or not sql.Tx and driver.Tx need to be safe for concurrent use. it's unclear to me whether the default should be safe or unsafe - it seems like the Go authors and driver authors have been assuming different things.

I have an open stdlib issue to see if we can get some clarification:

golang/go#21117

@chrispassas
Copy link

I know its not likely to change but has anyone tried this with Go 1.9?

If not I'll try to reproduce later this week or next.

@blinsay
Copy link
Author

blinsay commented Aug 30, 2017

I just reproduced it with the same repro on Go 1.9.

EDIT: panic trace is below. here's some sample race detector output.

panic: runtime error: slice bounds out of range [recovered]
	panic: runtime error: slice bounds out of range

goroutine 556 [running]:
github.com/lib/pq.(*conn).errRecover(0xc420d15b80, 0xc420dbdcd0)
	$GOPATH/src/github.com/lib/pq/error.go:482 +0x771
panic(0x12ebd80, 0x1475310)
	/usr/local/go/src/runtime/panic.go:491 +0x2a2
github.com/lib/pq.(*readBuf).next(...)
	$GOPATH/src/github.com/lib/pq/buf.go:42
github.com/lib/pq.(*readBuf).byte(...)
	$GOPATH/src/github.com/lib/pq/buf.go:48
github.com/lib/pq.(*conn).processReadyForQuery(...)
	$GOPATH/src/github.com/lib/pq/conn.go:1546
github.com/lib/pq.(*rows).Next(0xc420092d20, 0xc420ccaa50, 0x1, 0x1, 0x0, 0x0)
	$GOPATH/src/github.com/lib/pq/conn.go:1393 +0xf60
database/sql.(*Rows).nextLocked(0xc420578f80, 0xc420dbdd70)
	/usr/local/go/src/database/sql/sql.go:2457 +0xd2
database/sql.(*Rows).Next.func1()
	/usr/local/go/src/database/sql/sql.go:2442 +0x58
database/sql.withLock(0x145fbe0, 0xc420578fb0, 0xc420dbddb8)
	/usr/local/go/src/database/sql/sql.go:2855 +0x77
database/sql.(*Rows).Next(0xc420578f80, 0x1336588)
	/usr/local/go/src/database/sql/sql.go:2441 +0xe9
database/sql.(*Row).Scan(0xc4205297e0, 0xc420dbdf10, 0x1, 0x1, 0x0, 0x0)
	/usr/local/go/src/database/sql/sql.go:2796 +0x1b1
main.main.func1(0xc4204a0f90, 0xc420092140, 0xc42009a080, 0x35, 0x3, 0x5f2fad0)
	$GOPATH/src/github.com/blinsay/sandbox/db-repro/pg/main.go:62 +0x477
created by main.main
	$GOPATH/src/github.com/blinsay/sandbox/db-repro/pg/main.go:49 +0x606

@chrispassas
Copy link

chrispassas commented Aug 31, 2017

Trying to summarize this issue. Is it fair to say the Go standard library documentation isn't clear for driver writers? But it may be possible for driver writers to fix this issue or the Go team or both?

I'm hoping this doesn't get stuck in a situation where the pq team and the Go team expect the other to address the issue and nothing is done.

I use context with all database actions so I don't get into a situation where I may get blocked indefinitely.

@johto
Copy link
Contributor

johto commented Sep 1, 2017

it's unclear to me whether the default should be safe or unsafe

It's pretty clear to me that if the documentation doesn't mandate concurrency safety from the drivers, the sql package should make no assumptions.

it seems like the Go authors and driver authors have been assuming different things.

I stopped contributing before contexts were added, but back when I was still active we certainly assumed we should never use the driver interfaces concurrently. This could very well be unintentional.

@blinsay
Copy link
Author

blinsay commented Sep 25, 2017

FYI this seems like it was indeed a bug with the stdlib, @johto was right about the assumption that the driver interface should never get used concurrently. There's a fix in review right now!

With the patch applied, it seems like the only thing left for lib/pq to handle is that the driver frequently seems to be returning sql.ErrNoRows instead of a context cancellation error or a Postgres query cancellation error.

@iwanbk
Copy link

iwanbk commented Sep 9, 2019

hi, does this issue still valid?
I can reproduce the issue with https://gist.github.com/blinsay/044f39176eeab7f3020d1b527df5bdd7 with Go 1.9.1 in few minutes

But I can't reproduce it with Go 1.10.4 and it already almost 1 hour.

The command

go run -race repro.go -reps 5 -timeout-ms 100 -jitter-ms 5

@maddyblue
Copy link
Collaborator

Well then I guess we can close this?

@blinsay
Copy link
Author

blinsay commented Sep 13, 2019

Thanks for closing! I haven't seen it since reporting the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants