-
-
Notifications
You must be signed in to change notification settings - Fork 32k
bpo-44032: Defer clearing last thread state until last GC has been run. #26285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-44032: Defer clearing last thread state until last GC has been run. #26285
Conversation
@erlend-aasland Can you confirm that this fixes the issue? |
Python/pystate.c
Outdated
@@ -324,6 +326,7 @@ interpreter_clear(PyInterpreterState *interp, PyThreadState *tstate) | |||
|
|||
/* Last garbage collection on this interpreter */ | |||
_PyGC_CollectNoFail(tstate); | |||
PyThreadState_Clear(tstate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For safety, I would prefer to ensure that the tstate belongs to the interpreter if PyInterpreterState_Clear() is called from a different interpreter (even if I don't know if it's currently possible):
PyThreadState_Clear(tstate); | |
// Don't clear tstate if it belongs to another interpreter | |
if (tstate->interp == interp) { | |
PyThreadState_Clear(tstate); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thread state should remain more or less usable after PyInterpreterState_Clear() is called. I don't think that this PR fix the root issue of https://bugs.python.org/issue44032#msg394104
IMO it would be safer to only release datastack_chunk memory in tstate_delete_common()
, rather than in PyThreadState_Clear()
.
This change remains interesting ;-)
I'll do as soon as I get back on my computer. UPDATE: This PR does not fix the issue. I've used @pablogsal's reproducer from #26274 (comment). Here's an excerpt of the dump (complete dump attached):
|
@vstinner I don't see how a thread state can remain usable after the interpreter has been cleared. |
FYI, freeing datastack chunks in |
@@ -900,13 +897,6 @@ PyThreadState_Clear(PyThreadState *tstate) | |||
if (tstate->on_delete != NULL) { | |||
tstate->on_delete(tstate->on_delete_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be moved to tstate_delete_common
, or is it unrelated to deleting the data stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what it does, so I'm not touching it 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me neither :) I tried moving it, and it seems to work fine either place. BTW, I found a clue in Include:
cpython/Include/cpython/pystate.h
Lines 134 to 143 in a9e4361
* So instead of holding the lock directly, the tstate holds a weakref to | |
* the lock: that's the value of on_delete_data below. Decref'ing a | |
* weakref is harmless. | |
* on_delete points to _threadmodule.c's static release_sentinel() function. | |
* After the tstate is unlinked, release_sentinel is called with the | |
* weakref-to-lock (on_delete_data) argument, and release_sentinel releases | |
* the indirectly held lock. | |
*/ | |
void (*on_delete)(void *); | |
void *on_delete_data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: https://docs.python.org/3/c-api/init.html#c.PyThreadState_Clear:
"Changed in version 3.9: This function now calls the PyThreadState.on_delete callback. Previously, that happened in PyThreadState_Delete()."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and here:
cpython/Modules/_threadmodule.c
Lines 1280 to 1329 in a9e4361
release_sentinel(void *wr_raw) | |
{ | |
PyObject *wr = _PyObject_CAST(wr_raw); | |
/* Tricky: this function is called when the current thread state | |
is being deleted. Therefore, only simple C code can safely | |
execute here. */ | |
PyObject *obj = PyWeakref_GET_OBJECT(wr); | |
lockobject *lock; | |
if (obj != Py_None) { | |
lock = (lockobject *) obj; | |
if (lock->locked) { | |
PyThread_release_lock(lock->lock_lock); | |
lock->locked = 0; | |
} | |
} | |
/* Deallocating a weakref with a NULL callback only calls | |
PyObject_GC_Del(), which can't call any Python code. */ | |
Py_DECREF(wr); | |
} | |
static PyObject * | |
thread__set_sentinel(PyObject *module, PyObject *Py_UNUSED(ignored)) | |
{ | |
PyObject *wr; | |
PyThreadState *tstate = PyThreadState_Get(); | |
lockobject *lock; | |
if (tstate->on_delete_data != NULL) { | |
/* We must support the re-creation of the lock from a | |
fork()ed child. */ | |
assert(tstate->on_delete == &release_sentinel); | |
wr = (PyObject *) tstate->on_delete_data; | |
tstate->on_delete = NULL; | |
tstate->on_delete_data = NULL; | |
Py_DECREF(wr); | |
} | |
lock = newlockobject(module); | |
if (lock == NULL) | |
return NULL; | |
/* The lock is owned by whoever called _set_sentinel(), but the weakref | |
hangs to the thread state. */ | |
wr = PyWeakref_NewRef((PyObject *) lock, NULL); | |
if (wr == NULL) { | |
Py_DECREF(lock); | |
return NULL; | |
} | |
tstate->on_delete_data = (void *) wr; | |
tstate->on_delete = &release_sentinel; | |
return (PyObject *) lock; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤖 New build scheduled with the buildbot fleet by @markshannon for commit ddad17e 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Calling _PyObject_VirtualFree() in tstate_delete_common() sounds like the good thing to do. I doesn't mean that a tstate must remain usable after calling PyThreadState_Clear(state), it's just about consistency.
test_asyncio hangs, unrelated error: https://bugs.python.org/issue44112 |
I'm not worried about asyncio, but test_multiprocessing also fails for one and test_posix for another. There doesn't seem to a coherent model of how threads and interpreters are related, or what memory belongs to the interpreter or thread. |
Sounds like an issue should be opened for this, if there's not already one on bpo. |
AMD64 RHEL8 Refleaks PR:
I can be a random error which doesn't fail with re-run in verbose mode, we don't know since the job hangs before test_posix can be re-run. |
I fail to see the test_multiprocessing error. multiprocessing tests are full of race conditions. Some are legit bugs, some are bugs in the tests. I fixed tons of bugs in multiprocessing and in its test suite last years, but if you search for "multiprocessing" in the bug tracker, you will see a long list of issues open for several weeks/months. IMO it's perfectly safe to merge this PR. All issues that you saw are very likely known (not regression of your change). Clearing the current thread state would be another can of worm, but your PR no longer does that ;-) |
Ok, I'll merge this then. |
Yeah, it's fine, in the really worst case, we can just revert the change to revisit the options. But again, IMO this function is perfectly safe. |
https://bugs.python.org/issue44032