-
Notifications
You must be signed in to change notification settings - Fork 2.2k
feat: change PYBIND11_EMBEDDED_MODULE to multiphase init #5665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
So that they too can support multi-interpreter and nogil tags
7c358c7
to
64f7a0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the PYBIND11_EMBEDDED_MODULE macro to perform multiphase module initialization, enabling support for free threading and multiple interpreters, and updates associated documentation and CI testing.
- Update of the PYBIND11_EMBEDDED_MODULE macro to accept multiphase init flags
- Test adjustments to validate per-interpreter GIL behavior
- Documentation updates to describe the new initialization options and CI configuration changes
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
tests/test_embed/test_interpreter.cpp | Adjusts tests to check behavior of modules with and without per_interpreter_gil |
include/pybind11/embed.h | Updates macro implementation to support multiphase init and new return types |
docs/advanced/misc.rst | Adds notes and anchor references for free-threading and sub-interpreter support |
docs/advanced/embedding.rst | Updates documentation to list accepted multiphase init flags |
.github/workflows/ci.yml | Updates CI matrix to include new Python versions with 't' suffixes |
Comments suppressed due to low confidence (2)
tests/test_embed/test_interpreter.cpp:519
- [nitpick] The change from 'widget_module' to 'trampoline_module' in the failing import test may be confusing. Please ensure that 'trampoline_module' is properly defined and its naming clearly distinguishes it from 'widget_module' in this context.
py::module_::import("trampoline_module");
include/pybind11/embed.h:70
- The multiphase module initialization now returns an int instead of a module pointer. Please confirm that all consumers of this macro handle the updated return type consistent with PEP 489 conventions.
return 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is passing on 3.13t and 3.14t!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar with pybind11 or C++, but this makes sense to me. You've set up the module def with the slots and added the exec func, which is correct.
The open jobs are taking 3+ hours (timeout is the default 6 hours). Something here is causing it to hang when it gets to the embedding test. I think all the hanging ones are <3.12. That's also what happened to Appveyor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive by comment. (I need to look more closely later.)
I was able to reproduce locally using miniforge3, roughly:
This hangs. Then for debugging I switched to my scons-based build, which I can more easily navigate. It also hangs. I ran some observations by ChatGPT, look from the bottom up here (i.e. ignore the random stuff at the top): https://chatgpt.com/share/6828e3a2-bcac-8008-889a-88f68c3eeba4 |
…RT was off So the fix is really this test should not be run in these older versions at all. The hang was a GIL issue between the subinterpreters during pybind11::exception::what().
50a5fa2
to
f0e6ac2
Compare
I thought that might be the problem, it was very suspicious that it was passing exactly on the version that support subinterpreters. |
Not sure what to do about the embedded "threading" test. I thought the critical section would fix it, but it apparently didn't. It's more of a GIL test, we could just ifdef it out entirely.... (The failure is not related to sub-interpreters or multiphase init, it's a normal free-threading problem, there's no lock on the dict it uses....) |
Do we know why it works on 3.14t but not 3.13t? |
Signed-off-by: Henry Schreiner <[email protected]>
6719aba
to
8c99bfb
Compare
@colesbury Do you know why I worked around it by using a Tested locally with: rm -rf build .venv
cmake --preset venv -DPYBIND11_CREATE_WITH_UV=3.13t
cmake --build --preset venv --target cpptest
rm -rf build .venv
cmake --preset venv -DPYBIND11_CREATE_WITH_UV:PATH=~/.pyenv/versions/3.14.0b1t/bin/python
cmake --build --preset venv --target cpptest (#5668 swaps the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a lot of highly non-trivial code duplication here:
--- /home/rgrossekunst/zmodule 2025-05-18 08:47:03.816569348 -0700
+++ /home/rgrossekunst/zembed 2025-05-18 08:47:31.377488434 -0700
@@ -1,36 +1,37 @@
PYBIND11_WARNING_PUSH
PYBIND11_WARNING_DISABLE_CLANG("-Wgnu-zero-variadic-macro-arguments")
-#define PYBIND11_MODULE(name, variable, ...) \
+#define PYBIND11_EMBEDDED_MODULE(name, variable, ...) \
static ::pybind11::module_::module_def PYBIND11_CONCAT(pybind11_module_def_, name); \
static ::pybind11::module_::slots_array PYBIND11_CONCAT(pybind11_module_slots_, name); \
static int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject *); \
static void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ &); \
- PYBIND11_PLUGIN_IMPL(name) { \
- PYBIND11_CHECK_PYTHON_VERSION \
- PYBIND11_ENSURE_INTERNALS_READY \
+ static PyObject PYBIND11_CONCAT(*pybind11_init_wrapper_, name)() { \
static auto result = []() { \
auto &slots = PYBIND11_CONCAT(pybind11_module_slots_, name); \
slots[0] = {Py_mod_exec, \
reinterpret_cast<void *>(&PYBIND11_CONCAT(pybind11_exec_, name))}; \
slots[1] = {0, nullptr}; \
return ::pybind11::module_::initialize_multiphase_module_def( \
PYBIND11_TOSTRING(name), \
nullptr, \
&PYBIND11_CONCAT(pybind11_module_def_, name), \
slots, \
##__VA_ARGS__); \
}(); \
return result.ptr(); \
} \
+ PYBIND11_EMBEDDED_MODULE_IMPL(name) \
+ ::pybind11::detail::embedded_module PYBIND11_CONCAT(pybind11_module_, name)( \
+ PYBIND11_TOSTRING(name), PYBIND11_CONCAT(pybind11_init_impl_, name)); \
int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject * pm) { \
- pybind11::detail::get_num_interpreters_seen() += 1; \
try { \
auto m = pybind11::reinterpret_borrow<::pybind11::module_>(pm); \
PYBIND11_CONCAT(pybind11_init_, name)(m); \
return 0; \
} \
PYBIND11_CATCH_INIT_EXCEPTIONS \
return -1; \
} \
- void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ & (variable))
+ void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ \
+ & variable) // NOLINT(bugprone-macro-parentheses)
PYBIND11_WARNING_POP
@b-pass did you consider making this more DRY?
If that's not feasible, at a minimum we should leave "ATTENTION-if-you-make-changes-here-review-the-duplicated-code-in-the-other-place" kind of comments, and ideally move the duplicated code to a common place.
#ifdef Py_GIL_DISABLED | ||
# if PY_VERSION_HEX < 0x030E0000 | ||
std::lock_guard<std::mutex> lock(mutex); | ||
locals["count"] = locals["count"].cast<int>() + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The double-locking here (GIL in line 637, C++ mutex in line 640) has the potential to deadlock, because the code in line 641 calls into the Python C API, which may release (and reacquire) the GIL (see docs/advanced/deadlock.md). — I can believe that this will be extremely rare, but we should leave a comment here to explain that we're aware of the risk, and that we're knowingly accepting it.
See also: https://chatgpt.com/share/682a07ec-dd70-8008-92ea-e722ff92a055
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this can deadlock. These specific concrete types and calls will not release/reacquire the GIL (PyDict_Get/SetItem with string keys and long object values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These specific concrete types
I believe that, but examples and tests are often copy-pasted, and then something else is done here.
The comment I'm suggesting is to make people aware of the pitfall. This is not a safe pattern in general.
@henryiii - I think what's happening is that the modification to
calls This doesn't happen in 3.14t because we have an optimization for this case (python/cpython#128126). It was intended as a performance optimization, but it also avoids the unlocking/relock behavior and so preserves atomicity. |
I didn't really, but you're right. I'll make a new PR for consolidating some of this boilerplate.... |
Description
Changes
PYBIND11_EMBEDDED_MODULE
to do multiphase init. This is required for embedded modules to properly express their support (or not) for free threading and multiple interpreters. Also added a note to the docs for this.I believe this will also fix the free-threading multiple interpreter test failed @henryiii had on #5646.
Suggested changelog entry: