Skip to content

feat: change PYBIND11_EMBEDDED_MODULE to multiphase init #5665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 18, 2025

Conversation

b-pass
Copy link
Contributor

@b-pass b-pass commented May 17, 2025

Description

Changes PYBIND11_EMBEDDED_MODULE to do multiphase init. This is required for embedded modules to properly express their support (or not) for free threading and multiple interpreters. Also added a note to the docs for this.

I believe this will also fix the free-threading multiple interpreter test failed @henryiii had on #5646.

Suggested changelog entry:

* Changed ``PYBIND11_EMBEDDED_MODULE`` macro implementation to perform multi-phase module initialization (PEP 489) behind the scenes and to support ``py::mod_gil_not_used()``, ``py::multiple_interpreters::per_interpreter_gil()`` and ``py::multiple_interpreters::shared_gil()``.

b-pass added a commit to b-pass/pybind11 that referenced this pull request May 17, 2025
b-pass added a commit to b-pass/pybind11 that referenced this pull request May 17, 2025
@henryiii henryiii force-pushed the embedded-multiphase branch from 7c358c7 to 64f7a0a Compare May 17, 2025 02:00
@henryiii henryiii self-requested a review as a code owner May 17, 2025 02:00
@henryiii henryiii requested review from rwgk and Copilot May 17, 2025 04:25
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the PYBIND11_EMBEDDED_MODULE macro to perform multiphase module initialization, enabling support for free threading and multiple interpreters, and updates associated documentation and CI testing.

  • Update of the PYBIND11_EMBEDDED_MODULE macro to accept multiphase init flags
  • Test adjustments to validate per-interpreter GIL behavior
  • Documentation updates to describe the new initialization options and CI configuration changes

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/test_embed/test_interpreter.cpp Adjusts tests to check behavior of modules with and without per_interpreter_gil
include/pybind11/embed.h Updates macro implementation to support multiphase init and new return types
docs/advanced/misc.rst Adds notes and anchor references for free-threading and sub-interpreter support
docs/advanced/embedding.rst Updates documentation to list accepted multiphase init flags
.github/workflows/ci.yml Updates CI matrix to include new Python versions with 't' suffixes
Comments suppressed due to low confidence (2)

tests/test_embed/test_interpreter.cpp:519

  • [nitpick] The change from 'widget_module' to 'trampoline_module' in the failing import test may be confusing. Please ensure that 'trampoline_module' is properly defined and its naming clearly distinguishes it from 'widget_module' in this context.
py::module_::import("trampoline_module");

include/pybind11/embed.h:70

  • The multiphase module initialization now returns an int instead of a module pointer. Please confirm that all consumers of this macro handle the updated return type consistent with PEP 489 conventions.
return 0;

Copy link
Collaborator

@henryiii henryiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is passing on 3.13t and 3.14t!

Copy link

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with pybind11 or C++, but this makes sense to me. You've set up the module def with the slots and added the exec func, which is correct.

@henryiii
Copy link
Collaborator

henryiii commented May 17, 2025

The open jobs are taking 3+ hours (timeout is the default 6 hours). Something here is causing it to hang when it gets to the embedding test. I think all the hanging ones are <3.12. That's also what happened to Appveyor.

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive by comment. (I need to look more closely later.)

@rwgk
Copy link
Collaborator

rwgk commented May 17, 2025

The open jobs are taking 3+ hours (timeout is the default 6 hours). Something here is causing it to hang when it gets to the embedding test. I think all the hanging ones are <3.12. That's also what happened to Appveyor.

I was able to reproduce locally using miniforge3, roughly:

conda create --yes -n py310 python=3.10
conda activate py310
cmake -S $HOME/forked/pybind11 -B cmake_build_conda_py310 -DCMAKE_VERBOSE_MAKEFILE=ON -DDOWNLOAD_CATCH=ON -DPYTHON_EXECUTABLE=$(which python3) -DCMAKE_CXX_STANDARD=20 -DCMAKE_BUILD_TYPE=Debug -DPYBIND11_PYTEST_ARGS=-vv && cd cmake_build_conda_py310 && make check -j 64

This hangs. Then for debugging I switched to my scons-based build, which I can more easily navigate. It also hangs. I ran some observations by ChatGPT, look from the bottom up here (i.e. ignore the random stuff at the top):

https://chatgpt.com/share/6828e3a2-bcac-8008-889a-88f68c3eeba4

…RT was off

So the fix is really this test should not be run in these older versions at all.

The hang was a GIL issue between the subinterpreters during pybind11::exception::what().
@b-pass b-pass force-pushed the embedded-multiphase branch from 50a5fa2 to f0e6ac2 Compare May 18, 2025 02:29
@henryiii
Copy link
Collaborator

I thought that might be the problem, it was very suspicious that it was passing exactly on the version that support subinterpreters.

@b-pass
Copy link
Contributor Author

b-pass commented May 18, 2025

Not sure what to do about the embedded "threading" test. I thought the critical section would fix it, but it apparently didn't. It's more of a GIL test, we could just ifdef it out entirely.... (The failure is not related to sub-interpreters or multiphase init, it's a normal free-threading problem, there's no lock on the dict it uses....)

@henryiii
Copy link
Collaborator

Do we know why it works on 3.14t but not 3.13t?

Signed-off-by: Henry Schreiner <[email protected]>
@henryiii henryiii force-pushed the embedded-multiphase branch from 6719aba to 8c99bfb Compare May 18, 2025 05:31
@henryiii
Copy link
Collaborator

henryiii commented May 18, 2025

@colesbury Do you know why Py_CRITICAL_SECTION(dict) works in 3.14t but not in 3.13t? Is that expected?

I worked around it by using a std::mutex for 3.13t.

Tested locally with:

rm -rf build .venv
cmake --preset venv -DPYBIND11_CREATE_WITH_UV=3.13t
cmake --build --preset venv --target cpptest

rm -rf build .venv
cmake --preset venv -DPYBIND11_CREATE_WITH_UV:PATH=~/.pyenv/versions/3.14.0b1t/bin/python
cmake --build --preset venv --target cpptest

(#5668 swaps the rm for support for the built-in cmake --fresh)

@henryiii henryiii changed the title Change PYBIND11_EMBEDDED_MODULE to multiphase init feat: change PYBIND11_EMBEDDED_MODULE to multiphase init May 18, 2025
@henryiii henryiii merged commit 9afc9c4 into pybind:master May 18, 2025
73 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label May 18, 2025
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a lot of highly non-trivial code duplication here:

--- /home/rgrossekunst/zmodule  2025-05-18 08:47:03.816569348 -0700
+++ /home/rgrossekunst/zembed   2025-05-18 08:47:31.377488434 -0700
@@ -1,36 +1,37 @@
 PYBIND11_WARNING_PUSH
 PYBIND11_WARNING_DISABLE_CLANG("-Wgnu-zero-variadic-macro-arguments")
-#define PYBIND11_MODULE(name, variable, ...)                                                      \
+#define PYBIND11_EMBEDDED_MODULE(name, variable, ...)                                             \
     static ::pybind11::module_::module_def PYBIND11_CONCAT(pybind11_module_def_, name);           \
     static ::pybind11::module_::slots_array PYBIND11_CONCAT(pybind11_module_slots_, name);        \
     static int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject *);                                 \
     static void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ &);                     \
-    PYBIND11_PLUGIN_IMPL(name) {                                                                  \
-        PYBIND11_CHECK_PYTHON_VERSION                                                             \
-        PYBIND11_ENSURE_INTERNALS_READY                                                           \
+    static PyObject PYBIND11_CONCAT(*pybind11_init_wrapper_, name)() {                            \
         static auto result = []() {                                                               \
             auto &slots = PYBIND11_CONCAT(pybind11_module_slots_, name);                          \
             slots[0] = {Py_mod_exec,                                                              \
                         reinterpret_cast<void *>(&PYBIND11_CONCAT(pybind11_exec_, name))};        \
             slots[1] = {0, nullptr};                                                              \
             return ::pybind11::module_::initialize_multiphase_module_def(                         \
                 PYBIND11_TOSTRING(name),                                                          \
                 nullptr,                                                                          \
                 &PYBIND11_CONCAT(pybind11_module_def_, name),                                     \
                 slots,                                                                            \
                 ##__VA_ARGS__);                                                                   \
         }();                                                                                      \
         return result.ptr();                                                                      \
     }                                                                                             \
+    PYBIND11_EMBEDDED_MODULE_IMPL(name)                                                           \
+    ::pybind11::detail::embedded_module PYBIND11_CONCAT(pybind11_module_, name)(                  \
+        PYBIND11_TOSTRING(name), PYBIND11_CONCAT(pybind11_init_impl_, name));                     \
     int PYBIND11_CONCAT(pybind11_exec_, name)(PyObject * pm) {                                    \
-        pybind11::detail::get_num_interpreters_seen() += 1;                                       \
         try {                                                                                     \
             auto m = pybind11::reinterpret_borrow<::pybind11::module_>(pm);                       \
             PYBIND11_CONCAT(pybind11_init_, name)(m);                                             \
             return 0;                                                                             \
         }                                                                                         \
         PYBIND11_CATCH_INIT_EXCEPTIONS                                                            \
         return -1;                                                                                \
     }                                                                                             \
-    void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_ & (variable))
+    void PYBIND11_CONCAT(pybind11_init_, name)(::pybind11::module_                                \
+                                               & variable) // NOLINT(bugprone-macro-parentheses)
 PYBIND11_WARNING_POP

@b-pass did you consider making this more DRY?

If that's not feasible, at a minimum we should leave "ATTENTION-if-you-make-changes-here-review-the-duplicated-code-in-the-other-place" kind of comments, and ideally move the duplicated code to a common place.

#ifdef Py_GIL_DISABLED
# if PY_VERSION_HEX < 0x030E0000
std::lock_guard<std::mutex> lock(mutex);
locals["count"] = locals["count"].cast<int>() + 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double-locking here (GIL in line 637, C++ mutex in line 640) has the potential to deadlock, because the code in line 641 calls into the Python C API, which may release (and reacquire) the GIL (see docs/advanced/deadlock.md). — I can believe that this will be extremely rare, but we should leave a comment here to explain that we're aware of the risk, and that we're knowingly accepting it.

See also: https://chatgpt.com/share/682a07ec-dd70-8008-92ea-e722ff92a055

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can deadlock. These specific concrete types and calls will not release/reacquire the GIL (PyDict_Get/SetItem with string keys and long object values).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These specific concrete types

I believe that, but examples and tests are often copy-pasted, and then something else is done here.

The comment I'm suggesting is to make people aware of the pitfall. This is not a safe pattern in general.

@colesbury
Copy link
Contributor

@henryiii - I think what's happening is that the modification to locals tries to lock the dictionary. For example:

locals["count"] = ...

calls PyDict_SetItem, which tries to lock the already-locked dictionary. This doesn't deadlock, but it does break the atomicity (the thread temporarily releases the critical section lock from Py_BEGIN_CRITICAL_SECTION(locals.ptr()) and then re-locks the dictionary. In between the unlock and lock, another thread can run.

This doesn't happen in 3.14t because we have an optimization for this case (python/cpython#128126). It was intended as a performance optimization, but it also avoids the unlocking/relock behavior and so preserves atomicity.

@b-pass
Copy link
Contributor Author

b-pass commented May 18, 2025

We have a lot of highly non-trivial code duplication here:

@b-pass did you consider making this more DRY?

I didn't really, but you're right. I'll make a new PR for consolidating some of this boilerplate....

b-pass added a commit to b-pass/pybind11 that referenced this pull request May 18, 2025
henryiii pushed a commit to b-pass/pybind11 that referenced this pull request May 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs changelog Possibly needs a changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants