Wrong buffer format returned for similar numpy arrays passed to pybind11 function #1806

Closed
JanuszL opened this issue Jun 13, 2019 · 3 comments

JanuszL commented Jun 13, 2019

Issue description

A function accepting pybind11::buffer reports different underlying formats when given NumPy arrays that look identical (at least at first glance). My guess is that this is more related to NumPy, which may fail to comply with Python's buffer protocol in some cases, but I want to make sure that pybind11 doesn't do anything that could affect this.

Reproducible example code

import numpy as np

# Two arrays that NumPy reports as the same dtype (int64)
a = np.array([1, 2], dtype=np.longlong)
b = np.array([1, 2], dtype=np.int64)
some_native_function_accepting_py11buffer(a)
some_native_function_accepting_py11buffer(b)

a reports the q format while b reports l, which looks wrong for b. NumPy reports the dtype as int64 for both of them.
Full repro attached; just run run.sh (and make sure that pybind11 is present in the same directory as run.sh).
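
The same difference can be observed from pure Python, without any pybind11 code, by asking the buffer protocol for each array's format through memoryview. This is only a minimal sketch: the characters printed depend on the platform and the NumPy build, so it compares sizes rather than assuming specific letters.

```python
import struct

import numpy as np

a = np.array([1, 2], dtype=np.longlong)
b = np.array([1, 2], dtype=np.int64)

# memoryview exposes the format string each array reports via the buffer protocol
fmt_a = memoryview(a).format
fmt_b = memoryview(b).format

# NumPy treats the two dtypes as equal even when the format characters differ
same_dtype = a.dtype == b.dtype

print("longlong format:", fmt_a, "size:", struct.calcsize(fmt_a))
print("int64 format:   ", fmt_b, "size:", struct.calcsize(fmt_b))
print("dtypes compare equal:", same_dtype)
```

On a platform where C long is 8 bytes, the two format strings can differ while the item sizes and dtypes still match, which is the behaviour described above.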

@eacousineau
Contributor

Hm... Mayhaps it's due to an ordering mismatch in pybind11's NumPy type flags?
I had complained about it in this issue:
#1328

But then, uh, let my PR stall:
#1329

Lemme see about dusting that off, just in case it covers your problem at all.

@JanuszL
Author

JanuszL commented Jun 19, 2019

It seems that it is not that simple. Basically, a py_buffer format string can have two characters (https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment):

  • a modifier that selects native or standard size (some modifiers also set byte order, but I don't care much about that here)
  • the actual format character

There can be more than one valid mapping: np.longlong and np.int64 are both 8 bytes long, so either can be encoded as l or q (assuming the default @ modifier, which selects native size, on a platform where C long is 8 bytes).

The problem I had was that I took the result of format_descriptor instantiated for a given type and compared it with the actual format in the tested py_buffer. For the reason above, this worked for the q format but not for l, because the implementation is not that flexible (https://github.com/pybind/pybind11/blob/master/include/pybind11/detail/common.h#L700): it returns only one character for a given data size and sign.

I think this logic should be extended. I went my own way in "Rework how DALI handles py_buffer format string" (NVIDIA/DALI#985), which is not that beautiful but works in all cases I have tested.
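
A more flexible comparison could check size and signedness instead of the exact character. The sketch below illustrates that idea in Python using only the standard struct module. It is not the DALI or pybind11 implementation, and the helper names split_format and int_formats_compatible are made up for this example. It also deliberately ignores byte order.

```python
import struct

# Integer format characters from the struct module, grouped by signedness
SIGNED = set("bhilqn")
UNSIGNED = set("BHILQN")


def split_format(fmt):
    """Split a single-item format string into (modifier, code).

    A missing modifier is treated as '@' (native size and alignment).
    """
    if fmt and fmt[0] in "@=<>!":
        return fmt[0], fmt[1:]
    return "@", fmt


def int_formats_compatible(fmt_a, fmt_b):
    """Return True if both formats describe integers of equal size and sign.

    Non-integer or multi-character formats fall back to exact string equality.
    """
    mod_a, code_a = split_format(fmt_a)
    mod_b, code_b = split_format(fmt_b)
    int_codes = SIGNED | UNSIGNED
    if code_a not in int_codes or code_b not in int_codes:
        return fmt_a == fmt_b
    same_sign = (code_a in SIGNED) == (code_b in SIGNED)
    same_size = struct.calcsize(mod_a + code_a) == struct.calcsize(mod_b + code_b)
    return same_sign and same_size


# Whether 'l' matches 'q' depends on the native size of C long on this platform
print(int_formats_compatible("l", "q"))
```

With this kind of check, a buffer reporting l is accepted where q is expected whenever C long is 8 bytes, and rejected otherwise.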

@YannickJadoul
Collaborator

Related to the discussion on #1908: i, l, and q correspond to the dtypes np.intc, np.long, and np.longlong, while int32 and int64 are only aliases in numpy.
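
The aliasing can be inspected through the dtype's char attribute. This is a small sketch and its output is platform dependent: the C-level dtypes are requested by their type characters, and which of them int32 and int64 resolve to is printed rather than assumed.

```python
import numpy as np

# Type characters of the C-level integer dtypes (int, long, long long)
c_chars = {code: np.dtype(code).char for code in ("i", "l", "q")}

# The sized names resolve to whichever C type has the matching width
int32_char = np.dtype(np.int32).char
int64_char = np.dtype(np.int64).char

# Dtypes with the same size and signedness compare equal regardless of char
int64_equals_longlong = np.dtype(np.int64) == np.dtype(np.longlong)

print("C-level chars:", c_chars)
print("int32 resolves to:", int32_char)
print("int64 resolves to:", int64_char)
print("int64 == longlong:", int64_equals_longlong)
```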

If there's something where pybind11 mismatches pure numpy, please do reopen!

havogt pushed a commit to GridTools/gridtools that referenced this issue Apr 12, 2021
Python integer format char is ambiguous and platform dependent. PyBind11 `format_descriptor<...>::format()` always returns "q" and "Q" for 64bit integers, independent of the platform. Compatible passed-in Python buffers on the other hand might also have the equivalent format "l" or "L" set. See pybind/pybind11#1806 and pybind/pybind11#1908 for details. This fix introduces a special case for integer format comparisons, just checking size and signedness.