Skip to content

Einsum indexing very fragile, because it tests for int (and int64 is not int) #15961

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
orichardson opened this issue Apr 12, 2020 · 7 comments · Fixed by #16080
Closed

Einsum indexing very fragile, because it tests for int (and int64 is not int) #15961

orichardson opened this issue Apr 12, 2020 · 7 comments · Fixed by #16080

Comments

@orichardson
Copy link

orichardson commented Apr 12, 2020

The index arrays for einsum do not accept the output of any numpy array, because these datatypes cannot be ints (see #2951; #12322). The test should be modified to accept numpy ints, because converting the datatype to 'int' explicitly is counter-intuitive and unnecessary.

Though easy to circumvent, this will incur a lot of unnecessary debugging time, as it appears to break abstraction boundaries, has strange interactions with tolist, and indices that work in test environments fail when constructed programatically.

Reproducing code example:

import numpy as np
X = np.arange(9).reshape(3,3)

# each of these results in an identical list [0] as far as equality test is concerned 
idx1 = [0]                # list[ int ] 
idx2 = np.unique(idx1)    # np.array [int64]
idx3 = idx2.tolist()      # list [int]
idx3 = list(idx2)         # list [int64]

np.einsum(X, [0], idx1 ) # succeeds
np.einsum(X, [0], idx2 ) # fails
np.einsum(X, [0], idx3 ) # succeeds
np.einsum(X, [0], idx4 ) # fails

Error message:

ValueError: each subscript must be either an integer or an ellipsis

Full error message if optimize_arg is False:

<__array_function__ internals> in einsum(*args, **kwargs)

~/.local/lib/python3.6/site-packages/numpy/core/einsumfunc.py in einsum(*operands, **kwargs)
   1354     # If no optimization, run pure einsum
   1355     if optimize_arg is False:
-> 1356         return c_einsum(*operands, **kwargs)
   1357 
   1358     valid_einsum_kwargs = ['out', 'dtype', 'order', 'casting']

ValueError: each subscript must be either an integer or an ellipsis

Full error message when optimize_arg is True:

<__array_function__ internals> in einsum(*args, **kwargs)

~/.local/lib/python3.6/site-packages/numpy/core/einsumfunc.py in einsum(*operands, **kwargs)
   1377     # Build the contraction list and operand
   1378     operands, contraction_list = einsum_path(*operands, optimize=optimize_arg,
-> 1379                                              einsum_call=True)
   1380 
   1381     handle_out = False

Numpy/Python version information:

1.17.3 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]

@eric-wieser
Copy link
Member

We ought to be calling operator.index or the equivalent C API here, rather than the PyInt_Check I assume we must be doing today.

@eric-wieser
Copy link
Member

eric-wieser commented Apr 13, 2020

Relevant lines are here:

/* Subscript */
else if (PyInt_Check(item) || PyLong_Check(item)) {
long s = PyInt_AsLong(item);
npy_bool bad_input = 0;
if (subindex + 1 >= subsize) {
PyErr_SetString(PyExc_ValueError,
"subscripts list is too long");
Py_DECREF(obj);
return -1;
}
if ( s < 0 ) {
bad_input = 1;
}
else if (s < 26) {
subscripts[subindex++] = 'A' + (char)s;
}
else if (s < 2*26) {
subscripts[subindex++] = 'a' + (char)s - 26;
}
else {
bad_input = 1;
}
if (bad_input) {
PyErr_SetString(PyExc_ValueError,
"subscript is not within the valid range [0, 52)");
Py_DECREF(obj);
return -1;
}
}
/* Invalid */
else {
PyErr_SetString(PyExc_ValueError,
"each subscript must be either an integer "
"or an ellipsis");
Py_DECREF(obj);
return -1;
}

Logic ought to be the C equivalent of:

try:
    s = operator.index(item)
except TypeError as e:
    raise TypeError("each subscript must be either an integer "
                    "or an ellipsis") from e
# do stuff with s

@guilhermeleobas
Copy link
Contributor

I will take this one.

@tinaoberoi
Copy link
Contributor

@guilhermeleobas you working on this ? If not I would like to take this up. Thanks

@guilhermeleobas
Copy link
Contributor

Hi @tinaoberoi, it was on my plans but you can go ahead and work on it :)

@tinaoberoi
Copy link
Contributor

Thanks @guilhermeleobas , I new here, any previous findings and help is appreciated. :-)

@rlintott
Copy link
Contributor

rlintott commented Apr 25, 2020

I'm also a first time contributor. I might try fixing this. I don't really mind if someone else contribute first. In any case, the experience is good for me.

rlintott added a commit to rlintott/numpy that referenced this issue Apr 26, 2020
rlintott added a commit to rlintott/numpy that referenced this issue Apr 26, 2020
seberg pushed a commit that referenced this issue May 16, 2020
* Using PyArray_PyIntAsIntp helper function instead

* TST: add tests for einsum numpy int and bool list subscripts
Added tests to check that einsum accepts numpy int64 types and
rejects bool. Rejecting bools is new behaviour in subscript lists.
I changed ValueError to TypeError on line 2496 in multiarraymodule.c
as it is more appropriate. I also modified einsumfunc.py to have the
same behaviour as in the C file when checking subscript list.
(Reject bools but accept anything else from operator.index())

Closes gh-15961
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants