Einsum indexing very fragile, because it tests for int (and int64 is not int) #15961

orichardson · 2020-04-12T17:38:23Z

The subscript lists passed to einsum do not accept integers taken from a numpy array, because numpy integer types such as int64 are not Python ints (see #2951; #12322). The test should be modified to accept numpy ints, because converting the datatype to 'int' explicitly is counter-intuitive and unnecessary.

Though easy to circumvent, this will incur a lot of unnecessary debugging time, as it appears to break abstraction boundaries, has strange interactions with tolist, and indices that work in test environments fail when constructed programmatically.

Reproducing code example:

import numpy as np
X = np.arange(9).reshape(3,3)

# each of these compares equal to the list [0] as far as an equality test is concerned
idx1 = [0]                # list[int]
idx2 = np.unique(idx1)    # np.array[int64]
idx3 = idx2.tolist()      # list[int]
idx4 = list(idx2)         # list[int64]

np.einsum(X, [0], idx1 ) # succeeds
np.einsum(X, [0], idx2 ) # fails
np.einsum(X, [0], idx3 ) # succeeds
np.einsum(X, [0], idx4 ) # fails

Error message:

ValueError: each subscript must be either an integer or an ellipsis

Full error message if optimize_arg is False:

<__array_function__ internals> in einsum(*args, **kwargs)

~/.local/lib/python3.6/site-packages/numpy/core/einsumfunc.py in einsum(*operands, **kwargs)
   1354     # If no optimization, run pure einsum
   1355     if optimize_arg is False:
-> 1356         return c_einsum(*operands, **kwargs)
   1357 
   1358     valid_einsum_kwargs = ['out', 'dtype', 'order', 'casting']

ValueError: each subscript must be either an integer or an ellipsis

Full error message when optimize_arg is True:

<__array_function__ internals> in einsum(*args, **kwargs)

~/.local/lib/python3.6/site-packages/numpy/core/einsumfunc.py in einsum(*operands, **kwargs)
   1377     # Build the contraction list and operand
   1378     operands, contraction_list = einsum_path(*operands, optimize=optimize_arg,
-> 1379                                              einsum_call=True)
   1380 
   1381     handle_out = False

Numpy/Python version information:

1.17.3 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
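
The workaround mentioned above ("easy to circumvent") can be sketched as follows. This is only an illustration for affected NumPy versions: `to_python_subscripts` is a hypothetical helper name, not part of NumPy, and the example uses a 1-D operand so that a single subscript matches its dimensionality.

```python
import operator

import numpy as np


def to_python_subscripts(subscripts):
    """Hypothetical helper: coerce integer-like subscripts to built-in ints.

    Ellipsis entries are passed through unchanged; anything that does not
    implement __index__ raises TypeError.
    """
    return [s if s is Ellipsis else operator.index(s) for s in subscripts]


x = np.arange(3)
idx = np.unique([0])  # numpy array holding numpy integers, not Python ints

# Convert before handing the subscript list to einsum.
out = np.einsum(x, [0], to_python_subscripts(idx))
```

Calling `idx.tolist()` has the same effect for arrays; the helper only matters when the subscripts arrive as an arbitrary sequence of integer-like objects.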


eric-wieser · 2020-04-12T17:45:11Z

We ought to be calling operator.index or the equivalent C API here, rather than the PyInt_Check I assume we must be doing today.
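
To illustrate the difference between the two kinds of check, here is a small sketch. `accepts_as_index` is a hypothetical name used only for this example; the point is that a type test rejects numpy integers while `operator.index` accepts any object that implements `__index__`.

```python
import operator

import numpy as np


def accepts_as_index(obj):
    """Hypothetical check: True if obj can be losslessly used as an integer index."""
    try:
        operator.index(obj)
    except TypeError:
        return False
    return True


value = np.int64(3)

# A plain type test misses numpy integers on Python 3 ...
is_python_int = isinstance(value, int)

# ... while the __index__ protocol accepts them and still rejects floats.
int64_ok = accepts_as_index(value)
float_ok = accepts_as_index(3.0)
```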

eric-wieser · 2020-04-13T14:37:13Z

Relevant lines are here:

numpy/numpy/core/src/multiarray/multiarraymodule.c

Lines 2462 to 2501 in b489287

    
           /* Subscript */ 
        
           else if (PyInt_Check(item) || PyLong_Check(item)) { 
        
               long s = PyInt_AsLong(item); 
        
               npy_bool bad_input = 0; 
        
               if (subindex + 1 >= subsize) { 
        
                   PyErr_SetString(PyExc_ValueError, 
        
                           "subscripts list is too long"); 
        
                   Py_DECREF(obj); 
        
                   return -1; 
        
               } 
        
               if ( s < 0 ) { 
        
                   bad_input = 1; 
        
               } 
        
               else if (s < 26) { 
        
                   subscripts[subindex++] = 'A' + (char)s; 
        
               } 
        
               else if (s < 2*26) { 
        
                   subscripts[subindex++] = 'a' + (char)s - 26; 
        
               } 
        
               else { 
        
                   bad_input = 1; 
        
               } 
        
               if (bad_input) { 
        
                   PyErr_SetString(PyExc_ValueError, 
        
                           "subscript is not within the valid range [0, 52)"); 
        
                   Py_DECREF(obj); 
        
                   return -1; 
        
               } 
        
           } 
        
           /* Invalid */ 
        
           else { 
        
               PyErr_SetString(PyExc_ValueError, 
        
                       "each subscript must be either an integer " 
        
                       "or an ellipsis"); 
        
               Py_DECREF(obj); 
        
               return -1; 
        
           }

Logic ought to be the C equivalent of:

try:
    s = operator.index(item)
except TypeError as e:
    raise TypeError("each subscript must be either an integer "
                    "or an ellipsis") from e
# do stuff with s
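
For readers who want to see the "do stuff with s" part spelled out, the following is a rough Python rendering of the C branch quoted above, combined with the suggested `operator.index` call. `subscript_to_letter` is a hypothetical name and this is a sketch of the logic, not the code that was merged.

```python
import operator


def subscript_to_letter(item):
    """Hypothetical Python rendering of the integer-subscript branch.

    Maps 0..25 to 'A'..'Z' and 26..51 to 'a'..'z', mirroring the C code.
    """
    try:
        s = operator.index(item)
    except TypeError as e:
        raise TypeError("each subscript must be either an integer "
                        "or an ellipsis") from e
    if 0 <= s < 26:
        return chr(ord('A') + s)
    if 26 <= s < 2 * 26:
        return chr(ord('a') + s - 26)
    raise ValueError("subscript is not within the valid range [0, 52)")


letters = "".join(subscript_to_letter(i) for i in (0, 25, 26, 51))
# letters == "AZaz"
```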

guilhermeleobas · 2020-04-14T16:25:08Z

I will take this one.

tinaoberoi · 2020-04-21T01:11:50Z

@guilhermeleobas are you working on this? If not, I would like to take this up. Thanks

guilhermeleobas · 2020-04-21T01:19:14Z

Hi @tinaoberoi, it was on my plans but you can go ahead and work on it :)

tinaoberoi · 2020-04-22T21:08:50Z

Thanks @guilhermeleobas, I am new here; any previous findings and help are appreciated. :-)

rlintott · 2020-04-25T23:09:14Z

I'm also a first-time contributor. I might try fixing this. I don't really mind if someone else contributes first. In any case, the experience is good for me.


* Using PyArray_PyIntAsIntp helper function instead
* TST: add tests for einsum numpy int and bool list subscripts

Added tests to check that einsum accepts numpy int64 types and rejects bool. Rejecting bools is new behaviour in subscript lists.

I changed ValueError to TypeError on line 2496 in multiarraymodule.c as it is more appropriate. I also modified einsumfunc.py to have the same behaviour as in the C file when checking subscript list. (Reject bools but accept anything else from operator.index())

Closes gh-15961
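
The reason bools need an explicit rejection is that `bool` is a subclass of `int`, so `operator.index` accepts `True` and `False` without complaint. A minimal sketch of the "reject bools but accept anything else from operator.index()" rule might look like this; `checked_subscript` and its error message are assumptions made for illustration, not the wording used in the actual change.

```python
import operator


def checked_subscript(item):
    """Hypothetical check: reject bool, accept any other __index__-capable object."""
    if isinstance(item, bool):
        # Assumed message; operator.index alone would let True through as 1.
        raise TypeError("subscript must be an integer, not a bool")
    return operator.index(item)


# Without the explicit check, a bool would be treated as the integer 0 or 1.
bool_passes_index = operator.index(True) == 1
```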

eric-wieser added 00 - Bug component: numpy.einsum labels Apr 12, 2020

seberg added the good first issue label Apr 13, 2020

rlintott added a commit to rlintott/numpy that referenced this issue Apr 26, 2020

BUG: numpy.einsum indexing arrays now accept numpy int type

3fd7b70

See numpy#15961.

rlintott mentioned this issue Apr 26, 2020

BUG: numpy.einsum indexing arrays now accept numpy int type #16080

Merged

rlintott added a commit to rlintott/numpy that referenced this issue Apr 26, 2020

BUG: numpy.einsum indexing arrays now accept numpy int type

cadc292

See numpy#15961

seberg closed this as completed in #16080 May 16, 2020
