Skip to content

DataFrame.iloc does not raise for setitem if empty #32896

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
SaturnFromTitan opened this issue Mar 21, 2020 · 8 comments
Closed

DataFrame.iloc does not raise for setitem if empty #32896

SaturnFromTitan opened this issue Mar 21, 2020 · 8 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version

Comments

@SaturnFromTitan
Copy link
Contributor

SaturnFromTitan commented Mar 21, 2020

This is a follow-up on #32886 (comment)

test_setitem_ndarray_3d in pandas/tests/indexing/test_indexing.py isn't raising for empty index, iloc and DataFrame parametrization.

The mentioned PR added a new xfail test for this combination: test_setitem_ndarray_3d_does_not_fail_for_iloc_empty_dataframe. The goal of this PR is to remove the xfail test and remove the associated pytest.skip in test_setitem_ndarray_3d.

@jbrockmendel jbrockmendel added the Indexing Related to indexing on series/frames, not to indexes themselves label Mar 27, 2020
@simonjayhawkins
Copy link
Member

This didn't silently fail on 0.25.3

>>> import numpy as np
>>>
>>> import pandas as pd
>>>
>>> pd.__version__
'0.25.3'
>>>
>>> i = pd.Index([])
>>> obj = pd.DataFrame(np.random.randn(len(i), len(i)), index=i, columns=i)
>>> obj
Empty DataFrame
Columns: []
Index: []
>>>
>>> nd3 = np.random.randint(5, size=(2, 2, 2))
>>>
>>> obj.iloc[nd3] = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 204, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 191, in _get_setitem_indexer
    return self._convert_to_indexer(key, axis=axis, is_setter=True)
  File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2175, in _convert_to_indexer
    self._validate_key(obj, axis)
  File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 2031, in _validate_key
    raise IndexError("positional indexers are out-of-bounds")
IndexError: positional indexers are out-of-bounds

922f932 is the first bad commit
commit 922f932
Author: jbrockmendel [email protected]
Date: Thu Feb 27 04:52:36 2020 -0800

PERF: pass through to numpy validation for iloc setitem (#32257)

cc @jbrockmendel

@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version and removed Bug labels Mar 30, 2020
@jbrockmendel
Copy link
Member

This matches the numpy behavior:

arr = np.array([]).reshape(0, 0)
nd3 = np.random.randint(5, size=(2, 2, 2))
arr[nd3] = 0

This may be a numpy bug as well cc @seberg ?

@seberg
Copy link
Contributor

seberg commented Apr 1, 2020

Yeah, I guess it technically is a NumPy bug. The reason here must be the additional empty dimension that is not being indexed, somehow causing the index check to never be run to begin with.
Can you open a NumPy issue about it? I am surprised there is a regression, this is probably the case at least since NumPy 1.9, but then I guess the regression is due to a change in how pandas does things internally?

@jbrockmendel
Copy link
Member

but then I guess the regression is due to a change in how pandas does things internally?

yes, we recently stopped doing some validation on iloc and instead fall through to let numpy do the validation

@seberg
Copy link
Contributor

seberg commented Apr 2, 2020

As a heads up, I am considering deprecating this for a release if that is OK? I agree that it is a straight bug, but apparently I intentionally put this in (although possibly not intentionally this exact case).
If it would be easier for you if I remove it right away, that would be a welcome vote though :) (may still decide to do that in any case).

@jbrockmendel
Copy link
Member

So we'll put back in a check and raise, and then when our min numpy version catches up to 1.19(?) we can go back to falling through? If so, sounds good.

@seberg
Copy link
Contributor

seberg commented Apr 2, 2020

Yeah, I suppose so. You could get away with only doing it when the result/assignment may be empty (if you want to avoid doing unnecessary/slow checks). If we deprecate, I guess it would be 1.20 otherwise 1.19, but since you need backcompat anyway, I guess it makes no real difference.

@mroeschke mroeschke added the Bug label May 11, 2020
@phofl
Copy link
Member

phofl commented Nov 22, 2020

cc @jbrockmendel This raises since cd649a2

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 145, in <module>
    obj.iloc[nd3] = 0
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 687, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1630, in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1646, in _setitem_with_indexer_split_path
    raise ValueError(r"Cannot set values with ndim > 2")
ValueError: Cannot set values with ndim > 2

Can we close this ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

No branches or pull requests

6 participants