BUG: loc with empty multiindex raises exception #38711

Merged
merged 4 commits into from
Dec 28, 2020

Conversation

Contributor

@kasim95 kasim95 commented Dec 27, 2020

Added test case for the changes made in GH#36936

@kasim95 kasim95 changed the title from "BUG: Added test cases for GH#36936" to "TST: Added test cases for GH#36936" Dec 27, 2020
df.loc[df.loc[df.loc[:, "value"] == 0].index, "value"] = 5
result = df
expected = DataFrame([1, 2, 3, 4], index=index, columns=["value"])
tm.assert_equal(result, expected)
Member

@MarcoGorelli MarcoGorelli Dec 27, 2020

Does tm.assert_frame_equal not work here (and above)?

Contributor Author

The assertion method is changed to tm.assert_frame_equal in the most recent commit.
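
For readers outside the pandas test suite, here is a minimal sketch of the assignment check discussed above. It assumes the public `pandas.testing.assert_frame_equal` helper in place of the internal `tm` alias used in the test files, and the variable name `empty_index` is only illustrative.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

arrays = [["a", "a", "b", "a"], ["a", "a", "b", "b"]]
index = pd.MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))
df = pd.DataFrame([1, 2, 3, 4], index=index, columns=["value"])

# No row has value == 0, so this index is an empty MultiIndex.
empty_index = df.loc[df["value"] == 0].index

# Assigning through an empty MultiIndex selects no rows,
# so the frame should be left unchanged.
df.loc[empty_index, "value"] = 5

expected = pd.DataFrame([1, 2, 3, 4], index=index, columns=["value"])
# assert_frame_equal also checks index, column labels, and dtypes.
assert_frame_equal(df, expected)
```

Unlike a generic equality helper, `assert_frame_equal` states the expected object type up front, which is presumably why it was suggested in the review.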

@jreback jreback changed the title from "TST: Added test cases for GH#36936" to "BUG: loc with empty multiindex raises exception" Dec 27, 2020
@jreback jreback added the Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex, and Testing (pandas testing functions or related to the test suite) labels Dec 27, 2020
# loc on empty multiindex == loc with False mask
empty_multiindex = df.loc[df.loc[:, "value"] == 0, :].index
result = df.loc[empty_multiindex, :]
expected = df.loc[[False] * len(df.index), :]
Contributor

can you construct the expected frame explicitly

Contributor Author

Constructing an empty DataFrame named expected with an explicit MultiIndex sets the expected.index.inferred_type attribute to object by default.
The value of the result.index.inferred_type attribute is string, as inferred from the index variable created earlier.
Because of this mismatch, tm.assert_frame_equal fails.

I used the following code to create an empty dataframe with multiindex:

import pandas._testing as tm
from pandas import DataFrame, MultiIndex


def test_loc_empty_multiindex():
    # GH#36936
    arrays = [["a", "a", "b", "a"], ["a", "a", "b", "b"]]
    index = MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))
    df = DataFrame([1, 2, 3, 4], index=index, columns=["value"])

    # loc on empty multiindex == loc with False mask
    empty_multiindex = df.loc[df.loc[:, "value"] == 0, :].index
    result = df.loc[empty_multiindex, :]
    index2 = MultiIndex(levels=[[], []], codes=[[], []], names=["idx1", "idx2"])
    expected = DataFrame([], index=index2, columns=["value"], dtype="string")
    expected = expected.astype({"value": "int"})
    tm.assert_frame_equal(result, expected)  # this test fails

    # replacing value with loc on empty multiindex
    df.loc[df.loc[df.loc[:, "value"] == 0].index, "value"] = 5
    result = df
    expected = DataFrame([1, 2, 3, 4], index=index, columns=["value"])
    tm.assert_frame_equal(result, expected)

If the index variable is used directly to create the empty DataFrame, it introduces NaN values.
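
One possible way to build the expected frame explicitly while avoiding the level-type mismatch described above is to slice the original index with `index[:0]`, which keeps the original levels but has no rows. This is only a sketch under that assumption, using the public `pandas.testing` helper; it is not necessarily the approach taken in the merged test.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

arrays = [["a", "a", "b", "a"], ["a", "a", "b", "b"]]
index = pd.MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))
df = pd.DataFrame([1, 2, 3, 4], index=index, columns=["value"])

# loc on an empty MultiIndex should return an empty frame.
empty_multiindex = df.loc[df["value"] == 0].index
result = df.loc[empty_multiindex, :]

# index[:0] is an empty MultiIndex that still carries the original levels,
# unlike MultiIndex(levels=[[], []], codes=[[], []], ...).
expected = pd.DataFrame(
    {"value": np.array([], dtype=df["value"].dtype)},
    index=index[:0],
)
assert_frame_equal(result, expected)
```

Passing an empty array with the source column's dtype avoids the `astype` round trip used in the snippet above.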

@@ -695,3 +695,22 @@ def test_loc_getitem_index_differently_ordered_slice_none():
        columns=["a", "b"],
    )
    tm.assert_frame_equal(result, expected)


def test_loc_empty_multiindex():
Contributor

can you co-locate this with similar tests

jreback@lhs2:~/pandas-dev$ grep empty pandas/tests/indexing/multiindex/*
pandas/tests/indexing/multiindex/test_getitem.py:def test_frame_getitem_multicolumn_empty_level():
pandas/tests/indexing/multiindex/test_getitem.py:def test_frame_mi_empty_slice():
pandas/tests/indexing/multiindex/test_loc.py:        empty = Series(data=[], dtype=np.float64)
pandas/tests/indexing/multiindex/test_loc.py:        result = x.loc[empty]
pandas/tests/indexing/multiindex/test_loc.py:        # empty array:
pandas/tests/indexing/multiindex/test_loc.py:        empty = np.array([])
pandas/tests/indexing/multiindex/test_loc.py:        result = x.loc[empty]
pandas/tests/indexing/multiindex/test_loc.py:        ([], []),  # empty ok
pandas/tests/indexing/multiindex/test_loc.py:def test_loc_getitem_duplicates_multiindex_empty_indexer(columns_indexer):
pandas/tests/indexing/multiindex/test_loc.py:    # empty indexer

IOW, there are 2 tests in test_getitem.py that test basically the same thing, so put this test there.

Contributor Author

Moved the test to pandas/tests/indexing/multiindex/test_getitem.py

@jreback jreback added this to the 1.3 milestone Dec 28, 2020
@jreback jreback merged commit fc2cc7c into pandas-dev:master Dec 28, 2020
Contributor

jreback commented Dec 28, 2020

thanks @kasim95

@kasim95 kasim95 deleted the BUG36936 branch December 29, 2020 02:16
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
Development

Successfully merging this pull request may close these issues.

BUG: loc with empty multiindex raises exception
3 participants