BUG: loc-indexing with a CategoricalIndex with non-string categories #29922

Conversation

topper-123
Contributor

Bug when indexing with .loc and the index is a CategoricalIndex with integer or float categories.
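
For illustration, a minimal sketch of the case this PR addresses, assuming a pandas version that includes the fix (1.0 or later). The variable names and values are made up for the example and are not taken from the PR.

```python
import pandas as pd

# Integer categories: label lookup with .loc on a CategoricalIndex.
int_s = pd.Series(["foo", "bar"], index=pd.CategoricalIndex([1, 2]))
int_result = int_s.loc[1]

# Float categories are looked up the same way.
float_s = pd.Series(["foo", "bar"], index=pd.CategoricalIndex([1.5, 2.5]))
float_result = float_s.loc[2.5]

print(int_result, float_result)
```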

@topper-123 topper-123 force-pushed the categorical_indexing_with_non_string_categories branch from 49eb424 to 5167024 Compare November 28, 2019 23:03
@pep8speaks

pep8speaks commented Nov 28, 2019

Hello @topper-123! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-11 07:13:21 UTC

@topper-123 topper-123 force-pushed the categorical_indexing_with_non_string_categories branch from 5167024 to 364f61e Compare November 28, 2019 23:05
try:
    return self.categories._convert_scalar_indexer(key, kind=kind)
except TypeError:
    self._invalid_indexer("label", key=key)
Contributor

when is this TypeError hit?

Contributor Author

@topper-123 topper-123 Nov 30, 2019

That was hit when the key has the wrong type for the index, e.g. pd.Series([1,2], index=pd.CategoricalIndex(["a", "b"])).loc[1]. It's the same error type as raised by pd.Series([1,2], index=["a", "b"]).loc[1].

I've added a test for it.
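
A small sketch of the wrong-key-type case described above. The helper `loc_error_name` is hypothetical, and catching both `TypeError` and `KeyError` is an assumption, since the exact exception class may differ between pandas versions.

```python
import pandas as pd


def loc_error_name(index):
    """Return the name of the exception raised by .loc[1], or None."""
    s = pd.Series([1, 2], index=index)
    try:
        s.loc[1]  # integer key, but the labels are strings
    except (TypeError, KeyError) as err:
        return type(err).__name__
    return None


# Compare a CategoricalIndex with a plain Index holding the same labels.
categorical_error = loc_error_name(pd.CategoricalIndex(["a", "b"]))
plain_error = loc_error_name(pd.Index(["a", "b"]))
print(categorical_error, plain_error)
```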

@jreback jreback added the Categorical (Categorical Data Type) and Indexing (Related to indexing on series/frames, not to indexes themselves) labels Nov 29, 2019
@topper-123 topper-123 force-pushed the categorical_indexing_with_non_string_categories branch 2 times, most recently from 4751ca7 to f2274d4 Compare November 30, 2019 11:38
@jreback jreback added this to the 1.0 milestone Dec 1, 2019
if self.categories._defer_to_indexing:
    return self.categories._convert_scalar_indexer(key, kind=kind)

if kind == "loc" or self.categories._defer_to_indexing:
Contributor

we might not need the _defer_to_indexing any longer, as I think the only way to get here is kind=='loc'

@topper-123
Contributor Author

I'll be abroad until the weekend and will look at it then. I think it would cause an error with categories made from a DatetimeIndex, but I'll check.

@topper-123 topper-123 force-pushed the categorical_indexing_with_non_string_categories branch 3 times, most recently from e9f5832 to 746d6d4 Compare December 7, 2019 23:29
@topper-123
Contributor Author

I've updated the PR: I've removed _defer_to_indexing and added some tests.
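
To illustrate the DatetimeIndex concern raised above, a minimal sketch of a label lookup on a CategoricalIndex whose categories are datetimes, assuming a pandas version that includes this PR. The names, dates, and values are invented for the example.

```python
import pandas as pd

# Categories built from a DatetimeIndex.
dates = pd.date_range("2019-01-01", periods=3)
s = pd.Series([10, 20, 30], index=pd.CategoricalIndex(dates))

# Scalar label lookup with a Timestamp key.
picked = s.loc[pd.Timestamp("2019-01-02")]
print(picked)
```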

result = df.loc[idx_values[0]]
expected = Series(["foo"], index=["A"], name=idx_values[0])
tm.assert_series_equal(result, expected)
# list selection
Contributor

can you put a blank line between cases

np.array([1, 2, 3], dtype=dtype)
for dtype in [
    np.int8,
    np.int16,
Contributor

any way to use the fixtures (or definitions) in pandas/conftest.py for some of this

@topper-123 topper-123 force-pushed the categorical_indexing_with_non_string_categories branch from 746d6d4 to 7316414 Compare December 11, 2019 05:29
@topper-123
Contributor Author

I've updated the PR to use the dtypes from pandas/conftest.py instead.
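
As a rough sketch of what a dtype-driven check along these lines could look like outside the test suite: `INT_DTYPES` below is a hypothetical stand-in for the list in pandas/conftest.py (its contents are assumed here), and plain assertions replace the `tm` helpers used in the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the integer dtype list in pandas/conftest.py.
INT_DTYPES = [
    np.int8, np.int16, np.int32, np.int64,
    np.uint8, np.uint16, np.uint32, np.uint64,
]

checked = []
for dtype in INT_DTYPES:
    idx_values = np.array([1, 2, 3], dtype=dtype)
    df = pd.DataFrame(
        {"A": ["foo", "bar", "baz"]}, index=pd.CategoricalIndex(idx_values)
    )

    # scalar selection
    row = df.loc[idx_values[0]]
    assert row["A"] == "foo"

    # list selection
    rows = df.loc[idx_values[:2]]
    assert list(rows["A"]) == ["foo", "bar"]

    checked.append(np.dtype(dtype).name)

print(checked)
```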

[1.5, 2.5, 3.5],
[-1.5, -2.5, -3.5],
# numpy int/uint
*[np.array([1, 2, 3], dtype=dtype) for dtype in conftest.ALL_INT_DTYPES],
Contributor

great! nice comprehensive test

@jreback jreback merged commit 27836e9 into pandas-dev:master Dec 11, 2019
@jreback
Contributor

jreback commented Dec 11, 2019

thanks @topper-123 very nice!

@topper-123 topper-123 deleted the categorical_indexing_with_non_string_categories branch December 11, 2019 13:57
Labels
Categorical Categorical Data Type Indexing Related to indexing on series/frames, not to indexes themselves
Development

Successfully merging this pull request may close these issues.

Can't .loc[label] on a CategoricalIndex with labels being integer
3 participants