BUG: __getitem__ with boolean type #10607

Closed
felixlaumon opened this issue Jul 17, 2015 · 5 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@felixlaumon

If a series contains boolean data, value_counts() does not appear to return the counts in descending order when they are read with plain [] indexing. However, the counts are correct when accessed through iloc. I suppose that is because 0 is cast to False and 1 is cast to True?

import pandas as pd

str_series = pd.Series(['f', 'f', 't', 't', 't'])
bool_series = pd.Series([False, False, True, True, True])

counts = bool_series.value_counts()
assert counts.index[0] == True
assert counts.index[1] == False
assert counts[0] == 3 # counts[0] is actually 2
assert counts[1] == 2 # counts[1] is actually 3

counts = bool_series.value_counts()
assert counts.index[0] == True
assert counts.index[1] == False
assert counts.iloc[0] == 3 # this works
assert counts.iloc[1] == 2 # this works

counts = str_series.value_counts()
assert counts.index[0] == 't'
assert counts.index[1] == 'f'
assert counts[0] == 3
assert counts[1] == 2
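
For readers hitting the same ambiguity, here is a minimal sketch of how the two kinds of lookup can be spelled explicitly. It assumes a reasonably recent pandas; the variable names are illustrative only, and this is a workaround rather than a description of how pandas resolves `counts[0]` internally.

```python
import pandas as pd

bool_series = pd.Series([False, False, True, True, True])
counts = bool_series.value_counts()

# Label-based lookup: the key is matched against the index values (True/False).
n_true = counts.loc[True]
n_false = counts.loc[False]

# Position-based lookup: the key is an offset into the (descending) result.
most_common = counts.iloc[0]
least_common = counts.iloc[1]

print(n_true, n_false, most_common, least_common)
```

Using `.loc` for labels and `.iloc` for positions sidesteps the question of whether an integer key should be treated as a label or as a position.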
@sinhrks sinhrks added the Indexing Related to indexing on series/frames, not to indexes themselves label Jul 17, 2015
@sinhrks
Member

sinhrks commented Jul 17, 2015

Thanks for the report. As I understand it, the problem is unrelated to value_counts and sorting, and is simply about indexing against a bool Index? The following is a simple example from my environment:

s = pd.Series([3, 2], index=[True, False])
# True     3
# False    2
# dtype: int64

# OK
s[True]
# 3
s[False]
# 2

# NG: position-based indexing would be expected here?
s[0]
# 2
s[1]
# 3 

# currently, Index can't have bool dtype 
s.index.dtype
# dtype('O')

@felixlaumon
Author

Yes, it's really more about indexing with a bool-type index.

@jreback
Contributor

jreback commented Jul 17, 2015

Yeah, this is an edge case; it could be addressed by using

In [1]: Index([True,False]).inferred_type
Out[1]: 'boolean'

in Series.__getitem__
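
As a rough illustration of that idea, the sketch below uses `inferred_type` to decide how an integer key should be interpreted. The `lookup` function is a hypothetical user-level helper, not part of pandas and not the change that was eventually made to `Series.__getitem__`. It assumes that an integer key on a boolean index should be read as a position.

```python
import pandas as pd


def lookup(series, key):
    """Hypothetical helper: disambiguate keys on a boolean index."""
    # bool is a subclass of int, so it has to be checked first.
    if isinstance(key, bool):
        # Treat True/False strictly as labels.
        return series.loc[key]
    if isinstance(key, int) and series.index.inferred_type == "boolean":
        # Assumption: an integer on a boolean index means a position.
        return series.iloc[key]
    # Anything else falls through to the default behaviour.
    return series[key]


s = pd.Series([3, 2], index=[True, False])
by_label = lookup(s, True)
by_position = lookup(s, 1)
print(by_label, by_position)
```

The order of the `isinstance` checks matters here, because `isinstance(True, int)` is true in Python.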

@jreback jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Jul 17, 2015
@jreback jreback added this to the Someday milestone Jul 17, 2015
@sinhrks
Member

sinhrks commented Jul 17, 2015

I looked into this a little, and it is likely caused by numpy. I will check whether it is intended or not.

import numpy as np
np.__version__
# '1.9.2'

a = np.array([0, 1, 2])
a[True]
# 1
a[False]
# 0

a[[True]]
# __main__:1: FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
# array([1])
a[[False]]
# array([0])

# When we pass array, it works as boolean indexing
a[np.array([True])]
# array([0])
a[np.array([False])]
# array([], dtype=int64)
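
The outputs above are from numpy 1.9.2, and the handling of bare or short boolean keys has changed in later releases. The sketch below sticks to two spellings that should be unambiguous across versions: a boolean mask of the same length as the array, and an explicit conversion of the boolean to an integer position. The variable names are illustrative.

```python
import numpy as np

a = np.array([0, 1, 2])

# Boolean mask indexing: one flag per element, same length as the array.
mask = np.array([True, False, True])
selected = a[mask]

# Integer indexing: convert the boolean explicitly when a position is meant.
second = a[int(True)]
first = a[int(False)]

print(selected, second, first)
```

Making the intent explicit this way avoids relying on how a particular numpy version interprets `a[True]` or `a[[True]]`.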

@jbrockmendel jbrockmendel removed the Indexing Related to indexing on series/frames, not to indexes themselves label Feb 18, 2020
@mroeschke mroeschke added Bug Indexing Related to indexing on series/frames, not to indexes themselves and removed Dtype Conversions Unexpected or buggy dtype conversions labels Apr 18, 2021
@mroeschke mroeschke changed the title from "pd.Series value_counts() does not sort correctly for boolean type" to "BUG: __getitem__ with boolean type" Apr 18, 2021
@mroeschke mroeschke removed this from the Someday milestone Oct 13, 2022
@mroeschke
Member

It looks like this raises now, and I believe we have tests for this, so closing.


5 participants