BUG: qcut fails with Float64Dtype #40730

Closed
2 of 3 tasks
tamargrey opened this issue Apr 1, 2021 · 2 comments
Labels: Bug, cut (cut, qcut), NA - MaskedArrays (related to pd.NA and nullable extension arrays)

tamargrey commented Apr 1, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

series = pd.Series([1.0, 2.0, 3.0, 4.0], dtype=pd.Float64Dtype())
pd.qcut(series, 2)

Problem description

pd.qcut currently accepts the nullable Int64Dtype as well as 'float64', so I would expect it to work with Float64Dtype too. Instead, it raises the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-29-1db98f70db38> in <module>
      1 series = pd.Series([1.0,2.0,3.0,4.0], dtype=pd.Float64Dtype())
----> 2 pd.qcut(series, 2)

~/.pyenv/versions/3.8.2/envs/woodwork/lib/python3.8/site-packages/pandas/core/reshape/tile.py in qcut(x, q, labels, retbins, precision, duplicates)
    356         quantiles = q
    357     bins = algos.quantile(x, quantiles)
--> 358     fac, bins = _bins_to_cuts(
    359         x,
    360         bins,

~/.pyenv/versions/3.8.2/envs/woodwork/lib/python3.8/site-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
    408 
    409     if include_lowest:
--> 410         ids[x == bins[0]] = 1
    411 
    412     na_mask = isna(x) | (ids == len(bins)) | (ids == 0)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Expected Output

The output should match that of float64:

0    (0.999, 2.5]
1    (0.999, 2.5]
2      (2.5, 4.0]
3      (2.5, 4.0]
dtype: category
Categories (2, interval[float64]): [(0.999, 2.5] < (2.5, 4.0]]

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2c8480
python : 3.8.2.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Sun Jul 5 00:43:10 PDT 2020; root:xnu-6153.141.1~9/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.3
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 41.2.0
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.8.7
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

tamargrey added the Bug and Needs Triage labels on Apr 1, 2021

jorisvandenbossche (Member) commented

@tamargrey Thanks for the report!

This was fixed last year for the nullable Int64Dtype in #31440, but it seems we need to generalize the fix to also cover Float64Dtype.

In this location:

# To support cut and qcut for IntegerArray we convert to float dtype.
# Will properly support in the future.
# https://github.com/pandas-dev/pandas/pull/31290
# https://github.com/pandas-dev/pandas/issues/31389
elif is_extension_array_dtype(x.dtype) and is_integer_dtype(x.dtype):
    x = x.to_numpy(dtype=np.float64, na_value=np.nan)

we need to change the is_integer_dtype check so that it catches nullable numeric dtypes in general.
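
As a rough illustration of that idea (not the actual patch), the same conversion can be applied on the user side before calling qcut. The helper name qcut_nullable is hypothetical, and using is_numeric_dtype as the generalized check is an assumption; it only handles Series input.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_numeric_dtype


def qcut_nullable(x, q, **kwargs):
    """Hypothetical workaround: run pd.qcut on a Series with a nullable numeric dtype.

    Mirrors the conversion quoted above, but checks for any numeric
    extension dtype instead of only integer ones (an assumption about
    what the generalized check could look like).
    """
    if is_extension_array_dtype(x.dtype) and is_numeric_dtype(x.dtype):
        # Convert the masked array to plain float64; pd.NA becomes NaN.
        values = x.to_numpy(dtype=np.float64, na_value=np.nan)
        x = pd.Series(values, index=x.index, name=x.name)
    return pd.qcut(x, q, **kwargs)


series = pd.Series([1.0, 2.0, 3.0, 4.0], dtype=pd.Float64Dtype())
result = qcut_nullable(series, 2)
print(result)
```

Converting up front keeps the rest of qcut on the plain float64 code path, at the cost of losing the nullable dtype on the input.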

jorisvandenbossche added the cut and NA - MaskedArrays labels and removed the Needs Triage label on Apr 1, 2021
jorisvandenbossche added this to the Contributions Welcome milestone on Apr 1, 2021
jreback modified the milestones: Contributions Welcome, 1.3 on Apr 26, 2021

lithomas1 (Member) commented

Fixed by #40969
