astype CategoricalDtype on unknown categorical is still unknown #2947

Closed
TomAugspurger opened this issue Dec 1, 2017 · 4 comments

Comments

@TomAugspurger
Member

When converting an unknown categorical to a CategoricalDtype with explicit categories, the result should have known categories:

In [1]: import pandas as pd

In [2]: import dask.dataframe as dd

In [4]: a = pd.Series(['a', 'b', 'c'])

In [5]: b = dd.from_pandas(a, 2)

In [6]: b.astype('category').astype(pd.api.types.CategoricalDtype(['a', 'b', 'c']))
Out[6]:
Dask Series Structure:
npartitions=1
0    category[unknown]
2                  ...
dtype: category
Dask Name: astype, 3 tasks

The fix should be similar to the one for #2835.

@jcrist
Member

jcrist commented Dec 1, 2017

This looks like a bug in pandas, where .astype(CategoricalDtype(...)) on an already-categorical series is ignored:

In [1]: import pandas as pd

In [2]: a = pd.Series(['a', 'b', 'c'])

In [3]: dtype = pd.api.types.CategoricalDtype(['x', 'y', 'z'])

In [4]: b = a.astype('category')

In [5]: b
Out[5]:
0    a
1    b
2    c
dtype: category
Categories (3, object): [a, b, c]

In [6]: b.astype(dtype)
Out[6]:
0    a
1    b
2    c
dtype: category
Categories (3, object): [a, b, c]

@TomAugspurger
Member Author

Whoops, thanks :)

@jcrist
Member

jcrist commented Dec 1, 2017

pandas-dev/pandas#18593

@jcrist
Member

jcrist commented Dec 21, 2017

Since pandas-dev/pandas#18593 is resolved, do you want to close this? On older versions of pandas, the code above doesn't work in either pandas or dask; on newer versions it works in both. I think this is fine: we try to mimic the behavior of pandas, and users shouldn't rely on dask behaving differently from pandas.
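
As a rough illustration of the newer behavior, the sketch below reruns the pandas example from earlier in the thread. It assumes a pandas release that includes the fix for pandas-dev/pandas#18593; the variable name `recast` is illustrative.

```python
import pandas as pd

a = pd.Series(['a', 'b', 'c'])
dtype = pd.api.types.CategoricalDtype(['x', 'y', 'z'])

b = a.astype('category')
recast = b.astype(dtype)

# With the fix, the requested dtype is applied rather than ignored.
print(recast.dtype == dtype)

# Values absent from the new categories become missing.
print(recast.isna().all())
```

Because values outside the new categories are not preserved, the categories passed to CategoricalDtype should cover the data when the goal is only to make the categories known.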
