BUG: Should be able to group by a categorical Series of unequal length #44179
this is clearly documented: https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.groupby.html?highlight=groupby#pandas.DataFrame.groupby I don't see any reason to expect this to align.
Hi. Would you clarify how the documentation indicates that the series should not be aligned? As I read the excerpt below on the
In my two examples above under "Reproducible Example" and "Expected Behavior," the key difference is only the data type of the series; pandas correctly aligns the series with
this is not what one would expect here. yes you can do this, but i would expect an error rather than accidentally aligning on nonsensical things.
@stanwest i see you opened a PR, will look.
Elaborating on the display of the aligned objects, if we assign aligned_series, aligned_grouper = series.align(grouper), then pandas currently will group the series by the given categories (dropping the missing categories) with either of the following:
# Group by a Series with categorical data type and matching index.
aligned_series.groupby(aligned_grouper).groups
# Returns {'A': [0, 3], 'B': [1, 2]}
# Group by a Categorical with matching length.
aligned_series.groupby(aligned_grouper.values).groups
# Returns {'A': [0, 3], 'B': [1, 2]}
As I understand the concepts and the documentation,
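For readers following along, here is a minimal sketch of the two cases discussed in this thread. The data are hypothetical (the values, the index, and the names sums_object and sums_categorical are illustrative and are not taken from the original report). It shows a grouper Series of unequal length with a non-categorical data type, which pandas aligns on the index, and a categorical grouper that is first brought to the matching index explicitly with reindex, which should work whether or not the categorical case is aligned automatically.

```python
import pandas as pd

# Hypothetical data: the object to group has 4 rows, the grouper has 5.
series = pd.Series([1, 2, 4, 8])
grouper = pd.Series(list("ABBAC"))

# Non-categorical grouper of unequal length: pandas aligns it to
# series.index, so the extra label at index 4 is ignored.
sums_object = series.groupby(grouper).sum()

# Categorical grouper: align it explicitly before grouping, so that the
# grouper has the same index (and length) as the object being grouped.
categorical_grouper = grouper.astype("category")
aligned_grouper = categorical_grouper.reindex(series.index)

# observed=True keeps only the categories that actually occur ("C" does not).
sums_categorical = series.groupby(aligned_grouper, observed=True).sum()

print(sums_object.to_dict())
print(sums_categorical.to_dict())
```

Both calls give the same sums per group here, which is the behavior the report expects from the categorical case without the explicit reindex step.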
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
When grouping by a Series, the values of the series after alignment determine the groups. In the example, however, pandas does not attempt to align the series and instead unnecessarily wants it to have the same length as the axis of grouping.
Expected Behavior
I expect obj.groupby(grouper), where grouper is a Series with a categorical data type, to allow the grouper to have unequal length and to align it, as in the following where the grouper has a non-categorical data type:
Installed Versions
INSTALLED VERSIONS
commit : 9b2bb73
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17134
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252
pandas : 1.4.0.dev0+952.g9b2bb732f0
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.1
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.28.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.10.0
fastparquet : 0.7.1
gcsfs : 2021.10.0
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 5.0.0
pyxlsb : None
s3fs : 2021.10.0
scipy : 1.7.1
sqlalchemy : 1.4.25
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1