
BUG: Different types of group keys when grouping over one or multiple columns #46659


Open
2 of 3 tasks
jankislinger opened this issue Apr 6, 2022 · 4 comments
Labels
API - Consistency (Internal Consistency of API/Behavior), Bug, Groupby

Comments

@jankislinger

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

data = pd.DataFrame({"x": [1, 2], "y": [1, 2]})

for x, df in data.groupby("x"):
    print(type(x))  # <class 'int'>

for (x, y), df in data.groupby(["x", "y"]):
    print(type(x))  # <class 'numpy.int64'>

Issue Description

When iterating over groups from a single integer column, the iterator yields the group keys as Python integers. When iterating over groups from multiple columns, the keys remain np.int64.

Expected Behavior

I would expect both cases to yield keys of the same type.

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-39-generic
Version : #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 58.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

@jankislinger added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Apr 6, 2022
@GYHHAHA
Contributor

GYHHAHA commented Apr 6, 2022

Thanks for the report, @jankislinger! Please make the title of this issue more specific; it will help others grasp your problem more easily. But I wonder: when does the current behavior lead to an error? Or is there a real situation where you want np.int64 instead of int?

@jankislinger
Author

I've been passing it to requests.post, which internally calls complexjson.dumps, and it was failing with the error TypeError: Object of type int64 is not JSON serializable. If I were calling regular json.dumps myself, I could pass a default converter to string, but this dumps call is quite deep in the call stack.

(sorry about the name, I didn't notice)
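
One way to sidestep the deep dumps call is to normalize the payload before handing it to requests. The following is only a minimal sketch of that idea: `to_builtin` is a hypothetical helper name (not part of pandas, NumPy, or requests), and the payload is made up for illustration. It relies on `np.generic.item()`, which returns the equivalent Python scalar.

```python
import json

import numpy as np


def to_builtin(obj):
    """Recursively replace NumPy scalars with plain Python scalars.

    Hypothetical helper for illustration; not part of any library.
    """
    if isinstance(obj, np.generic):
        # .item() converts e.g. np.int64(1) to the Python int 1
        return obj.item()
    if isinstance(obj, dict):
        return {to_builtin(k): to_builtin(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_builtin(v) for v in obj]
    return obj


# Made-up payload containing NumPy scalars, as group keys might be
payload = {"x_value": np.int64(1), "ys": [np.int64(2), 3]}

# Plain json.dumps would raise TypeError on the raw payload;
# the normalized copy serializes without a custom default.
body = json.dumps(to_builtin(payload))
```

The normalized dictionary can then be passed as the `json=` argument of `requests.post`, so no `default` hook is needed further down the call stack.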

@jankislinger jankislinger changed the title BUG: BUG: Different types of group keys when grouping over one or multiple columns Apr 6, 2022
@monishkumar9677

json does not recognize NumPy data types. Convert the number to a Python int before serializing the object.

Here is the solution:

import pandas as pd
import json

data = pd.DataFrame({"X": [1, 2], "Y": [1, 2]})

for x, group in data.groupby("X"):
    print(type(x))  # <class 'int'>

for (x, y), group in data.groupby(["X", "Y"]):
    print(type(int(x)))  # <class 'int'>

Note: Both keys now have the same type, and it is one that json accepts.

Tested code:

import pandas as pd
import json

data = pd.DataFrame({"X": [1, 2], "Y": [1, 2]})

for (x, y), group in data.groupby(["X", "Y"]):
    print(type(int(x)))  # <class 'int'>
    # Cast the NumPy scalars to Python ints so json can serialize them
    update_data = {
        "x_value": int(x),
        "y_value": int(y),
    }
    stringtype = json.dumps(update_data)  # JSON serialization done

Solution in text.txt

@simonjayhawkins added the Groupby and API - Consistency (Internal Consistency of API/Behavior) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label May 29, 2022
@simonjayhawkins
Member

Thanks @jankislinger for the report.

Help on method __iter__ in module pandas.core.groupby.groupby:

__iter__() -> 'Iterator[tuple[Hashable, NDFrameT]]' method of pandas.core.groupby.generic.DataFrameGroupBy instance
    Groupby iterator.
    
    Returns
    -------
    Generator yielding sequence of (name, subsetted object)
    for each group

The docstring doesn't give an indication of which is correct, but yes, the return types should be consistent. I guess returning Python types is more useful. (IIRC there are other issues discussing methods that should return Python types.)
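
Until the iterator itself is made consistent, the same key types can be obtained by wrapping it. This is a rough sketch only, not how pandas implements or plans to fix this: `iter_groups_builtin` and `_to_builtin` are hypothetical names, and the wrapper assumes group keys are either scalars or tuples of scalars.

```python
import numpy as np
import pandas as pd


def _to_builtin(value):
    """Return a Python scalar for NumPy scalars; pass anything else through."""
    return value.item() if isinstance(value, np.generic) else value


def iter_groups_builtin(grouped):
    """Yield (key, frame) pairs with NumPy scalar keys converted.

    Hypothetical wrapper around a GroupBy object; tuple keys are
    converted element by element.
    """
    for key, frame in grouped:
        if isinstance(key, tuple):
            yield tuple(_to_builtin(k) for k in key), frame
        else:
            yield _to_builtin(key), frame


data = pd.DataFrame({"x": [1, 2], "y": [1, 2]})

# Key types when grouping by one column
single_types = [type(key) for key, _ in iter_groups_builtin(data.groupby("x"))]

# Types of the first key element when grouping by two columns
multi_types = [
    type(key[0]) for key, _ in iter_groups_builtin(data.groupby(["x", "y"]))
]
```

With the wrapper, both lists contain only `int`, regardless of which type the underlying iterator yields.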

@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Jun 10, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
Development

No branches or pull requests

5 participants