BUG: Different types of group keys when grouping over one or multiple columns #46659

jankislinger · 2022-04-06T09:50:01Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

data = pd.DataFrame({"x": [1, 2], "y": [1, 2]})

for x, df in data.groupby("x"):
    print(type(x))  # <class 'int'>

for (x, y), df in data.groupby(["x", "y"]):
    print(type(x))  # <class 'numpy.int64'>

Issue Description

When iterating over groups from a single integer column, the iterator yields the group keys as Python integers. When iterating over groups from multiple columns, the keys remain np.int64.

Expected Behavior

I would expect both cases to yield keys of the same type.

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-39-generic
Version : #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 58.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None


GYHHAHA · 2022-04-06T15:28:36Z

Thanks for the report, @jankislinger! Please make the title of this issue more specific; it will help others grasp your problem more easily. I do wonder, though: when does the current behavior lead to an error? Or is there a real situation where you want np.int64 instead of int?

jankislinger · 2022-04-06T15:43:36Z

I've been passing it to requests.post, which internally calls complexjson.dumps, and it was failing with the error TypeError: Object of type int64 is not JSON serializable. If I were calling json.dumps directly, I could pass a default converter (e.g. to string), but this dumps call is quite deep in the call stack.
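For readers who want to see the failure in isolation, here is a minimal sketch that uses only the standard json module and NumPy; it does not go through requests. The helper name numpy_scalar_default is hypothetical and made up for illustration. The default= hook only helps when you control the json.dumps call yourself, which is not the case described above.

```python
import json

import numpy as np


def numpy_scalar_default(obj):
    """Hypothetical fallback for json.dumps: unwrap NumPy scalars."""
    if isinstance(obj, np.generic):
        return obj.item()  # .item() returns the matching built-in Python scalar
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


key = np.int64(1)  # same type as a key yielded by the multi-column groupby

error_message = None
try:
    json.dumps({"x": key})
except TypeError as exc:
    error_message = str(exc)

# Works only when the caller controls the json.dumps call.
encoded_with_hook = json.dumps({"x": key}, default=numpy_scalar_default)

# Otherwise, convert the key before handing it to the library.
encoded_after_cast = json.dumps({"x": int(key)})
```

When the dumps call is buried inside a library, converting the key with int() (or .item()) before passing it on is likely the simpler route.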

(sorry about the name, I didn't notice)

monishkumar9677 · 2022-04-20T15:35:14Z

json does not recognize NumPy data types. Convert the number to a Python int before serializing the object.

Here is the solution:

import pandas as pd
import json

data = pd.DataFrame({"X": [1, 2], "Y": [1, 2]})

# Single column: the key is already a Python int.
for x, df in data.groupby("X"):
    print(type(x))  # <class 'int'>

# Multiple columns: cast each key to a Python int explicitly.
for (x, y), df in data.groupby(["X", "Y"]):
    print(type(int(x)))  # <class 'int'>

Note: both loops now print the same type, and it is one that json accepts.

Tested code:

import pandas as pd
import json

data = pd.DataFrame({"X": [1, 2], "Y": [1, 2]})

for (x, y), df in data.groupby(["X", "Y"]):
    print(type(int(x)))  # <class 'int'>
    update_data = {
        "x_value": int(x),
        "y_value": int(y),
    }
    stringtype = json.dumps(update_data)  # JSON serialization done

Solution in text.txt

simonjayhawkins · 2022-05-29T15:35:14Z

Thanks @jankislinger for the report.

Help on method __iter__ in module pandas.core.groupby.groupby:

__iter__() -> 'Iterator[tuple[Hashable, NDFrameT]]' method of pandas.core.groupby.generic.DataFrameGroupBy instance
    Groupby iterator.
    
    Returns
    -------
    Generator yielding sequence of (name, subsetted object)
    for each group

doesn't give an indication of which is correct, but yes the return types should be consistent. I guess returning Python types is more useful. (IIRC there are other issues discussing methods that should return Python types)
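Until the return types are consistent, one way to get Python types in both cases is to normalize the keys yourself. The sketch below is a workaround on the caller's side, not pandas behavior or a proposed fix; the helper name to_python_key is hypothetical.

```python
import numpy as np
import pandas as pd


def to_python_key(key):
    """Hypothetical helper: convert a group key (scalar or tuple) to built-in Python scalars."""
    if isinstance(key, tuple):
        return tuple(to_python_key(part) for part in key)
    # np.generic is the base class of all NumPy scalar types.
    return key.item() if isinstance(key, np.generic) else key


data = pd.DataFrame({"x": [1, 2], "y": [1, 2]})

# Keys are normalized regardless of how many columns are grouped on.
single_keys = [to_python_key(key) for key, _ in data.groupby("x")]
multi_keys = [to_python_key(key) for key, _ in data.groupby(["x", "y"])]
```

Keys that are not NumPy scalars (strings, for example) pass through unchanged, so the helper should be safe to apply to any grouping.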

jankislinger added the Bug and Needs Triage labels Apr 6, 2022

jankislinger changed the title from "BUG:" to "BUG: Different types of group keys when grouping over one or multiple columns" Apr 6, 2022

simonjayhawkins added the Groupby and API - Consistency labels and removed the Needs Triage label May 29, 2022

simonjayhawkins added this to the Contributions Welcome milestone Jun 10, 2022

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
