BUG/EA: groupby on an EA should return the EA type #23227

Closed
jreback opened this issue Oct 18, 2018 · 4 comments · Fixed by #23318
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays.
Comments

@jreback
Contributor

jreback commented Oct 18, 2018

In [1]: df = pd.DataFrame({'Int': pd.Series([1, 2, 3], dtype='Int64'), 'A': [1, 2, 1]})
   ...: df
   ...: 
Out[1]: 
  Int  A
0   1  1
1   2  2
2   3  1

In [2]: df.groupby('A').Int.sum()
Out[2]: 
A
1    4
2    2
Name: Int, dtype: int64

Out[2] should have dtype Int64, not int64. In other words, the aggregation result needs to be passed through the EA's _from_sequence.
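As a rough illustration of what "passed through _from_sequence" could mean, the sketch below aggregates on plain Python values and then rebuilds the extension array from the raw sums. It does not reproduce the groupby internals; the manual loop and the names raw_sums, restored and result are assumptions made for this example only.

```python
import pandas as pd

df = pd.DataFrame({"Int": pd.Series([1, 2, 3], dtype="Int64"), "A": [1, 2, 1]})

# Aggregate on plain Python values so the sketch does not depend on how a
# particular pandas version handles extension dtypes inside groupby.
raw_sums = {}
for key, value in zip(df["A"].tolist(), df["Int"].tolist()):
    raw_sums[key] = raw_sums.get(key, 0) + value

# Re-wrap the plain sums in the original extension array type.
ea = df["Int"].array
restored = type(ea)._from_sequence(list(raw_sums.values()), dtype=ea.dtype)

result = pd.Series(restored, index=pd.Index(list(raw_sums), name="A"), name="Int")
print(result.dtype)
```

The key step is type(ea)._from_sequence(..., dtype=ea.dtype), which hands the aggregated scalars back to the extension array class so the nullable integer dtype is retained.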

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions Difficulty Intermediate ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 18, 2018
@jreback jreback added this to the Contributions Welcome milestone Oct 18, 2018
@5hirish
Contributor

5hirish commented Oct 23, 2018

@jreback Hi, I'm working on this issue. From a code walkthrough I learned that in the method _python_agg_general (https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L834), at result, counts = self.grouper.agg_series(obj, f), obj has dtype Int64 but result has dtype int64. I checked the agg_series method, and it seems a Cython converter converts the values from object to int64 after calculating the aggregation result. So I'm unsure where I should put the _from_sequence call after checking whether the input is an ExtensionArray.

@jreback
Contributor Author

jreback commented Oct 23, 2018

so look at _wrap_aggregated_output

there are a number of paths here though
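One way to picture a wrapping step that copes with several paths is a helper that tries to restore the extension dtype and falls back to a plain NumPy array when the aggregated values no longer fit. This is only a sketch: restore_ea_dtype is a hypothetical name, not a pandas function, and the actual logic in _wrap_aggregated_output may differ.

```python
import numpy as np
import pandas as pd


def restore_ea_dtype(values, original):
    """Hypothetical helper: re-wrap aggregated values in the input's EA type.

    Falls back to a plain NumPy array when the input is not an extension
    array or when the values cannot be cast back to the original dtype.
    """
    if not isinstance(original, pd.api.extensions.ExtensionArray):
        return np.asarray(values)
    try:
        return type(original)._from_sequence(values, dtype=original.dtype)
    except (TypeError, ValueError):
        # For example, a mean that yields non-integral floats for an Int64 input.
        return np.asarray(values)


ints = pd.array([1, 2, 3], dtype="Int64")
summed = restore_ea_dtype([4, 2], ints)        # integral sums fit Int64
averaged = restore_ea_dtype([2.5, 2.0], ints)  # non-integral values do not
print(summed.dtype, averaged.dtype)
```

The fallback matters because not every aggregation preserves the input dtype: a sum of integers stays integral, while a mean generally does not.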

@5hirish
Contributor

5hirish commented Oct 25, 2018

@jreback

  1. Tests under pytest pandas/tests/arrays/ like test_integer.py/test_preserve_dtypes, test_integer.py/test_reduce_to_float are failing because:
AssertionError: Attributes are different
Attribute "dtype" are different
     [left]:  Int64      <-- result
     [right]: float64  <-- expected

The input data for the tests above uses the custom dtype Int64, and the tests expect it to be converted to int64 or float64. Does pandas treat Int64 as int64? I ask because Int64 satisfies both of the following conditions: if is_extension_array_dtype(dtype): and if numeric_only and is_numeric_dtype(dtype) or not numeric_only

  2. Tests under pandas/tests/sparse/ like test_groupby.py/test_groupby_includes_fill_value are failing because:
AssertionError: Attributes are different           
Attribute "dtype" are different
    [left]:  Sparse[float64, nan]   <-- result
    [right]: float64                       <-- expected
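The two conditions quoted in item 1 can be checked directly with the public helpers in pandas.api.types. The short sketch below shows that Int64 is both an extension dtype and a numeric dtype, while NumPy's int64 is numeric only; the variable names are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_numeric_dtype

nullable = pd.Series([1, 2, 3], dtype="Int64").dtype  # pandas extension dtype
plain = np.dtype("int64")                             # NumPy dtype

checks = {
    "Int64": (is_extension_array_dtype(nullable), is_numeric_dtype(nullable)),
    "int64": (is_extension_array_dtype(plain), is_numeric_dtype(plain)),
}
print(checks)
```

So a check on is_numeric_dtype alone does not distinguish Int64 from int64; the extension-dtype check is what separates them.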

@TomAugspurger
Contributor

@5hirish probably best to keep the conversation in the PR now that it's open.

@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Nov 2, 2018
3 participants