BUG: incorrect EA casting in groupby.agg #38254
@@ -432,10 +432,13 @@ def test_agg_over_numpy_arrays():
     tm.assert_frame_equal(result, expected)
 
 
-def test_agg_tzaware_non_datetime_result():
+@pytest.mark.parametrize("as_period", [True, False])
+def test_agg_tzaware_non_datetime_result(as_period):
     # discussed in GH#29589, fixed in GH#29641, operating on tzaware values
     # with function that is not dtype-preserving
     dti = pd.date_range("2012-01-01", periods=4, tz="UTC")
+    if as_period:
+        dti = dti.tz_localize(None).to_period("D")
     df = DataFrame({"a": [0, 0, 1, 1], "b": dti})
     gb = df.groupby("a")
 
@@ -454,6 +457,8 @@ def test_agg_tzaware_non_datetime_result():
     result = gb["b"].agg(lambda x: x.iloc[-1] - x.iloc[0])
     expected = Series([pd.Timedelta(days=1), pd.Timedelta(days=1)], name="b")
     expected.index.name = "a"
+    if as_period:
+        expected = expected.astype(object)
 
     tm.assert_series_equal(result, expected)
 

Review comments on the `expected = expected.astype(object)` line:
Is this the correct expected result? When subtracting Periods, you get offset objects, not Timedelta objects?
good point. probably not great that tm.assert_series_equal can't tell the difference either
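The reviewer's question about Period subtraction can be checked directly. The following is a minimal sketch, independent of the PR's test code, that compares the type returned by subtracting two tz-aware Timestamps with the type returned by subtracting two daily Periods. The variable names `ts_diff` and `per_diff` are chosen here for illustration only.

```python
import pandas as pd

# Subtracting two tz-aware Timestamps yields a Timedelta.
ts_diff = pd.Timestamp("2012-01-02", tz="UTC") - pd.Timestamp("2012-01-01", tz="UTC")

# Subtracting two Periods with the same daily frequency yields an
# offset object (a multiple of the frequency), not a Timedelta.
per_diff = pd.Period("2012-01-02", freq="D") - pd.Period("2012-01-01", freq="D")

print(type(ts_diff).__name__)
print(type(per_diff).__name__)
```

If the two results differ in type as the reviewer suggests, an expected Series built from `pd.Timedelta(days=1)` would not hold the same kind of object as the Period branch of the test produces, even after `astype(object)`.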
why are you doing this here, rather than inside maybe_cast_results itself?
Because I'm trying to get rid of maybe_cast_result altogether. It uses maybe_cast_to_extension_array, whereas we should be casting yoda-style (do or do not).
(as mentioned in the OP, I'd actually rather remove this chunk of code entirely)
sure, but I'd rather NOT do it in groupby at all (any casting like this), and instead push it to dtypes/cast.py. this seems like going backwards.
so you're good with just ripping this out?
depends on how we think of dtypes.cast. I think of it as "low-level helper functions related to casting", NOT "anything related to casting". I don't want to put groupby-specific casting code in there. (I also don't like having DataFrame.reset_index code in there)
I don't think virtually any casting code should be in groupby / frame. but we have to put it somewhere (and of course ideally there isn't any special casing on the class of the parent container). so I think it's the lesser of evils to keep it all together in dtypes/cast.py, potentially allowing for re-use / refactoring.
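To make the "do or do not" casting idea from the thread concrete, here is a rough sketch contrasting a "maybe" cast, which silently falls back when the target dtype does not fit, with a strict cast that either succeeds or raises. Both helpers (`maybe_cast`, `cast_or_raise`) are hypothetical names invented for this illustration; they are not pandas functions and do not reproduce the behavior of `maybe_cast_result` or `maybe_cast_to_extension_array`.

```python
import numpy as np
import pandas as pd


def maybe_cast(values, dtype):
    """Hypothetical 'maybe' cast: try the target dtype, silently fall back to object."""
    try:
        return pd.array(values, dtype=dtype)
    except (TypeError, ValueError):
        # The caller cannot tell from the call alone which dtype comes back.
        return np.asarray(values, dtype=object)


def cast_or_raise(values, dtype):
    """Hypothetical strict cast: the result has the requested dtype, or an error is raised."""
    return pd.array(values, dtype=dtype)


print(maybe_cast([1, 2], "Int64").dtype)
print(maybe_cast(["a", "b"], "Int64").dtype)
```

With the strict variant the caller decides up front whether a cast is appropriate, so the result dtype is predictable; with the "maybe" variant the dtype depends on the values, which is the kind of ambiguity the thread is arguing about.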