Skip to content

BUG: mean/median with strings #52281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 24, 2023
Merged

Conversation

jbrockmendel
Copy link
Member

I'm not wild about this multi-pass implementation. Longer-term I think we need to re-write a lot of nanops to be single-pass (zero|few)-copy.

@mroeschke
Copy link
Member

As an aside, it feels like a non-explicit operation that strings are coerced to numeric before calling mean/median. Would be nice if it was deprecation in favor of the user explicitly casting to a numeric type

@jbrockmendel
Copy link
Member Author

As an aside, it feels like a non-explicit operation that strings are coerced to numeric before calling mean/median. Would be nice if it was deprecation in favor of the user explicitly casting to a numeric type

That could work for mean (though eg Decimal objects may be tricky until we can tell infer those to pyarrow decimal). For median i actually think it can work on object dtype without casting (though may be slower), saving that for another day.

Big picture I'd like to refactor a lot of nanops to be no-copy and share code with libgroupby.

@jbrockmendel
Copy link
Member Author

Worth moving forward here or do we need to wait for a different approach?

@mroeschke
Copy link
Member

mroeschke commented Apr 20, 2023

Could the behavior of strings automatically being coerced to numbers during reductions be deprecated instead of supporting them?

@jbrockmendel
Copy link
Member Author

Could the behavior of strings automatically being coerced to numbers during reductions be deprecated instead of supporting them?

I'm not clear on how that is different from what this does (aside from calling it a bugfix instead of a deprecation)

@mroeschke
Copy link
Member

Ah okay sorry. Could you mention in the whatsnew that this raises a TypeError now?

@mroeschke mroeschke added this to the 2.1 milestone Apr 24, 2023
@mroeschke mroeschke added Strings String extension data type and string data Reduction Operations sum, mean, min, max, etc. labels Apr 24, 2023
@mroeschke mroeschke merged commit 829444a into pandas-dev:main Apr 24, 2023
@mroeschke
Copy link
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the bug-mean-strings branch April 30, 2023 17:43
Rylie-W pushed a commit to Rylie-W/pandas that referenced this pull request May 19, 2023
* BUG: converting string to numeric in median, mean

* whatsnew, median test

* troubleshoot builds

* fix arraymanager build

* say in whatsnew we raise TypeError

---------

Co-authored-by: Matthew Roeschke <[email protected]>
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
* BUG: converting string to numeric in median, mean

* whatsnew, median test

* troubleshoot builds

* fix arraymanager build

* say in whatsnew we raise TypeError

---------

Co-authored-by: Matthew Roeschke <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reduction Operations sum, mean, min, max, etc. Strings String extension data type and string data
Projects
None yet
2 participants