BUG: mean/median with strings #52281
Conversation
As an aside, coercing strings to numeric before calling mean/median feels like a non-explicit operation. It would be nice if this were deprecated in favor of the user explicitly casting to a numeric type.
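A minimal sketch of what "explicitly casting to a numeric type" could look like from the user's side. The Series contents are made up for illustration; `pd.to_numeric` is one way to do the cast, not necessarily the one the commenters had in mind.

```python
import pandas as pd

# Illustrative data: numeric-looking strings stored with object dtype.
ser = pd.Series(["1", "2", "3"], dtype=object)

# Explicit cast: the user decides how strings become numbers.
# errors="raise" (the default) fails loudly on non-numeric strings.
numeric = pd.to_numeric(ser, errors="raise")

mean_value = numeric.mean()
median_value = numeric.median()
print(mean_value, median_value)  # 2.0 2.0
```

With the cast spelled out, the reduction only ever sees a numeric dtype, so no implicit string handling is involved.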
That could work for mean (though e.g. Decimal objects may be tricky until we can infer those as pyarrow decimal). For median I actually think it can work on object dtype without casting (though it may be slower); saving that for another day. Big picture, I'd like to refactor a lot of nanops to be no-copy and share code with libgroupby.
Worth moving forward here or do we need to wait for a different approach?
Could the behavior of strings automatically being coerced to numbers during reductions be deprecated instead of supporting them?
I'm not clear on how that is different from what this does (aside from calling it a bugfix instead of a deprecation)
Ah okay sorry. Could you mention in the whatsnew that this raises a
Thanks @jbrockmendel |
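A hedged sketch of how the behavior discussed above might be observed. The outcome depends on the installed pandas version: the commit message for this PR says a `TypeError` is raised, while versions that predate the change may coerce the strings and return a number instead. The data and the `outcome` variable are illustrative only.

```python
import pandas as pd

# Illustrative data: strings stored with object dtype, no explicit cast.
ser = pd.Series(["1", "2", "3"], dtype=object)

try:
    result = ser.mean()
    # Reached only on versions that still coerce strings implicitly.
    outcome = "coerced"
except TypeError:
    # Expected on versions that include this change.
    result = None
    outcome = "raised TypeError"

print(outcome, result)
```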
* BUG: converting string to numeric in median, mean
* whatsnew, median test
* troubleshoot builds
* fix arraymanager build
* say in whatsnew we raise TypeError

---------

Co-authored-by: Matthew Roeschke <[email protected]>
I'm not wild about this multi-pass implementation. Longer-term I think we need to rewrite a lot of nanops to be single-pass (zero|few)-copy.
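To make "single-pass" concrete, here is a pure-Python sketch of a NaN-skipping mean that accumulates the sum and the count in one loop, without building a mask or a filled copy first. This is only an illustration of the idea; `nanmean_single_pass` is a hypothetical name and not how pandas' nanops is written.

```python
import math
from typing import Iterable


def nanmean_single_pass(values: Iterable[float]) -> float:
    """Mean of the non-NaN values, computed in a single pass."""
    total = 0.0
    count = 0
    for v in values:
        if math.isnan(v):
            # Skip missing values inline instead of masking in a prior pass.
            continue
        total += v
        count += 1
    # All-NaN or empty input has no defined mean; return NaN.
    return total / count if count else math.nan


result = nanmean_single_pass([1.0, math.nan, 3.0])
print(result)  # 2.0
```

A multi-pass version would typically compute a mask, copy the data with NaNs replaced, and then sum; the loop above does the same work while touching each value once.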