BUG: to_datetime with unit with Int64 #30241
@AlexKirko the diff is super confusing because you are pulling on other commits, pls merge in master and repush
@jreback Apologies for the mess. Still learning how to contribute properly. Merged master in.
@jreback Conflicts in whatsnew removed. Should be ready for another review.
@jreback Merged in master again to remove new conflicts in
Branch updated from 34a0189 to 96c8ad8.
Moved the NaN handling to tslib. The Web_and_Docs PR test is broken for now (#30527), but other than that, things look ready for a review. @jreback, did you have something like this solution in mind?
@jreback Merged in master to fix the Web_and_Docs PR test bug.
lgtm (yes, this was the approach I was looking for). some small comments, ping on green.
pandas/_libs/tslib.pyx
Outdated
@@ -297,16 +297,32 @@ def format_array_from_datetime(ndarray[int64_t] values, object tz=None,


 def array_with_unit_to_datetime(ndarray values, object unit,
-                                str errors='coerce'):
+                                str errors='coerce', ndarray mask=None):
let's move this after values, and require it
pandas/core/tools/datetimes.py
Outdated
# because it expects an ndarray argument
if isinstance(arg, IntegerArray):
    # Explicitly pass NaT mask to array_with_unit_to_datetime
    mask_na = arg.isna()
just call this mask
pandas/core/tools/datetimes.py
Outdated
    # Explicitly pass NaT mask to array_with_unit_to_datetime
    mask_na = arg.isna()
    arg = getattr(arg, "_ndarray_values", arg)
    result, tz_parsed = tslib.array_with_unit_to_datetime(
move this outside the if/else and add mask=None
for the non-IntegerArray case, then you can just call this once
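The restructuring requested in this thread (derive `mask` in the branch, then make one call with `mask` as a required argument right after the values) can be sketched roughly as follows. This is a NumPy-only illustration, not the pandas code: `MaskedInts`, `convert`, and `to_datetime_with_unit` are hypothetical stand-ins for `IntegerArray`, `tslib.array_with_unit_to_datetime`, and the calling code in `datetimes.py`.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class MaskedInts:
    """Hypothetical stand-in for IntegerArray: int64 data plus a boolean NA mask."""

    data: np.ndarray
    mask: np.ndarray


def convert(values: np.ndarray, mask: Optional[np.ndarray], unit: str) -> np.ndarray:
    """Hypothetical stand-in for the tslib routine; `mask` follows `values` and is required."""
    out = values.astype(f"datetime64[{unit}]")  # astype returns a new array
    if mask is not None:
        out[mask] = np.datetime64("NaT")  # masked positions become NaT
    return out


def to_datetime_with_unit(arg, unit: str) -> np.ndarray:
    # Only the mask (and the raw integer data) differ between the two cases.
    if isinstance(arg, MaskedInts):
        mask = arg.mask
        arg = arg.data
    else:
        mask = None
        arg = np.asarray(arg, dtype="int64")
    # Single call site, shared by both branches.
    return convert(arg, mask, unit)
```

Keeping the call outside the `if`/`else` means any later change to the converter's arguments only has to be made in one place.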
@jreback Addressed all the comments, please take a look.
pandas/core/tools/datetimes.py
Outdated
if isinstance(arg, IntegerArray):
    # Explicitly pass NaT mask to array_with_unit_to_datetime
    mask = arg.isna()
    arg = getattr(arg, "_ndarray_values", arg)
Why getattr instead of arg._ndarray_values?
Once we have IntegerArray.to_numpy, I think this could be something like arg.to_numpy("i8", fill_value=NPY_NAT).
yes would like to do this exactly (or a way to get data/mask publicly would be ok here as well). but what you are suggesting is nice.
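For readers on a later pandas release, the idea discussed in this thread can be tried with public API only. This is a sketch under assumptions: it needs a pandas version in which masked arrays have `to_numpy`, the keyword there is `na_value` rather than the `fill_value` spelling used in the comment above, and `NPY_NAT` is defined locally as the int64 minimum, which is the integer NumPy uses to represent NaT.

```python
import numpy as np
import pandas as pd

# int64 sentinel that NumPy interprets as NaT when viewed as datetime64
NPY_NAT = np.iinfo(np.int64).min

arr = pd.array([1, 2, None], dtype="Int64")

# Public accessors: a boolean NA mask and a plain int64 ndarray
mask = arr.isna()
values = arr.to_numpy(dtype="int64", na_value=NPY_NAT)

# Reinterpreting the integers as nanosecond timestamps turns the sentinel into NaT
as_datetime = values.view("datetime64[ns]")
```

Because the missing slot is filled with an integer sentinel, the data never passes through a float dtype.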
Hello @AlexKirko! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-01-01 08:03:42 UTC
@jreback @TomAugspurger Added to the test and removed unnecessary
small comment, pls rebase and ping on green.
Branch updated from 7634487 to 7fad79d.
Stop relying on inner implementations of IntegerArray and numpy arrays. Use public functions as much as possible.
Branch updated from 7fad79d to 620506d.
@jreback rebased to updated master, squashed tiny commits, made the needed changes and commented on green
Thanks @AlexKirko, this looks good. Merging later today, in case @jreback wants to have one more look.
Thanks @AlexKirko!
@jreback @TomAugspurger Thank you! Learned a lot about the process: it should help a lot with the next PR.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Test output:
Notes: Hello. The fix introduces one additional type check if the Series passed to to_datetime stores anything except an IntegerArray; it does nothing else in this case. If an IntegerArray is passed, the fix pulls all the values that aren't NaN into an ndarray, converts that to datetime, and puts NaT in the places where the NaNs were. No precision is lost.
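The "no precision is lost" point can be illustrated with plain NumPy. The snippet below is only a sketch of the idea described in the notes, not the tslib implementation; the timestamp and variable names are made up for the example. It contrasts a route that marks the missing entry with NaN (which forces float64) with a route that keeps the integers untouched and applies a separate mask.

```python
import numpy as np

# 2020-01-01T00:00:00 UTC in nanoseconds, plus one extra nanosecond
NS_TIMESTAMP = 1_577_836_800_000_000_001

# The second entry is missing; its stored value is an arbitrary placeholder.
values = np.array([NS_TIMESTAMP, 1], dtype="int64")
mask = np.array([False, True])

# Float route: using NaN for the missing entry requires float64,
# which cannot represent every integer of this magnitude.
as_float = np.where(mask, np.nan, values.astype("float64"))
lossy = int(as_float[0])

# Mask route: convert only the valid integers and leave NaT elsewhere.
result = np.full(values.shape, np.datetime64("NaT"), dtype="datetime64[ns]")
result[~mask] = values[~mask].astype("datetime64[ns]")
exact = int(result[0].astype("int64"))

print(lossy == NS_TIMESTAMP)  # False: the trailing nanosecond was rounded away
print(exact == NS_TIMESTAMP)  # True: the integer value survives unchanged
```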