Skip to content

BUG: to_datetime with unit with Int64 #30241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 2, 2020

Conversation

AlexKirko
Copy link
Member

Test output:

$ py.test pandas/tests/tools/test_datetime.py
============================= test session starts =============================
platform win32 -- Python 3.7.3, pytest-5.3.1, py-1.8.0, pluggy-0.13.0
rootdir: C:\git_contrib\pandas\pandas, inifile: setup.cfg
plugins: hypothesis-4.50.6, cov-2.8.1, forked-1.1.2, xdist-1.30.0
collected 1 item

pandas\tests\tools\test_datetime.py .                                    [100%]

============================== 1 passed in 0.03s ==============================

Notes: Hello. The fix introduces one additional type check if the Series passed to to_datetime stores anything except and IntegerArray. It does nothing else in this case. If we try to pass an IntegerArray, it pulls all the values that aren't NaN into an ndarray, converts that into datetime and adds NaT in the places where NaNs were. No precision is lost.

@jreback jreback added ExtensionArray Extending pandas with custom dtypes or arrays. Datetime Datetime data dtype labels Dec 13, 2019
@AlexKirko AlexKirko requested a review from jreback December 15, 2019 06:33
@jreback
Copy link
Contributor

jreback commented Dec 15, 2019

@AlexKirko the diff is super confusing because you are pulling on other commits, pls merge in master and repush

@AlexKirko
Copy link
Member Author

AlexKirko commented Dec 16, 2019

@AlexKirko the diff is super confusing because you are pulling on other commits, pls merge in master and repush

@jreback Apologies for the mess. Still learning how to contribute properly. Merged master in.

@AlexKirko
Copy link
Member Author

@jreback Conflicts in whatsnew removed. Should be ready for another review.

@AlexKirko
Copy link
Member Author

@jreback Merged in master again to remove new conflicts in datetime.py

@jreback jreback changed the title Fix to datetime BUG: to_datetime with unit with Int64 Dec 24, 2019
@AlexKirko AlexKirko force-pushed the FIX-to-datetime branch 2 times, most recently from 34a0189 to 96c8ad8 Compare December 29, 2019 09:08
@AlexKirko
Copy link
Member Author

AlexKirko commented Dec 29, 2019

Moved the nan-handling to tslib. Web_and_Docs PR test is broken for now (# 30527), but other than that, things look ready for a review.

@jreback , did you have something like this solution in mind?

@AlexKirko AlexKirko requested a review from jreback December 29, 2019 11:52
@AlexKirko
Copy link
Member Author

AlexKirko commented Dec 30, 2019

@jreback Merged in master to fix Webs_and_Docs PR test bug.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm (yes this was the approach was looking for). some small comments, ping on green.

@@ -297,16 +297,32 @@ def format_array_from_datetime(ndarray[int64_t] values, object tz=None,


def array_with_unit_to_datetime(ndarray values, object unit,
str errors='coerce'):
str errors='coerce', ndarray mask=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move this after values, and require it

# because it expects an ndarray argument
if isinstance(arg, IntegerArray):
# Explicitly pass NaT mask to array_with_unit_to_datetime
mask_na = arg.isna()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just call this mask

# Explicitly pass NaT mask to array_with_unit_to_datetime
mask_na = arg.isna()
arg = getattr(arg, "_ndarray_values", arg)
result, tz_parsed = tslib.array_with_unit_to_datetime(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this outside the if/else and add mask=None for the non-IntegerArray case, then you can just call this once

@jreback jreback added this to the 1.0 milestone Dec 30, 2019
@AlexKirko AlexKirko requested a review from jreback December 30, 2019 15:42
@AlexKirko
Copy link
Member Author

@jreback Addressed all the comments, please take a look.

if isinstance(arg, IntegerArray):
# Explicitly pass NaT mask to array_with_unit_to_datetime
mask = arg.isna()
arg = getattr(arg, "_ndarray_values", arg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why getattr instead of arg._ndarray_values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have IntegerArray.to_numpy, I think this could be something like arg.to_numpy("i8", fill_value=NPY_NAT).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes would like to do this exactly (or a way to get data/mask publicly would be ok here as well). but what you are suggesting is nice.

@pep8speaks
Copy link

pep8speaks commented Dec 30, 2019

Hello @AlexKirko! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-01 08:03:42 UTC

@AlexKirko
Copy link
Member Author

AlexKirko commented Dec 30, 2019

@jreback @TomAugspurger Added to the test and removed unnecessary getattr. Awaiting the decision on where we should implement NaN-handling: in tslib or in datetimes. I'm for tslib (current solution) as we merely expand on the way NaNs are handled and that's, in my opinion, is better than splitting and re-merging arg in _convert_listlike_datetimes.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, pls rebase and ping on green.

Stop relying on inner implementations of IntegerArray and numpy arrays.
Use public funcitons as much as possible.
@AlexKirko
Copy link
Member Author

@jreback rebased to updated master, squashed tiny commits, made the needed changes and commented on green

@AlexKirko AlexKirko requested a review from jreback January 1, 2020 08:55
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AlexKirko, this looks good. Merging later today, in case @jreback wants to have one more look.

@TomAugspurger TomAugspurger merged commit b804372 into pandas-dev:master Jan 2, 2020
@TomAugspurger
Copy link
Contributor

Thanks @AlexKirko!

@AlexKirko AlexKirko deleted the FIX-to-datetime branch January 3, 2020 06:11
@AlexKirko
Copy link
Member Author

@jreback @TomAugspurger Thank you! Learned a lot about the process: it should help a lot with the next PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TypeError in to_datetime when passing Int64 column
4 participants