-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
PERF: add shortcut to Timestamp constructor #30676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6e70d01
9aa0156
7cebba5
7aea539
d2adb4f
3ea3f91
2186358
a195518
23f5a44
c80f748
9065888
ba19e26
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -391,7 +391,18 @@ class Timestamp(_Timestamp): | |
# User passed tzinfo instead of tz; avoid silently ignoring | ||
tz, tzinfo = tzinfo, None | ||
|
||
if isinstance(ts_input, str): | ||
# GH 30543 if pd.Timestamp already passed, return it | ||
# check that only ts_input is passed | ||
# checking verbosely, because cython doesn't optimize | ||
# list comprehensions (as of cython 0.29.x) | ||
if (isinstance(ts_input, Timestamp) and freq is None and | ||
tz is None and unit is None and year is None and | ||
month is None and day is None and hour is None and | ||
minute is None and second is None and | ||
microsecond is None and nanosecond is None and | ||
tzinfo is None): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jbrockmendel This appears the way to go if we need maximum performance. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure all of these extra checks are worth adding to improve perf when passing a Timestamp to a Timestamp constructor - @AlexKirko how often are you expecting that to actually happen? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For internal usage there are a lot of places where we do:
That's not exactly the usage being checked here, but could benefit in the same way from an optimized no-other-arguments-passed check. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Admittedly, I don't know the library well enough to comment on internal usage, but @jbrockmendel has already done that. However, what I've done repeatedly in my own projects when on a deadline is take a You tend to rely on being able to cut corners when working with a well-supported package, and, currently, calling Timestamp on a Timestamp is more than 10 000 times slower than the proposed shortcut, which can be a nasty shock for someone expecting to just blitz through type conversions during data wrangling. @WillAyd We could make the code a bit more compact with what was proposed in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add a comment to the effect of "we do this verbose thing because cython wont optimize a list comprehension (as of cython 0.29.x)" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jbrockmendel Added the comment you requested. |
||
return ts_input | ||
elif isinstance(ts_input, str): | ||
# User passed a date string to parse. | ||
# Check that the user didn't also pass a date attribute kwarg. | ||
if any(arg is not None for arg in _date_attributes): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,6 @@ | |
Tests for DatetimeIndex timezone-related methods | ||
""" | ||
from datetime import date, datetime, time, timedelta, tzinfo | ||
from distutils.version import LooseVersion | ||
|
||
import dateutil | ||
from dateutil.tz import gettz, tzlocal | ||
|
@@ -11,7 +10,6 @@ | |
import pytz | ||
|
||
from pandas._libs.tslibs import conversion, timezones | ||
from pandas.compat._optional import _get_version | ||
import pandas.util._test_decorators as td | ||
|
||
import pandas as pd | ||
|
@@ -583,15 +581,7 @@ def test_dti_construction_ambiguous_endpoint(self, tz): | |
["US/Pacific", "shift_forward", "2019-03-10 03:00"], | ||
["dateutil/US/Pacific", "shift_forward", "2019-03-10 03:00"], | ||
["US/Pacific", "shift_backward", "2019-03-10 01:00"], | ||
pytest.param( | ||
"dateutil/US/Pacific", | ||
"shift_backward", | ||
"2019-03-10 01:00", | ||
marks=pytest.mark.xfail( | ||
LooseVersion(_get_version(dateutil)) < LooseVersion("2.7.0"), | ||
reason="GH 31043", | ||
), | ||
), | ||
["dateutil/US/Pacific", "shift_backward", "2019-03-10 01:00"], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This shortcut, along with the fix to #24329 makes this exception no longer necessary, as a correct value gets returned without an error. |
||
["US/Pacific", timedelta(hours=1), "2019-03-10 03:00"], | ||
], | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel
Was this what you had in mind? I think we should leave only the scalar transformer for benchmarking this shortcut. Don't see much sense in adding the benchmarks for transforming a Series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perfect, thanks