-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix nanosecond timedeltas #45108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix nanosecond timedeltas #45108
Conversation
…the timedelta constructor BUG: overriden the Timedelta.total_seconds() to return the correct value containing also the nanoseconds portion
pandas/_libs/tslibs/timedeltas.pyx
Outdated
@@ -1268,7 +1268,16 @@ class Timedelta(_Timedelta): | |||
|
|||
kwargs = {key: _to_py_int_float(kwargs[key]) for key in kwargs} | |||
|
|||
nano = convert_to_timedelta64(kwargs.pop('nanoseconds', 0), 'ns') | |||
# GH43764, making sure any nanoseconds contributions from any kwarg is taken into consideration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm this is just duplicating code from below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know exactly which code you're referring to, but if I do not catch all nanoseconds contributions when a Timedelta
instance is prepared using any of the "microseconds", "milliseconds" and "seconds" kwargs or any combination therefrom, the potential nanosecond contributions from every of those kwargs must be caught here, otherwise it will be lost information, e.g. Timedelta(seconds=1e-9, milliseconds=1e-5, microseconds=1e-1)
.
pandas/_libs/tslibs/timedeltas.pyx
Outdated
@@ -1520,6 +1529,10 @@ class Timedelta(_Timedelta): | |||
div = other // self | |||
return div, other - div * self | |||
|
|||
# GH40946 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc string and types pls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done on premise. (Depends on the decision to separate the issues in two PRs or not)
pandas/_libs/tslibs/timedeltas.pyx
Outdated
@@ -1520,6 +1529,10 @@ class Timedelta(_Timedelta): | |||
div = other // self | |||
return div, other - div * self | |||
|
|||
# GH40946 | |||
def total_seconds(self): | |||
return self.value / 1e9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internally, ns precision is achieved by storing the value as a 64 bit integer.
once this is converted to a float, we cannot maintain ns precision. However, including the nanosecond component is probably more accurate than ignoring it.
I think we need more discussion on the issue before re-instating the nanosecond "precision".
from #31380 (comment)
... from "dubiously nanosecond-precision" to "microsecond-precision" ...
The change to ms precision was an intentional one but was not documented.
Changing it back would probably require a specific mention in the release note.
Also the tests are failing for the cases added when the precision of total_seconds
changed, so this needs to be fixed also.
I also think that the 2 issue should be addressed independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I go and refactor/enhance the tests, it would be nice to have a categorical decision on this topic. Should the total_seconds
part be a separate PR? I am not really sure why the nanosecond precision is a problem. The precise epoch in ns can be obtained by the value
property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the
total_seconds
part be a separate PR?
the two fixes are orthogonal? The override of the total_seconds
method to fix #40946 and the change to __new__
to fix the constructor issue #43764 (comment)?
If so, I think should be two separate PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes pls split to a sepearate PR
doc/source/whatsnew/v1.4.0.rst
Outdated
@@ -236,6 +236,8 @@ Other enhancements | |||
- :meth:`is_list_like` now identifies duck-arrays as list-like unless ``.ndim == 0`` (:issue:`35131`) | |||
- :class:`ExtensionDtype` and :class:`ExtensionArray` are now (de)serialized when exporting a :class:`DataFrame` with :meth:`DataFrame.to_json` using ``orient='table'`` (:issue:`20612`, :issue:`44705`). | |||
- Add support for `Zstandard <http://facebook.github.io/zstd/>`_ compression to :meth:`DataFrame.to_pickle`/:meth:`read_pickle` and friends (:issue:`43925`) | |||
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue: `43764`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue: `43764`) | |
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue:`43764`) |
doc/source/whatsnew/v1.4.0.rst
Outdated
@@ -236,6 +236,8 @@ Other enhancements | |||
- :meth:`is_list_like` now identifies duck-arrays as list-like unless ``.ndim == 0`` (:issue:`35131`) | |||
- :class:`ExtensionDtype` and :class:`ExtensionArray` are now (de)serialized when exporting a :class:`DataFrame` with :meth:`DataFrame.to_json` using ``orient='table'`` (:issue:`20612`, :issue:`44705`). | |||
- Add support for `Zstandard <http://facebook.github.io/zstd/>`_ compression to :meth:`DataFrame.to_pickle`/:meth:`read_pickle` and friends (:issue:`43925`) | |||
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue: `43764`) | |||
- :meth:`Timedelta.total_seconds()` now properly taking into account any nanoseconds contribution (:issue: `40946`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- :meth:`Timedelta.total_seconds()` now properly taking into account any nanoseconds contribution (:issue: `40946`) | |
- :meth:`Timedelta.total_seconds()` now properly taking into account any nanoseconds contribution (:issue:`40946`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@simonjayhawkins no more PRs on 1.4 actively trying to reduce the number. we can certainly decide later to backport or merge on the release. |
sure. we had the discussion on the last release about the milestone and adopted the blocker for rc label. Admittedly, I don't recall if that allowed adding new issues/prs to the milestone. |
right but the point is i don't want to label these with a version until we are ready to merge (otherwise these tend to stick for no good reason) |
pandas/_libs/tslibs/timedeltas.pyx
Outdated
nano = convert_to_timedelta64(kwargs.pop('nanoseconds', 0), 'ns') | ||
# GH43764, making sure any nanoseconds contributions from any kwarg | ||
# is taken into consideration | ||
nano = convert_to_timedelta64( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are entirely duplicating L1283
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My modifications removed "microseconds" till "seconds" from kwargs
. Any other potential kwarg
, such as year
etc, are handled the same as the previous way, namely via L1283. Do you have other suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i actually don't have a problem with doing what you are doing but then you need to entirely remove L1282-1284 and handle all the kwargs
(well you still need the try/except).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will also slow this function down a lot. rather than poping be explicit on the argument handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you check the last changes? I profiled instantiation time and it is taking about 50% the time as in master.
def test():
pd.Timedelta(seconds=float(106751 * 24 * 3600), nanoseconds=1)
pd.Timedelta(minutes=-7)
pd.Timedelta(seconds=1234e-9)
pd.Timedelta(seconds=1e-9, milliseconds=1e-5, microseconds=1e-1)
pd.Timedelta(days=1, seconds=1e-9, milliseconds=1e-5, microseconds=1e-1)
pandas/_libs/tslibs/timedeltas.pyx
Outdated
+ kwargs.pop('microseconds', 0) * 1000 | ||
+ kwargs.pop('milliseconds', 0) * 1000000 | ||
+ kwargs.pop('seconds', 0) * 1000000000 | ||
), 'ns' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because these are floats, I assume that this will lose precision near the limits. i.e. seconds = 106751 (days) * 24*3600 with a nanosecond component specified too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure, but I would expect that when aiming at precision the inputs should instead be coded as integers by the caller? My motivation here is only to make the instantiation consistent in that, if float inputs are allowed, then any nanosecond contributions from any supplied kwargs
must be taken into consideration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, if any of these components is a float.
so pd.Timedelta(seconds=float(106751 * 24 * 3600), nanoseconds=1)
gives Timedelta('106751 days 00:00:00')
whereas on master we get Timedelta('106751 days 00:00:00.000000001')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you check my last commit? It produces the same output as in master
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls also add a whatsnew in 1.4. bug fixes in the datetime section
pandas/_libs/tslibs/timedeltas.pyx
Outdated
raise ValueError( | ||
"cannot construct a Timedelta from the passed arguments, " | ||
"allowed keywords are " | ||
"[weeks, days, hours, minutes, seconds, " | ||
"milliseconds, microseconds, nanoseconds]" | ||
) | ||
|
||
# GH43764, making sure any nanoseconds contributions from any kwarg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a useful comment as its not relevant for a current reader. However a comment explaining what you are doing would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/_libs/tslibs/timedeltas.pyx
Outdated
) * 1_000_000_000 | ||
) | ||
|
||
value = convert_to_timedelta64( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's just directly create a np.timedelta64(ts, "ns")
no reason to go thru the routine when we know exactly what we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Oversaw that. Performance is even better now.
DOC: moved the whatsnew entry to the timedelta section and added new comment
thanks @deponovo |
value = nano + convert_to_timedelta64(timedelta(**kwargs), | ||
'ns') | ||
except TypeError as e: | ||
if not cls._req_any_kwargs_new.intersection(kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change means we now fail to raise on pd.Timedelta(days=2, foo=9)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to create a PR for this..
Fixes the construction of
Timedelta
objects with any nanosecond contribution.