Skip to content

BUG: Fix nanosecond timedeltas #45108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 31, 2021

Conversation

deponovo
Copy link
Contributor

@deponovo deponovo commented Dec 29, 2021

Fixes the construction of Timedelta objects with any nanosecond contribution.

…the timedelta constructor

BUG: overriden the Timedelta.total_seconds() to return the correct value containing also the nanoseconds portion
@deponovo deponovo changed the title Fix nanosecond timedeltas BUG: Fix nanosecond timedeltas Dec 29, 2021
@@ -1268,7 +1268,16 @@ class Timedelta(_Timedelta):

kwargs = {key: _to_py_int_float(kwargs[key]) for key in kwargs}

nano = convert_to_timedelta64(kwargs.pop('nanoseconds', 0), 'ns')
# GH43764, making sure any nanoseconds contributions from any kwarg is taken into consideration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm this is just duplicating code from below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know exactly which code you're referring to, but if I do not catch all nanoseconds contributions when a Timedelta instance is prepared using any of the "microseconds", "milliseconds" and "seconds" kwargs or any combination therefrom, the potential nanosecond contributions from every of those kwargs must be caught here, otherwise it will be lost information, e.g. Timedelta(seconds=1e-9, milliseconds=1e-5, microseconds=1e-1).

@@ -1520,6 +1529,10 @@ class Timedelta(_Timedelta):
div = other // self
return div, other - div * self

# GH40946
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc string and types pls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done on premise. (Depends on the decision to separate the issues in two PRs or not)

@@ -1520,6 +1529,10 @@ class Timedelta(_Timedelta):
div = other // self
return div, other - div * self

# GH40946
def total_seconds(self):
return self.value / 1e9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internally, ns precision is achieved by storing the value as a 64 bit integer.

once this is converted to a float, we cannot maintain ns precision. However, including the nanosecond component is probably more accurate than ignoring it.

I think we need more discussion on the issue before re-instating the nanosecond "precision".

from #31380 (comment)

... from "dubiously nanosecond-precision" to "microsecond-precision" ...

The change to ms precision was an intentional one but was not documented.

Changing it back would probably require a specific mention in the release note.

Also the tests are failing for the cases added when the precision of total_seconds changed, so this needs to be fixed also.

I also think that the 2 issue should be addressed independently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I go and refactor/enhance the tests, it would be nice to have a categorical decision on this topic. Should the total_seconds part be a separate PR? I am not really sure why the nanosecond precision is a problem. The precise epoch in ns can be obtained by the value property.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the total_seconds part be a separate PR?

the two fixes are orthogonal? The override of the total_seconds method to fix #40946 and the change to __new__ to fix the constructor issue #43764 (comment)?

If so, I think should be two separate PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes pls split to a sepearate PR

@@ -236,6 +236,8 @@ Other enhancements
- :meth:`is_list_like` now identifies duck-arrays as list-like unless ``.ndim == 0`` (:issue:`35131`)
- :class:`ExtensionDtype` and :class:`ExtensionArray` are now (de)serialized when exporting a :class:`DataFrame` with :meth:`DataFrame.to_json` using ``orient='table'`` (:issue:`20612`, :issue:`44705`).
- Add support for `Zstandard <http://facebook.github.io/zstd/>`_ compression to :meth:`DataFrame.to_pickle`/:meth:`read_pickle` and friends (:issue:`43925`)
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue: `43764`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue: `43764`)
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue:`43764`)

@@ -236,6 +236,8 @@ Other enhancements
- :meth:`is_list_like` now identifies duck-arrays as list-like unless ``.ndim == 0`` (:issue:`35131`)
- :class:`ExtensionDtype` and :class:`ExtensionArray` are now (de)serialized when exporting a :class:`DataFrame` with :meth:`DataFrame.to_json` using ``orient='table'`` (:issue:`20612`, :issue:`44705`).
- Add support for `Zstandard <http://facebook.github.io/zstd/>`_ compression to :meth:`DataFrame.to_pickle`/:meth:`read_pickle` and friends (:issue:`43925`)
- :class:`Timedelta` now properly taking into account any nanoseconds contribution (:issue: `43764`)
- :meth:`Timedelta.total_seconds()` now properly taking into account any nanoseconds contribution (:issue: `40946`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- :meth:`Timedelta.total_seconds()` now properly taking into account any nanoseconds contribution (:issue: `40946`)
- :meth:`Timedelta.total_seconds()` now properly taking into account any nanoseconds contribution (:issue:`40946`)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version Timedelta Timedelta data type labels Dec 29, 2021
@simonjayhawkins simonjayhawkins added this to the 1.4 milestone Dec 29, 2021
@jreback jreback removed this from the 1.4 milestone Dec 29, 2021
@jreback
Copy link
Contributor

jreback commented Dec 29, 2021

@simonjayhawkins no more PRs on 1.4 actively trying to reduce the number. we can certainly decide later to backport or merge on the release.

@simonjayhawkins
Copy link
Member

sure. we had the discussion on the last release about the milestone and adopted the blocker for rc label. Admittedly, I don't recall if that allowed adding new issues/prs to the milestone.

@jreback
Copy link
Contributor

jreback commented Dec 29, 2021

sure. we had the discussion on the last release about the milestone and adopted the blocker for rc label. Admittedly, I don't recall if that allowed adding new issues/prs to the milestone.

right but the point is i don't want to label these with a version until we are ready to merge (otherwise these tend to stick for no good reason)

nano = convert_to_timedelta64(kwargs.pop('nanoseconds', 0), 'ns')
# GH43764, making sure any nanoseconds contributions from any kwarg
# is taken into consideration
nano = convert_to_timedelta64(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are entirely duplicating L1283

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My modifications removed "microseconds" till "seconds" from kwargs. Any other potential kwarg, such as year etc, are handled the same as the previous way, namely via L1283. Do you have other suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i actually don't have a problem with doing what you are doing but then you need to entirely remove L1282-1284 and handle all the kwargs

(well you still need the try/except).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will also slow this function down a lot. rather than poping be explicit on the argument handling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check the last changes? I profiled instantiation time and it is taking about 50% the time as in master.

def test():
	pd.Timedelta(seconds=float(106751 * 24 * 3600), nanoseconds=1)
	pd.Timedelta(minutes=-7)
	pd.Timedelta(seconds=1234e-9)
	pd.Timedelta(seconds=1e-9, milliseconds=1e-5, microseconds=1e-1)
	pd.Timedelta(days=1, seconds=1e-9, milliseconds=1e-5, microseconds=1e-1)

+ kwargs.pop('microseconds', 0) * 1000
+ kwargs.pop('milliseconds', 0) * 1000000
+ kwargs.pop('seconds', 0) * 1000000000
), 'ns'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because these are floats, I assume that this will lose precision near the limits. i.e. seconds = 106751 (days) * 24*3600 with a nanosecond component specified too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure, but I would expect that when aiming at precision the inputs should instead be coded as integers by the caller? My motivation here is only to make the instantiation consistent in that, if float inputs are allowed, then any nanosecond contributions from any supplied kwargs must be taken into consideration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, if any of these components is a float.

so pd.Timedelta(seconds=float(106751 * 24 * 3600), nanoseconds=1)

gives Timedelta('106751 days 00:00:00') whereas on master we get Timedelta('106751 days 00:00:00.000000001')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check my last commit? It produces the same output as in master.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls also add a whatsnew in 1.4. bug fixes in the datetime section

raise ValueError(
"cannot construct a Timedelta from the passed arguments, "
"allowed keywords are "
"[weeks, days, hours, minutes, seconds, "
"milliseconds, microseconds, nanoseconds]"
)

# GH43764, making sure any nanoseconds contributions from any kwarg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a useful comment as its not relevant for a current reader. However a comment explaining what you are doing would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

) * 1_000_000_000
)

value = convert_to_timedelta64(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just directly create a np.timedelta64(ts, "ns") no reason to go thru the routine when we know exactly what we have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Oversaw that. Performance is even better now.

DOC: moved the whatsnew entry to the timedelta section and added new comment
@jreback jreback added this to the 1.4 milestone Dec 31, 2021
@jreback jreback merged commit 8532faa into pandas-dev:master Dec 31, 2021
@jreback
Copy link
Contributor

jreback commented Dec 31, 2021

thanks @deponovo

value = nano + convert_to_timedelta64(timedelta(**kwargs),
'ns')
except TypeError as e:
if not cls._req_any_kwargs_new.intersection(kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change means we now fail to raise on pd.Timedelta(days=2, foo=9)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to create a PR for this..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Summing nanoseconds time delta to timestamp
4 participants