
BUG: Timedelta.total_seconds method is returning wrong values in nanosecond intervals #46819

Open
3 tasks done
Tiarles opened this issue Apr 21, 2022 · 7 comments
Labels
Bug Regression Functionality that used to work in a prior pandas version Timedelta Timedelta data type

Comments

@Tiarles

Tiarles commented Apr 21, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

duration, Ts = 0.5, 5e-7
signal = pd.to_timedelta(np.arange(0, duration, Ts), "s")

dt = signal[1] - signal[0]

print(signal[1])  # 0 days 00:00:00.000000500
print(signal[0])  # 0 days 00:00:00

print(dt.total_seconds())  # 0.0 <- Wrong output

print(dt / pd.Timedelta(seconds=1))  # 5e-07 <- Expected behavior of ``dt.total_seconds()``

duration, Ts = 0.5, 1e-4
signal = pd.to_timedelta(np.arange(0, duration, Ts), "s")

dt = signal[1] - signal[0]

print(signal[1])  # 0 days 00:00:00.000100
print(signal[0])  # 0 days 00:00:00

print(dt.total_seconds())  # 0.0001 <- Correct answer

print(dt / pd.Timedelta(seconds=1))  # 0.0001 <- Proof

Issue Description

The total_seconds method of the Timedelta class behaves unexpectedly on Pandas 1.4.2; it worked correctly on Pandas 1.0.5.
For a difference on the nanosecond scale, the number of seconds in the interval is returned as 0.
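Until total_seconds keeps the nanoseconds, dividing by a one-second Timedelta (as in the reproducible example) is one workaround. A minimal sketch, assuming the intervals live in a TimedeltaIndex like signal above: the same idea can be applied to every interval at once with NumPy timedelta64 arithmetic, which stays at nanosecond resolution. The index here is built from integer nanoseconds so that no float-to-timedelta conversion is involved; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Three samples spaced 500 ns apart, built from integer nanoseconds.
signal = pd.to_timedelta([0, 500, 1000], unit="ns")

# Scalar: divide by a one-second Timedelta instead of calling total_seconds().
dt = signal[1] - signal[0]
print(dt / pd.Timedelta(seconds=1))  # 5e-07

# Vectorised: the differences stay timedelta64[ns]; dividing by one second
# gives float seconds without passing through datetime.timedelta.
intervals = np.diff(signal.to_numpy()) / np.timedelta64(1, "s")
print(intervals)
```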

Expected Behavior

The expected behavior is what happens in Pandas 1.0.5:

import pandas as pd
import numpy as np

duration, Ts = 0.5, 5e-7
signal = pd.to_timedelta(np.arange(0, duration, Ts), "s")

dt = signal[1] - signal[0]

print(dt.total_seconds())  # = 5e-07

Installed Versions

Python 3.8.4

Env w/ Pandas 1.4.2:

Package Version
numpy 1.22.3
pandas 1.4.2
pip 22.0.4
python-dateutil 2.8.2
pytz 2022.1
setuptools 60.10.0
six 1.16.0
wheel 0.37.1

Env w/ Pandas 1.0.5:

Package Version
numpy 1.22.3
pandas 1.0.5
pip 22.0.4
python-dateutil 2.8.2
pytz 2022.1
setuptools 60.10.0
six 1.16.0
wheel 0.37.1
@Tiarles Tiarles added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 21, 2022
@Tiarles
Author

Tiarles commented Apr 26, 2022

It's an issue related to the datetime package and not pandas itself. Closing.

@Tiarles Tiarles closed this as completed Apr 26, 2022
@Tiarles
Author

Tiarles commented Apr 27, 2022

It seems that datetime.timedelta ignores the nanosecond part of operations that pandas should support, so the total_seconds method of pandas.core.indexes.timedeltas.TimedeltaIndex doesn't work as expected.
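The stdlib side of this can be checked directly: datetime.timedelta only stores days, seconds and microseconds, so anything below 1 µs has nowhere to go. The 500 ns and 100 µs values below come from the reproducible example; reducing the nanosecond count to microseconds with floor division by 1000 is an assumption made for illustration.

```python
import datetime

# The smallest step datetime.timedelta can represent is one microsecond.
print(datetime.timedelta.resolution)  # 0:00:00.000001

# 500 ns floor-divided into whole microseconds is 0 µs, so the
# resulting timedelta is zero and total_seconds() reports 0.0.
tiny = datetime.timedelta(microseconds=500 // 1000)
print(tiny.total_seconds())  # 0.0

# 100_000 ns (the Ts = 1e-4 case) is exactly 100 µs and survives.
ok = datetime.timedelta(microseconds=100_000 // 1000)
print(ok.total_seconds())  # 0.0001
```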

@Tiarles Tiarles reopened this Apr 27, 2022
@simonjayhawkins simonjayhawkins added Timedelta Timedelta data type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels May 28, 2022
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue May 28, 2022
@simonjayhawkins
Member

Thanks @Tiarles for the report.

> The total_seconds method from the Timedelta class has an unexpected behavior on Pandas 1.4.2, since Pandas 1.0.5.

first bad commit: [3b2c8f6] BUG: nonexistent Timestamp pre-summer/winter DST w/dateutil timezone (#31155)

see #31155 (comment) and #31354

@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label May 29, 2022
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone May 29, 2022
@CloseChoice
Member

CloseChoice commented Jun 29, 2022

I don't see what we can do here, since we are relying on the CPython timedelta implementation which doesn't know nanoseconds.
The problem arises here:

cdef _timedelta_from_value_and_reso(int64_t value, NPY_DATETIMEUNIT reso):
    # Could make this a classmethod if/when cython supports cdef classmethods
    cdef:
        _Timedelta td_base

    if reso == NPY_FR_ns:
        td_base = _Timedelta.__new__(Timedelta, microseconds=int(value) // 1000)
    elif reso == NPY_DATETIMEUNIT.NPY_FR_us:
        td_base = _Timedelta.__new__(Timedelta, microseconds=int(value))
    elif reso == NPY_DATETIMEUNIT.NPY_FR_ms:
        td_base = _Timedelta.__new__(Timedelta, milliseconds=int(value))
    elif reso == NPY_DATETIMEUNIT.NPY_FR_s:
        td_base = _Timedelta.__new__(Timedelta, seconds=int(value))
    ...

Timedelta inherits its __new__ method from CPython's datetime.timedelta, which can't handle nanoseconds. Maybe trigger a warning if int(value) // 1000 == 0 and value != 0?
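A minimal Python sketch of the suggested warning (the real code is Cython, and the helper name ns_to_whole_microseconds is hypothetical, not part of pandas). It warns whenever value % 1000 != 0 rather than only when int(value) // 1000 == 0, because the narrower check would stay silent for, e.g., 1500 ns, where 500 ns are still dropped.

```python
import warnings


def ns_to_whole_microseconds(value: int) -> int:
    """Hypothetical helper, not part of pandas.

    Floor-divides a nanosecond count into whole microseconds (the same
    arithmetic as ``int(value) // 1000`` in the excerpt) and emits a
    UserWarning whenever a sub-microsecond remainder is discarded.
    """
    micros, leftover_ns = divmod(int(value), 1000)
    if leftover_ns:
        warnings.warn(
            f"{value} ns is not a whole number of microseconds; "
            f"the {leftover_ns} ns remainder is lost in datetime.timedelta",
            UserWarning,
            stacklevel=2,
        )
    return micros


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ns_to_whole_microseconds(500)      # whole value lost -> warns
    ns_to_whole_microseconds(1_500)    # 500 ns lost -> warns
    ns_to_whole_microseconds(100_000)  # exactly 100 µs -> silent

print(len(caught))  # 2
```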

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@hagenw

hagenw commented Apr 12, 2023

I'm not sure, but maybe this issue is related to this simple failing example:

>>> pd.to_timedelta(2.5225, 's').total_seconds()
2.522499

The proposed workaround with pd.Timedelta(seconds=1) does not work in this case:

>>> pd.to_timedelta(2.5225, 's') / pd.Timedelta(seconds=1)
2.522499999

What does is specifying the time in nanoseconds:

>>> pd.to_timedelta(2.5225 * 10 ** 9, 'ns').total_seconds()
2.5225

@miccoli
Contributor

miccoli commented Dec 26, 2023

> I don't see what we can do here, since we are relying on the CPython timedelta implementation which doesn't know nanoseconds.

@CloseChoice Maybe just add to the docs that total_seconds() results are always truncated to μs resolution?

The fact that float seconds since the Unix epoch progressively lose precision is well known, see https://docs.python.org/3/library/time.html#time.clock_gettime, but the fact that pandas total_seconds() -> float always truncates to μs resolution has a very high surprise factor!

I understand that converting 64 bit integers to float has its quirks (due to the very nature of IEEE floating point representation) but a clear statement of the expected behaviour should be given somewhere.
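To illustrate the float-conversion point with plain Python integers (the values are arbitrary examples): once a nanosecond count exceeds 2**53 (roughly 104 days), float seconds can no longer represent every nanosecond, while integer divmod keeps them apart. math.ulp (Python 3.9+) shows the spacing of doubles near a present-day Unix timestamp in seconds.

```python
import math

ns_a = 2**53      # 9_007_199_254_740_992 ns, about 104 days
ns_b = 2**53 + 1  # one nanosecond later

# 2**53 + 1 has no exact double, so it is rounded to 2**53 before the
# division and both results come out identical.
print(ns_a / 1e9 == ns_b / 1e9)  # True

# Integer arithmetic keeps the extra nanosecond.
print(divmod(ns_a, 10**9))  # (9007199, 254740992)
print(divmod(ns_b, 10**9))  # (9007199, 254740993)

# Spacing of doubles near a Unix time of about 1.7e9 s:
# 2**-22 s, i.e. roughly 0.24 µs.
print(math.ulp(1.7e9))  # 2.384185791015625e-07
```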

@miccoli
Contributor

miccoli commented Dec 26, 2023

> I'm not sure, but maybe this issue is related to this simple failing example:

@hagenw The “decimal” literal 2.5225 has no exact binary float representation: in fact we have

>>> decimal.Decimal(2.5225)
Decimal('2.52249999999999996447286321199499070644378662109375')

But 2.5225e9 as a float is exact:

>>> decimal.Decimal(2.5225e9)
Decimal('2522500000')

The consequence is that pd.to_timedelta(2.5225, 's') != pd.to_timedelta(2.5225 * 10 ** 9, 'ns')

A simple check:

>>> pd.to_timedelta(2.5225, 's').asm8
numpy.timedelta64(2522499999,'ns')
>>> pd.to_timedelta(2.5225e9, 'ns').asm8
numpy.timedelta64(2522500000,'ns')

The problem here seems to be that pd.to_timedelta(unit='s') does not round to the nearest nanosecond but truncates the sub-nanosecond part. But this is not related to total_seconds, which is the subject of the present bug.
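The difference between the two policies can be reproduced with exact rational arithmetic (fractions.Fraction), independent of pandas. The helper names are hypothetical and this is not pandas' own conversion code; it only shows that truncating the exact value of the float 2.5225 to whole nanoseconds gives the 2522499999 seen in the asm8 output above, while rounding to nearest gives 2522500000.

```python
import math
from fractions import Fraction


def seconds_to_ns_truncated(seconds: float) -> int:
    """Hypothetical helper: exact value of the float in ns, truncated toward zero."""
    return math.trunc(Fraction(seconds) * 10**9)


def seconds_to_ns_rounded(seconds: float) -> int:
    """Hypothetical helper: exact value of the float in ns, rounded to the
    nearest integer (ties to even, as Python's round() does)."""
    return round(Fraction(seconds) * 10**9)


# Fraction(2.5225) is 2.52249999999999996447..., so the two policies differ.
print(seconds_to_ns_truncated(2.5225))  # 2522499999
print(seconds_to_ns_rounded(2.5225))    # 2522500000
```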

6 participants