Numerically unstable mean calculation for Timedeltas. #9670

musically-ut · 2015-03-17T09:35:21Z

I am not sure whether I should report this here or on numpy, but this is what led me to the problem:

In [11]: dAllTags.describe()
Out[11]:
                     finalPeriod
count                      74501
mean    -1 days +02:40:08.792662
std     500 days 06:32:37.640848
min       2 days 00:51:49.730000
25%     498 days 19:11:28.576000
50%     846 days 00:46:56.656000
75%    1245 days 17:11:58.493000
max    2224 days 07:03:26.593000

All the values are positive (the minimum is 2 days), but the calculated mean is negative. This happens because the underlying type of np.timedelta64 is int64, which overflows when the values are summed to calculate the mean.
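To illustrate the overflow described above, here is a minimal sketch. It assumes a made-up dataset (74,501 durations of 900 days each, roughly the scale of the describe() output) rather than the real finalPeriod column, and it works on the raw int64 nanosecond values with plain NumPy.

```python
import numpy as np

# Hypothetical data: 74,501 durations of 900 days each, stored as int64
# nanoseconds (the representation behind timedelta64[ns]).
n = 74_501
ns_per_day = 24 * 60 * 60 * 10**9
values = np.full(n, 900 * ns_per_day, dtype=np.int64)

# Exact sum using Python's arbitrary-precision integers.
exact_sum = n * 900 * ns_per_day

# Sum accumulated in int64, which wraps around silently when it overflows.
wrapped_sum = int(values.sum())

# The exact sum does not fit into a signed 64-bit integer.
int64_max = int(np.iinfo(np.int64).max)
overflowed = exact_sum > int64_max
```

With these numbers the exact sum is several hundred times larger than the int64 maximum, so any mean derived from the wrapped sum is meaningless, even though every individual value is positive.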

Now the issue of numerical stability in numpy has had a long history:

And though some steps have been taken to improve accuracy (e.g. by providing fsum and using pairwise summation), there doesn't seem to be a consensus on using a numerically stable method for mean.

I was wondering if something could be done at the pandas level to resolve this issue.

Currently, I am working around the issue by using the rather elaborate scheme:

df.finalPeriod.view(int).astype(float).mean()

since timedelta64 cannot be directly converted to float64. Is there a better/more intuitive way to do this?
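For reference, the same float-based workaround can be sketched on a plain NumPy array, including the conversion back to a timedelta. The data is hypothetical (74,501 durations of 900 days each) and stands in for the finalPeriod column.

```python
import numpy as np

# Hypothetical data: 74,501 durations of 900 days each, as timedelta64[ns].
ns_per_day = 24 * 60 * 60 * 10**9
durations = np.full(74_501, 900 * ns_per_day, dtype=np.int64).view("timedelta64[ns]")

# Reinterpret as int64 nanoseconds, then average in float64 so the
# accumulator cannot wrap around.
mean_ns = durations.view(np.int64).astype(np.float64).mean()

# Convert the float result back to a timedelta (rounded to whole nanoseconds).
mean_td = np.timedelta64(int(round(mean_ns)), "ns")
```

The trade-off is precision: float64 has a 53-bit mantissa, so values above roughly 104 days (2**53 ns) can no longer be represented to the exact nanosecond.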


jreback · 2015-03-17T10:25:24Z

this is a dupe of #9442

pull-requests are welcome. This just needs to be addressed in core/nanops.py by adjusting the precision of sum (which is the basis of most of the other ops).
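As a rough illustration of what an overflow-safe mean could look like, here is a sketch that stays in integer arithmetic. It is not the change that was made in core/nanops.py; the function name overflow_safe_mean_ns is hypothetical, and the technique (splitting each value into a quotient and remainder by the element count) is just one possible approach.

```python
import numpy as np

def overflow_safe_mean_ns(values_ns):
    """Floor of the mean of int64 values, without forming the full int64 sum.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    values_ns = np.asarray(values_ns, dtype=np.int64)
    n = values_ns.size
    if n == 0:
        raise ValueError("mean of an empty array is undefined")
    # Write each value as q * n + r with 0 <= r < n.
    quotients, remainders = np.divmod(values_ns, n)
    # sum(q) is about the size of the mean itself, and sum(r) < n * n,
    # so both sums stay far below the int64 limit for realistic n.
    return int(quotients.sum()) + int(remainders.sum()) // n

# Hypothetical data: 74,501 durations of 900 days each, in nanoseconds.
ns_per_day = 24 * 60 * 60 * 10**9
values = np.full(74_501, 900 * ns_per_day, dtype=np.int64)
mean_ns = overflow_safe_mean_ns(values)
```

Unlike the float64 workaround, this keeps nanosecond resolution, at the cost of an extra pass over the data for the divmod.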

jreback closed this as completed Mar 17, 2015

jreback added the Bug, Timedelta, and Duplicate Report labels Mar 17, 2015
