Index Rounding? Error since audinterface 1.0.0 #113

schruefer · 2023-04-06T13:10:20Z

When running an interface on a MultiIndex, the timestamps are sometimes rounded.

For example, instead of the initial index "0 days 0 days 00:00:01.877812", audinterface returns a dataframe with the index "0 days 0 days 00:00:01.877812500".

This behavior has only occurred since version 1.0.0; the previous version 0.10.2 works fine.


import audb
import os
import audinterface

media = [
    'wav/03a01Fa.wav',
    'wav/03a01Nc.wav',
    'wav/16b10Wb.wav',
    'wav/03a01Wa.wav'
]
db = audb.load(
    'emodb',
    version='1.3.0',
    media=media,
    verbose=False,
)

files = list(db.files)
folder = os.path.dirname(files[0])
df = db['emotion'].get(as_segmented=True, allow_nat=False)
print(df)

def features(signal, sampling_rate):
    return [signal.mean(), signal.std()]

interface = audinterface.Feature(
    ['mean', 'std'],
    process_func=features,
)
df = interface.process_index(df.index)
print(df)

Outputs (for audinterface==1.0.0 and 1.0.1):


                                                                                emotion  emotion.confidence
file                                            start  end                                                  
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Fa.wav 0 days 0 days 00:00:01.898250  happiness                0.90
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Nc.wav 0 days 0 days 00:00:01.611250    neutral                1.00
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Wa.wav 0 days 0 days 00:00:01.877812      anger                0.95
/data/audb/emodb/1.3.0/d3b62a9b/wav/16b10Wb.wav 0 days 0 days 00:00:02.522499      anger                1.00
                                                                                      mean       std
file                                            start  end                                          
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Fa.wav 0 days 0 days 00:00:01.898250    -0.000311  0.082317
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Nc.wav 0 days 0 days 00:00:01.611250    -0.000312  0.125304
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Wa.wav 0 days 0 days 00:00:01.877812500 -0.000296  0.127394
/data/audb/emodb/1.3.0/d3b62a9b/wav/16b10Wb.wav 0 days 0 days 00:00:02.522499999 -0.000464  0.095558

Outputs (for audinterface==0.10.2):

                                                                                 emotion  emotion.confidence
file                                            start  end                                                  
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Fa.wav 0 days 0 days 00:00:01.898250  happiness                0.90
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Nc.wav 0 days 0 days 00:00:01.611250    neutral                1.00
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Wa.wav 0 days 0 days 00:00:01.877812      anger                0.95
/data/audb/emodb/1.3.0/d3b62a9b/wav/16b10Wb.wav 0 days 0 days 00:00:02.522499      anger                1.00
                                                                                   mean       std
file                                            start  end                                       
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Fa.wav 0 days 0 days 00:00:01.898250 -0.000311  0.082317
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Nc.wav 0 days 0 days 00:00:01.611250 -0.000312  0.125304
/data/audb/emodb/1.3.0/d3b62a9b/wav/03a01Wa.wav 0 days 0 days 00:00:01.877812 -0.000296  0.127394
/data/audb/emodb/1.3.0/d3b62a9b/wav/16b10Wb.wav 0 days 0 days 00:00:02.522499 -0.000464  0.095558

Python 3.8, all installed packages:

audb 1.4.2
audbackend 0.3.18
audeer 1.19.0
audfactory 1.0.12
audformat 0.16.1
audinterface 1.0.1
audiofile 1.2.1
audmath 1.2.1
audobject 0.7.9
audresample 1.2.1
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.1.0
dohq-artifactory 0.8.4
filelock 3.10.7
idna 3.4
importlib-metadata 6.1.0
iso-639 0.4.5
iso3166 2.1.1
numpy 1.24.2
oyaml 1.0
pandas 2.0.0
pip 20.0.2
pkg-resources 0.0.0
pycparser 2.21
PyJWT 2.6.0
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
requests 2.28.2
setuptools 44.0.0
six 1.16.0
soundfile 0.12.1
tqdm 4.65.0
tzdata 2023.3
urllib3 1.26.15
zipp 3.15.0


frankenjoe · 2023-04-07T08:22:25Z

Thanks for reporting, we will try to find out what's going on. As a temporary fix you can use preserve_index=True:

...
df = interface.process_index(df.index, preserve_index=True)
print(df)

file                                              start  end                                       
/media/jwagner/Data/audb/emodb/1.3.0/d3b62a9b/... 0 days 0 days 00:00:01.898250 -0.000311  0.082317
                                                         0 days 00:00:01.611250 -0.000312  0.125304
                                                         0 days 00:00:01.877812 -0.000296  0.127394
                                                         0 days 00:00:02.522499 -0.000464  0.095558

frankenjoe · 2023-04-11T07:31:07Z

OK, it's actually an interesting issue. The reason we see a difference between the versions is that before 1.0.0 we kept the end time from the index, whereas now we overwrite it with the duration we calculate from the number of samples that are processed. Theoretically, these values should of course match. Maybe it's because we use sloppy=True when we calculate the duration in audb, or it's some rounding issue when the duration is stored to CSV as part of the dependency table. In any case, the behavior is not nice and we should make sure that we keep the end value from the index.

frankenjoe · 2023-04-11T07:48:32Z

Or maybe not :)

One advantage of the current implementation is that it returns the correct time if end is out-of-bounds, e.g.:

file = '/media/jwagner/Data/audb/emodb/1.3.0/d3b62a9b/wav/16b10Wb.wav'
interface.process_file(file, end='999999s')

With pre 1.0.0 it returns:

                                                                               mean       std
file                                              start  end                                 
/media/jwagner/Data/audb/emodb/1.3.0/d3b62a9b/... 0 days 11 days 13:46:39 -0.000464  0.095558

But with 1.0.0:

                                                                                        mean       std
file                                              start  end                                          
/media/jwagner/Data/audb/emodb/1.3.0/d3b62a9b/... 0 days 0 days 00:00:02.522499999 -0.000464  0.095558
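
The difference between the two outputs can be sketched without audinterface. In the snippet below, `clip_end` is a hypothetical helper (not the audinterface implementation), and the sample count and 16 kHz sampling rate are assumptions chosen to match a duration of 2.5225 s. It only illustrates how an out-of-bounds end collapses to the duration implied by the number of processed samples.

```python
import pandas as pd


def clip_end(requested_end, num_samples, sampling_rate):
    """Return the requested end, limited to the duration of the signal.

    Hypothetical helper for illustration only.
    """
    # Derive the duration as integer nanoseconds to avoid float truncation
    duration_ns = round(num_samples * 10**9 / sampling_rate)
    duration = pd.Timedelta(duration_ns, unit="ns")
    return min(pd.Timedelta(requested_end), duration)


# Assumed values: 40360 samples at 16000 Hz correspond to 2.5225 s
end = clip_end("999999s", num_samples=40360, sampling_rate=16000)
print(end)
```

An end that lies inside the file, such as `clip_end("1s", 40360, 16000)`, is returned unchanged.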

So I would argue we should keep the new behavior and encourage the user to use preserve_index=True if the index must not change.

@hagenw opinion?

hagenw · 2023-04-11T08:38:27Z

I also think that the current behavior makes sense.

But as an intermediate step we should try to find out at which place exactly we are getting rounding errors. Maybe there is a way to avoid those.

frankenjoe · 2023-04-11T08:45:39Z

> But as an intermediate step we should try to find out at which place exactly we are getting rounding errors. Maybe there is a way to avoid those.

Can it be related to setting sloppy=True when we read the file duration in audb.publish()? Even if we work with WAV files?

hagenw · 2023-04-11T08:48:01Z

No, sloppy is not applied to WAV files: https://github.com/audeering/audiofile/blob/0ae2de5ac552a2982417e7cfde0d9b39322ef7c4/audiofile/core/info.py#L161-L165

hagenw · 2023-04-11T08:49:55Z

soundfile.info(file).duration most likely reads the duration from the header. I don't know if there is a way you can create WAV files that have a duration in the header that does not match the number of samples. But different libraries might round 0.5 differently.
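
As a small illustration of that last point, and not a claim about what soundfile or audiofile do internally, the standard library already offers two tie-breaking rules that disagree on a value ending in 5. The duration 1.8778125 s is taken from the index values reported above; rounding it to microseconds is an assumption made for the example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# A duration that is an exact tie when rounded to microseconds
value = Decimal("1.8778125")
step = Decimal("0.000001")

half_even = value.quantize(step, rounding=ROUND_HALF_EVEN)
half_up = value.quantize(step, rounding=ROUND_HALF_UP)

print(half_even)  # 1.877812
print(half_up)    # 1.877813
```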

frankenjoe · 2023-04-11T08:55:12Z

Ok, I think I have found the guilty one:

import pandas as pd

dur = 2.5225
pd.to_timedelta(dur, 's').total_seconds()

2.522499
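
A pandas-free sketch of why a value such as 2.5225 is fragile: the nearest double lies slightly below the decimal value, so a conversion that truncates instead of rounding can lose a nanosecond. The integer/fraction split below is only an assumed illustration of such a truncating conversion, not code taken from pandas.

```python
from decimal import Decimal

dur = 2.5225

# The double closest to 2.5225 is slightly smaller than the decimal value
print(Decimal(dur) < Decimal("2.5225"))  # True

# Scaling the whole value and rounding recovers the intended nanoseconds
print(round(dur * 10**9))  # 2522500000

# Splitting off the fraction and truncating loses one nanosecond
whole = int(dur)
frac = dur - whole
print(whole * 10**9 + int(frac * 10**9))  # 2522499999
```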

hagenw · 2023-04-11T10:21:36Z

There is a workaround proposed in pandas-dev/pandas#46819

>>> pd.to_timedelta(dur, 's') / pd.Timedelta(seconds=1)
2.522499999

I guess to achieve the exact same output we need to round to less than nanosecond precision:

>>> round(pd.to_timedelta(dur, 's') / pd.Timedelta(seconds=1), ndigits=8)
2.5225

frankenjoe · 2023-04-11T10:27:15Z

Ah nice. I guess we need to apply it in several spots, though:

Possibly more...

hagenw · 2023-04-11T10:32:48Z

I don't think we can handle this already when doing the pd.to_timedelta(dur, unit='s') conversion, e.g.

>>> pd.to_timedelta(round(dur, ndigits=8), 's')
Timedelta('0 days 00:00:02.522499999')

Looks like we can only do it when converting back to seconds.
Or as an alternative we could check if there is a way to avoid converting to timedelta in the first place.

schruefer · 2023-04-11T11:02:12Z

> So I would argue we should keep the new behavior and encourage the user to use preserve_index=True if the index must not change.

Would it be possible to set preserve_index=True by default?
I would assume that the majority of people using process_index would like to keep the index.

frankenjoe · 2023-04-11T11:20:18Z

We cannot easily do that, since so far we always return a segmented index by default. But with preserve_index=True it can happen that the result is a filewise index (if also the input is a filewise index).
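
For readers unfamiliar with the two index types, here is a rough pandas-only sketch. It assumes the audformat convention of a single `file` level for a filewise index and `file`, `start`, `end` levels for a segmented index; the file names and times are made up.

```python
import pandas as pd

files = ["a.wav", "b.wav"]  # made-up file names

# Filewise index: one entry per file, no timestamps
filewise = pd.Index(files, name="file")

# Segmented index: file plus start and end time of each segment
segmented = pd.MultiIndex.from_arrays(
    [
        files,
        pd.to_timedelta([0, 0], unit="s"),
        pd.to_timedelta([1.5, 2.0], unit="s"),
    ],
    names=["file", "start", "end"],
)

print(filewise.nlevels, segmented.nlevels)
```

Code that expects three levels in the result would break if it suddenly received the one-level variant, which is the compatibility concern raised here.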

hagenw · 2023-04-12T06:36:59Z

The following workaround seems to work:

>>> pd.to_timedelta(dur * 10 ** 9, 'ns')
Timedelta('0 days 00:00:02.522500')
>>> pd.to_timedelta(dur * 10 ** 9, 'ns').total_seconds()
2.5225
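
Building on this, a round trip through integer nanoseconds could be wrapped as below. The helper names are hypothetical and this is not necessarily how audinterface resolved the issue. The sketch adds an explicit `round()` before the conversion and reads the nanosecond count back through `Timedelta.value`.

```python
import pandas as pd


def seconds_to_timedelta(seconds):
    """Convert seconds to a Timedelta via rounded integer nanoseconds."""
    return pd.to_timedelta(round(seconds * 10**9), unit="ns")


def timedelta_to_seconds(delta):
    """Convert a Timedelta back to seconds from its nanosecond count."""
    return delta.value / 10**9


delta = seconds_to_timedelta(2.5225)
print(delta)
print(timedelta_to_seconds(delta) == 2.5225)
```

Passing an integer with unit `'ns'` means pandas has no fractional part left to convert, which is why this route sidesteps the truncation seen with unit `'s'`.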

hagenw added the bug label Apr 6, 2023

frankenjoe closed this as completed in #115 May 8, 2023

hagenw mentioned this issue Jul 28, 2023 in "Fix precision of audinterface.utils.to_timedelta() #137" (merged)
