Index Rounding? Error since audinterface 1.0.0 #113
Comments
Thanks for reporting, we will try to find out what's going on. As a temporary fix you can use ...
df = interface.process_index(df.index, preserve_index=True)
print(df)
|
Ok, it's actually an interesting issue. The reason we see a difference between the versions is that pre 1.0.0 we kept the end time from the index and now we overwrite it with the duration we calculate from the number of samples that are processed. Theoretically these values should match of course. Maybe it's because we use the |
Or maybe not :) One advantage of the current implementation is that it returns the correct time if end is out-of-bounds, e.g.:
file = '/media/jwagner/Data/audb/emodb/1.3.0/d3b62a9b/wav/16b10Wb.wav'
interface.process_file(file, end='999999s')
With pre 1.0.0 it returns:
But with 1.0.0:
So I would argue we should keep the new behavior and encourage the user to use @hagenw opinion? |
I also think that the current behavior makes sense. But as an intermediate step we should try to find out at which place exactly we are getting rounding errors. Maybe there is a way to avoid those. |
Can it be related to setting |
No, |
|
Ok, I think I have found the guilty one:
dur = 2.5225
pd.to_timedelta(dur, 's').total_seconds()
|
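The lost nanosecond can be reproduced without pandas. The sketch below assumes that the float is split into whole seconds and a fractional part, and that the fractional part is then scaled to nanoseconds and truncated. That is a guess about the conversion internals which the thread does not confirm. The helper name `split_then_truncate_ns` is hypothetical.

```python
def split_then_truncate_ns(seconds: float) -> int:
    """Convert seconds to integer nanoseconds, scaling the fraction separately.

    Assumption: this imitates a conversion that drops one nanosecond;
    it is not copied from the pandas source.
    """
    whole = int(seconds)
    frac = seconds - whole
    # The stored float for 0.5225 is slightly below 0.5225, so the product
    # lands just under 522500000 and int() truncates it by one nanosecond.
    return whole * 10**9 + int(frac * 1e9)


dur = 2.5225
ns = split_then_truncate_ns(dur)
print(ns)  # 2522499999
```

The result matches the `02.522499999` seen in the thread, which makes truncation of the fractional part a plausible source of the discrepancy.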
There is a workaround proposed in pandas-dev/pandas#46819:
>>> pd.to_timedelta(dur, 's') / pd.Timedelta(seconds=1)
2.522499999
I guess to achieve the exact same output we need to use less than nano-second precision:
>>> round(pd.to_timedelta(dur, 's') / pd.Timedelta(seconds=1), ndigits=8)
2.5225 |
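The rounding step could be wrapped in a small helper. The following is a minimal sketch in plain Python that takes an integer nanosecond count instead of a `Timedelta`. The name `ns_to_seconds` is hypothetical, and the default of 8 digits is simply the value used in the comment above.

```python
def ns_to_seconds(ns: int, ndigits: int = 8) -> float:
    """Convert integer nanoseconds to seconds, rounded to `ndigits` decimals."""
    return round(ns / 1e9, ndigits)


print(ns_to_seconds(2522499999))  # 2.5225
print(ns_to_seconds(2522500000))  # 2.5225
```

Rounding to 8 digits hides the off-by-one nanosecond, at the cost of also discarding genuine differences smaller than 10 ns.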
Ah nice. I guess we need to apply it in several spots, though:
Possibly more... |
I don't think we can handle this already when doing the
>>> pd.to_timedelta(round(dur, ndigits=8), 's')
Timedelta('0 days 00:00:02.522499999')
Looks like we can only do it when converting back to seconds. |
Would it be possible to set preserve_index=True by default? |
We cannot easily do that, since so far we always return a segmented index by default. But with |
The following workaround seems to work:
>>> pd.to_timedelta(dur * 10 ** 9, 'ns')
Timedelta('0 days 00:00:02.522500')
>>> pd.to_timedelta(dur * 10 ** 9, 'ns').total_seconds()
2.5225 |
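A plain-Python check suggests why scaling first helps: multiplying the whole duration by `10 ** 9` rounds to the nearest representable float, which for this value is an exact integer number of nanoseconds. This is a sketch for the single value from the thread, not a guarantee for arbitrary durations. The name `scale_then_round_ns` is hypothetical.

```python
def scale_then_round_ns(seconds: float) -> int:
    """Scale the full value to nanoseconds first, then round to an integer."""
    return round(seconds * 10**9)


dur = 2.5225
print(dur * 10**9)               # 2522500000.0
print(scale_then_round_ns(dur))  # 2522500000
```

Here the rounding error of the multiplication is far smaller than one nanosecond, so no truncation towards `...499999` occurs.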
When running an interface on a MultiIndex, the timestamps are sometimes rounded.
For example, instead of the initial index "0 days 0 days 00:00:01.877812", audinterface returns a DataFrame with the index "0 days 00:00:01.877812500".
This behavior occurs only since version 1.0.0; the previous version 0.10.2 works fine.
Outputs (for audinterface==1.0.0 and 1.0.1):
Outputs (for audinterface==0.10.2):
Python 3.8 all packages:
audb 1.4.2
audbackend 0.3.18
audeer 1.19.0
audfactory 1.0.12
audformat 0.16.1
audinterface 1.0.1
audiofile 1.2.1
audmath 1.2.1
audobject 0.7.9
audresample 1.2.1
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 3.1.0
dohq-artifactory 0.8.4
filelock 3.10.7
idna 3.4
importlib-metadata 6.1.0
iso-639 0.4.5
iso3166 2.1.1
numpy 1.24.2
oyaml 1.0
pandas 2.0.0
pip 20.0.2
pkg-resources 0.0.0
pycparser 2.21
PyJWT 2.6.0
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
requests 2.28.2
setuptools 44.0.0
six 1.16.0
soundfile 0.12.1
tqdm 4.65.0
tzdata 2023.3
urllib3 1.26.15
zipp 3.15.0