ENH: resample(..., base='start') for automatically determining base. #8521
Comments
Note that this is not the same as issue #8371. Resample seems to work fine with any frequency except a multiple of 8.
if your first date were Since the other intervals 'happen' to evenly divide your first date you are not noticing this. There are several options to 'fix' this, e.g. However, I'm not sure that it is possible to do this automatically with the current option set. You almost want an option to
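The point about intervals that 'happen' to evenly divide the first date can be checked with plain arithmetic. This is a minimal sketch, assuming bins are anchored at midnight of the first day; it uses the 9:30 start and the [3, 5, 6, 8, 16] minute frequencies discussed in this thread, and the variable names are illustrative only.

```python
from datetime import datetime

# First timestamp from the thread: 2014-09-01 09:30.
first = datetime(2014, 9, 1, 9, 30)
minutes_since_midnight = first.hour * 60 + first.minute  # 9 * 60 + 30 = 570

# Assumption: bin edges sit at whole multiples of the frequency counted from midnight.
# A remainder of 0 means 09:30 is itself a bin edge.
remainders = {freq: minutes_since_midnight % freq for freq in (3, 5, 6, 8, 16)}

for freq, rem in remainders.items():
    status = "aligned with 09:30" if rem == 0 else f"off by {rem} min"
    print(f"{freq:>2} min: {status}")
```

Under that assumption, a nonzero remainder is the shift that a manually chosen base would have to supply.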
Thanks. base='start' does look like a good idea. Are you assuming that the new frequency is a multiple of the old one? Also, does pandas currently have a way to check that the index of a Series is a concatenated sequence of, say, 1-minute date ranges, all starting at 9:30 each day (but not necessarily ending at 16:00, as on holidays)? Currently, to create 7-minute bars from 1-minute bars, I would have to either create my own concatenated sequence of sub-date-ranges, or group by day and use resample + between_time.
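The group-by-day route could look roughly like the sketch below. It assumes a pandas version in which `resample` accepts `origin="start"` (1.1 or later), and the two-session data set, including the shortened second day, is made up for illustration. Because each session is resampled on its own, no overnight bins are created, so `between_time` is not needed here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one full session and one shortened (holiday) session.
day1 = pd.date_range("2014-09-01 09:30", "2014-09-01 16:00", freq="1min")
day2 = pd.date_range("2014-09-02 09:30", "2014-09-02 13:00", freq="1min")
idx = day1.append(day2)
df = pd.DataFrame({"A": np.arange(len(idx))}, index=idx)

# Resample each calendar day separately so every day's bars start at its
# own first timestamp (09:30) rather than at a midnight-anchored edge.
daily_bars = [
    session.resample("7min", origin="start").first()
    for _, session in df.groupby(df.index.normalize())
]
bars = pd.concat(daily_bars)

# First 7-minute bar of each day.
first_bar_per_day = bars.groupby(bars.index.normalize()).head(1)
print(first_bar_per_day)
```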
I will think about this for 0.15.1. But I think if you check There are various ways to do this, most notably:
That works! Thanks.
@dimbeto great!
#31809 should also help to fix this issue:
import pandas as pd
import datetime as dt
import numpy as np
datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 0)
tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])
df.resample("8T", origin="start").first()
Outputs:
EDIT: use |
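As a rough check of the `origin="start"` approach, the sketch below compares it with a manual two-minute shift. It assumes pandas 1.1 or later, where `resample` takes `origin` and `offset`; `offset="2min"` stands in here for the older `base=2`, and the frequency is written as `8min` rather than `8T`.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Same 1-minute index as in the thread: 09:30 to 16:00 on 2014-09-01.
tt = pd.date_range(dt.datetime(2014, 9, 1, 9, 30), dt.datetime(2014, 9, 1, 16, 0), freq="1min")
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=["A"])

# Bins anchored at the first timestamp of the data.
by_origin = df.resample("8min", origin="start").first()
# Bins anchored at midnight, then shifted by two minutes (stand-in for base=2).
by_offset = df.resample("8min", offset="2min").first()
# Default anchoring, for comparison.
default = df.resample("8min").first()

print(by_origin.index[0])            # first label with origin="start"
print(by_offset.index[0])            # first label with the manual shift
print(default.index[0])              # first label with default anchoring
print(by_origin.equals(by_offset))   # whether the two approaches agree
```

The appeal of `origin="start"` is that the shift does not have to be worked out by hand for each frequency.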
(edit):
The first item is at 9:30, which does not fall on an 8-minute boundary. We'd like
df.resample('8T', base='start').first()
to be equivalent to
df.resample('8T', base=2)
It seems that for 1Min bar data, resample() has a bug whenever the sampling frequency is a multiple of 8. The code below illustrates the bug when resampling at [3, 5, 6, 8, 16] Min. For frequencies 3 and 5, the first entry of the resampled DataFrame index is the base timestamp (9:30 in this case), while for frequencies 8 and 16 the resampled index starts at 9:26 and 9:18 respectively.
produces the following output: