Commit 786edc7

ENH: add time-window capability to .rolling
xref #13327
closes #936

Author: Jeff Reback <[email protected]>

Closes #13513 from jreback/rolling and squashes the following commits:

d8f3d73 [Jeff Reback] ENH: add time-window capability to .rolling
1 parent 8acfad3 commit 786edc7

9 files changed, +2349 -687 lines changed


ci/lint.sh

Lines changed: 13 additions & 1 deletion
@@ -17,7 +17,19 @@ if [ "$LINT" ]; then
         fi
 
     done
-    echo "Linting DONE"
+    echo "Linting *.py DONE"
+
+    echo "Linting *.pyx"
+    for path in 'window.pyx'
+    do
+        echo "linting -> pandas/$path"
+        flake8 pandas/$path --filename '*.pyx' --select=E501,E302,E203,E226,E111,E114,E221,E303,E128,E231,E126,E128
+        if [ $? -ne "0" ]; then
+            RET=1
+        fi
+
+    done
+    echo "Linting *.pyx DONE"
 
     echo "Check for invalid testing"
     grep -r -E --include '*.py' --exclude nosetester.py --exclude testing.py '(numpy|np)\.testing' pandas

doc/source/computation.rst

Lines changed: 85 additions & 0 deletions
@@ -391,6 +391,91 @@ For some windowing functions, additional parameters must be specified:
     such that the weights are normalized with respect to each other. Weights
     of ``[1, 1, 1]`` and ``[2, 2, 2]`` yield the same result.
 
+.. _stats.moments.ts:
+
+Time-aware Rolling
+~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.19.0
+
+New in version 0.19.0 is the ability to pass an offset (or convertible) to a ``.rolling()`` method and have it produce
+variable-sized windows based on the passed time window. For each time point, this includes all preceding values occurring
+within the indicated time delta.
+
+This can be particularly useful for a non-regular time frequency index.
+
+.. ipython:: python
+
+   dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
+                      index=pd.date_range('20130101 09:00:00', periods=5, freq='s'))
+   dft
+
+This is a regular frequency index. Using an integer window parameter works to roll along the window frequency.
+
+.. ipython:: python
+
+   dft.rolling(2).sum()
+   dft.rolling(2, min_periods=1).sum()
+
+Specifying an offset allows a more intuitive specification of the rolling frequency.
+
+.. ipython:: python
+
+   dft.rolling('2s').sum()
+
+Using a non-regular, but still monotonic index, rolling with an integer window does not impart any special calculation.
+
+
+.. ipython:: python
+
+
+   dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
+                      index=pd.Index([pd.Timestamp('20130101 09:00:00'),
+                                      pd.Timestamp('20130101 09:00:02'),
+                                      pd.Timestamp('20130101 09:00:03'),
+                                      pd.Timestamp('20130101 09:00:05'),
+                                      pd.Timestamp('20130101 09:00:06')],
+                                     name='foo'))
+
+   dft
+   dft.rolling(2).sum()
+
+
+Using the time-specification generates variable windows for this sparse data.
+
+.. ipython:: python
+
+   dft.rolling('2s').sum()
+
+Furthermore, we now allow an optional ``on`` parameter to specify a column (rather than the
+default of the index) in a DataFrame.
+
+.. ipython:: python
+
+   dft = dft.reset_index()
+   dft
+   dft.rolling('2s', on='foo').sum()
+
+.. _stats.moments.ts-versus-resampling:
+
+Time-aware Rolling vs. Resampling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Using ``.rolling()`` with a time-based index is quite similar to :ref:`resampling <timeseries.resampling>`. They
+both operate and perform reductive operations on time-indexed pandas objects.
+
+When using ``.rolling()`` with an offset, the offset is a time-delta. Take a backwards-in-time looking window, and
+aggregate all of the values in that window (including the end-point, but not the start-point). This is the new value
+at that point in the result. These are variable-sized windows in time-space for each point of the input. You will get
+a result of the same size as the input.
+
+When using ``.resample()`` with an offset, construct a new index that is the frequency of the offset. For each frequency
+bin, aggregate points from the input within a backwards-in-time looking window that fall in that bin. The result of this
+aggregation is the output for that frequency point. The windows are fixed size in the frequency space. Your result
+will have the shape of a regular frequency between the min and the max of the original input object.
+
+To summarize, ``.rolling()`` is a time-based window operation, while ``.resample()`` is a frequency-based window operation.
+
 Centering Windows
 ~~~~~~~~~~~~~~~~~
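
The contrast drawn in the "Time-aware Rolling vs. Resampling" text added to doc/source/computation.rst can be sketched without pandas. The following is a minimal pure-Python illustration of the described semantics, not pandas' implementation. The helper names ``rolling_time_sum`` and ``resample_sum`` are hypothetical, NaN handling is left out (the NaN in the documented example is replaced by 3), and left-closed bins anchored at the first timestamp are an assumption of the sketch.

```python
from datetime import datetime, timedelta


def rolling_time_sum(times, values, window):
    """Sum the values whose timestamp lies in (t - window, t] for each t.

    Hypothetical helper: one output per input point, variable-sized windows
    that include the end-point but not the start-point.
    """
    out = []
    for t in times:
        total = sum(v for s, v in zip(times, values) if t - window < s <= t)
        out.append(total)
    return out


def resample_sum(times, values, freq):
    """Sum the values falling in each fixed-width bin.

    Hypothetical helper: one output per bin between the first and last
    timestamp, empty bins included. Bins are [start + k*freq,
    start + (k+1)*freq), which is an assumption of this sketch.
    """
    start = times[0]
    n_bins = (times[-1] - start) // freq + 1
    sums = [0] * n_bins
    for s, v in zip(times, values):
        sums[(s - start) // freq] += v
    return [(start + k * freq, total) for k, total in enumerate(sums)]


base = datetime(2013, 1, 1, 9, 0, 0)
# Same irregular spacing as the documented example: 0, 2, 3, 5, 6 seconds.
times = [base + timedelta(seconds=s) for s in (0, 2, 3, 5, 6)]
values = [0, 1, 2, 3, 4]

rolled = rolling_time_sum(times, values, timedelta(seconds=2))
binned = resample_sum(times, values, timedelta(seconds=2))

print(rolled)                           # one entry per input point
print([total for _, total in binned])   # one entry per 2-second bin
```

The rolling result has five entries, one per input row, while the binned result has four entries, one per 2-second bin spanning the input.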

doc/source/timeseries.rst

Lines changed: 5 additions & 1 deletion
@@ -1284,7 +1284,11 @@ performing resampling operations during frequency conversion (e.g., converting
 secondly data into 5-minutely data). This is extremely common in, but not
 limited to, financial applications.
 
-``resample`` is a time-based groupby, followed by a reduction method on each of its groups.
+``.resample()`` is a time-based groupby, followed by a reduction method on each of its groups.
+
+.. note::
+
+   ``.resample()`` is similar to using a ``.rolling()`` operation with a time-based offset; see a discussion :ref:`here <stats.moments.ts-versus-resampling>`.
 
 See some :ref:`cookbook examples <cookbook.resample>` for some advanced strategies

doc/source/whatsnew/v0.19.0.txt

Lines changed: 61 additions & 2 deletions
@@ -3,16 +3,17 @@
 v0.19.0 (August ??, 2016)
 -------------------------
 
-This is a major release from 0.18.2 and includes a small number of API changes, several new features,
+This is a major release from 0.18.1 and includes a small number of API changes, several new features,
 enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
 users upgrade to this version.
 
 Highlights include:
 
 - :func:`merge_asof` for asof-style time-series joining, see :ref:`here <whatsnew_0190.enhancements.asof_merge>`
+- ``.rolling()`` is now time-series aware, see :ref:`here <whatsnew_0190.enhancements.rolling_ts>`
 - pandas development api, see :ref:`here <whatsnew_0190.dev_api>`
 
-.. contents:: What's new in v0.18.2
+.. contents:: What's new in v0.19.0
     :local:
     :backlinks: none

@@ -131,6 +132,64 @@ that forward filling happens automatically taking the most recent non-NaN value.
 This returns a merged DataFrame with the entries in the same order as the original left
 passed DataFrame (``trades`` in this case), with the fields of the ``quotes`` merged.
 
+.. _whatsnew_0190.enhancements.rolling_ts:
+
+``.rolling()`` is now time-series aware
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``.rolling()`` objects are now time-series aware and can accept a time-series offset (or convertible) for the ``window`` argument (:issue:`13327`, :issue:`12995`).
+See the full documentation :ref:`here <stats.moments.ts>`.
+
+.. ipython:: python
+
+   dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
+                      index=pd.date_range('20130101 09:00:00', periods=5, freq='s'))
+   dft
+
+This is a regular frequency index. Using an integer window parameter works to roll along the window frequency.
+
+.. ipython:: python
+
+   dft.rolling(2).sum()
+   dft.rolling(2, min_periods=1).sum()
+
+Specifying an offset allows a more intuitive specification of the rolling frequency.
+
+.. ipython:: python
+
+   dft.rolling('2s').sum()
+
+Using a non-regular, but still monotonic index, rolling with an integer window does not impart any special calculation.
+
+.. ipython:: python
+
+
+   dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
+                      index=pd.Index([pd.Timestamp('20130101 09:00:00'),
+                                      pd.Timestamp('20130101 09:00:02'),
+                                      pd.Timestamp('20130101 09:00:03'),
+                                      pd.Timestamp('20130101 09:00:05'),
+                                      pd.Timestamp('20130101 09:00:06')],
+                                     name='foo'))
+
+   dft
+   dft.rolling(2).sum()
+
+Using the time-specification generates variable windows for this sparse data.
+
+.. ipython:: python
+
+   dft.rolling('2s').sum()
+
+Furthermore, we now allow an optional ``on`` parameter to specify a column (rather than the
+default of the index) in a DataFrame.
+
+.. ipython:: python
+
+   dft = dft.reset_index()
+   dft
+   dft.rolling('2s', on='foo').sum()
+
 .. _whatsnew_0190.enhancements.read_csv_dupe_col_names_support:
 
 :func:`read_csv` has improved support for duplicate column names
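
The whatsnew entry runs ``dft.rolling(2).sum()`` and ``dft.rolling(2, min_periods=1).sum()`` back to back without spelling out what ``min_periods`` changes. The following is a rough pure-Python sketch of a fixed-count rolling sum, not pandas' implementation. The helper name ``rolling_count_sum`` is hypothetical, and defaulting ``min_periods`` to the window size is an assumption made here for integer windows.

```python
import math


def rolling_count_sum(values, window, min_periods=None):
    """Hypothetical helper sketching a fixed-count rolling sum.

    Each window holds the current value and up to ``window - 1`` preceding
    ones. NaN entries are skipped, and the result is NaN unless at least
    ``min_periods`` non-NaN values are present.
    """
    if min_periods is None:
        # Assumption of this sketch: require a full window by default.
        min_periods = window
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out


# Same values as column 'B' in the documented example.
values = [0.0, 1.0, 2.0, math.nan, 4.0]

strict = rolling_count_sum(values, 2)
relaxed = rolling_count_sum(values, 2, min_periods=1)

print(strict)
print(relaxed)
```

With the default, the first row and every window touching the NaN come out as NaN. With ``min_periods=1``, a single valid observation is enough to produce a value.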

pandas/core/generic.py

Lines changed: 3 additions & 2 deletions
@@ -5342,11 +5342,12 @@ def _add_series_or_dataframe_operations(cls):
 
         @Appender(rwindow.rolling.__doc__)
         def rolling(self, window, min_periods=None, freq=None, center=False,
-                    win_type=None, axis=0):
+                    win_type=None, on=None, axis=0):
             axis = self._get_axis_number(axis)
             return rwindow.rolling(self, window=window,
                                    min_periods=min_periods, freq=freq,
-                                   center=center, win_type=win_type, axis=axis)
+                                   center=center, win_type=win_type,
+                                   on=on, axis=axis)
 
         cls.rolling = rolling
53525353
