Commit cfd8210

max-sixty authored and shoyer committed
rolling_exp (nee ewm) (#2650)
* WIP on ewm using numbagg
* basic functionality, no dims working yet
* rename to `rolling_exp`
* ensure works on either dimensions
* window_type working
* add numbagg to travis install
* naming
* formatting
* @shoyer's function to abstract the type of self.obj
* initial docstring
* add docstrings to docs
* example
* correct location for docs
* add numbagg to print_versions
* whatsnew
* updating my GH username
* pin to numbagg release
* rename inner func to move_exp_nanmean
* merge
* typo
* comments from PR
* window -> alpha in numbagg
* add docs
* doc fix
* whatsnew update
* revert formatting changes to unchanged file
* update docstrings, adjust kwarg names
* mypy
* flake
* pytest config tiny tweak while I'm here
* Rolling exp doc updates
* remove _attributes from RollingExp class
1 parent 223a05f commit cfd8210

12 files changed

+244 -30 lines changed

ci/requirements-py37.yml

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -30,3 +30,4 @@ dependencies:
3030
- pydap
3131
- pip:
3232
- mypy==0.650
33+
- numbagg

doc/api.rst

Lines changed: 3 additions & 0 deletions
@@ -148,6 +148,7 @@ Computation
148148
Dataset.groupby
149149
Dataset.groupby_bins
150150
Dataset.rolling
151+
Dataset.rolling_exp
151152
Dataset.coarsen
152153
Dataset.resample
153154
Dataset.diff
@@ -315,6 +316,7 @@ Computation
315316
DataArray.groupby
316317
DataArray.groupby_bins
317318
DataArray.rolling
319+
DataArray.rolling_exp
318320
DataArray.coarsen
319321
DataArray.dt
320322
DataArray.resample
@@ -535,6 +537,7 @@ Rolling objects
535537
core.rolling.DatasetRolling
536538
core.rolling.DatasetRolling.construct
537539
core.rolling.DatasetRolling.reduce
540+
core.rolling_exp.RollingExp
538541

539542
Resample objects
540543
================

doc/computation.rst

Lines changed: 16 additions & 0 deletions
@@ -190,6 +190,22 @@ We can also manually iterate through ``Rolling`` objects:
190190
for label, arr_window in r:
191191
# arr_window is a view of x
192192
193+
.. _comput.rolling_exp:
194+
195+
While ``rolling`` provides a simple moving average, ``DataArray`` also supports
196+
an exponential moving average with :py:meth:`~xarray.DataArray.rolling_exp`.
197+
This is similar to pandas' ``ewm`` method. numbagg_ is required.
198+
199+
.. _numbagg: https://github.com/shoyer/numbagg
200+
201+
.. code:: python
202+
203+
arr.rolling_exp(y=3).mean()
204+
205+
The ``rolling_exp`` method takes a ``window_type`` kwarg, which can be ``'alpha'``,
206+
``'com'`` (for ``center-of-mass``), ``'span'``, or ``'halflife'``. The default is
207+
``span``.
208+
193209
Finally, the rolling object has a ``construct`` method which returns a
194210
view of the original ``DataArray`` with the windowed dimension in
195211
the last position.
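
Note on ``window_type``: the four accepted values are different parameterizations of one smoothing factor. The sketch below shows how each could be mapped to ``alpha``, following the arithmetic of ``_get_alpha`` and ``_get_center_of_mass`` in this commit. The function name ``alpha_from_window`` is hypothetical and is not part of xarray; it uses only the standard library.

```python
import math


def alpha_from_window(window, window_type="span"):
    """Convert a window value to the smoothing factor alpha.

    Hypothetical helper for illustration; it mirrors the conversions in
    xarray/core/rolling_exp.py but is not part of the xarray API.
    """
    if window_type == "alpha":
        if not 0 < window <= 1:
            raise ValueError("alpha must satisfy: 0 < alpha <= 1")
        return float(window)
    if window_type == "com":
        if window < 0:
            raise ValueError("com must satisfy: com >= 0")
        # alpha = 1 / (1 + com)
        return 1.0 / (1.0 + window)
    if window_type == "span":
        if window < 1:
            raise ValueError("span must satisfy: span >= 1")
        # com = (span - 1) / 2, so alpha = 2 / (span + 1)
        return 2.0 / (window + 1.0)
    if window_type == "halflife":
        if window <= 0:
            raise ValueError("halflife must satisfy: halflife > 0")
        # The decay term equals alpha: 1 - exp(log(0.5) / halflife)
        return 1.0 - math.exp(math.log(0.5) / window)
    raise ValueError("window_type must be one of span, com, halflife, alpha")


# Each of these calls describes the same smoothing factor, alpha = 0.5.
for value, kind in [(3, "span"), (1, "com"), (1, "halflife"), (0.5, "alpha")]:
    print(kind, alpha_from_window(value, kind))
```

Under these conversions ``span=3``, ``com=1``, ``halflife=1`` and ``alpha=0.5`` all select the same weighting.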

doc/installing.rst

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@ For accelerating xarray
4545
- `bottleneck <https://github.com/kwgoodman/bottleneck>`__: speeds up
4646
NaN-skipping and rolling window aggregations by a large factor
4747
(1.1 or later)
48+
- `numbagg <https://github.com/shoyer/numbagg>`_: for exponential rolling
49+
window operations
4850

4951
For parallel computing
5052
~~~~~~~~~~~~~~~~~~~~~~

doc/whats-new.rst

Lines changed: 13 additions & 7 deletions
@@ -32,6 +32,11 @@ Enhancements
3232
- Add ``fill_value`` argument for reindex, align, and merge operations
3333
to enable custom fill values. (:issue:`2876`)
3434
By `Zach Griffith <https://github.com/zdgriffith>`_.
35+
- :py:meth:`~xarray.DataArray.rolling_exp` and
36+
:py:meth:`~xarray.Dataset.rolling_exp` added, similar to pandas'
37+
``pd.DataFrame.ewm`` method. Calling ``.mean`` on the resulting object
38+
will return an exponentially weighted moving average.
39+
By `Maximilian Roos <https://github.com/max-sixty>`_.
3540
- Character arrays' character dimension name decoding and encoding handled by
3641
``var.encoding['char_dim_name']`` (:issue:`2895`)
3742
By `James McCreight <https://github.com/jmccreight>`_.
@@ -188,6 +193,7 @@ Other enhancements
188193
- Upsampling an array via interpolation with resample is now dask-compatible,
189194
as long as the array is not chunked along the resampling dimension.
190195
By `Spencer Clark <https://github.com/spencerkclark>`_.
196+
191197
- :py:func:`xarray.testing.assert_equal` and
192198
:py:func:`xarray.testing.assert_identical` now provide a more detailed
193199
report showing what exactly differs between the two objects (dimensions /
@@ -737,20 +743,20 @@ Enhancements
737743
arguments in ``data_vars`` to indexes set explicitly in ``coords``,
738744
where previously an error would be raised.
739745
(:issue:`674`)
740-
By `Maximilian Roos <https://github.com/maxim-lian>`_.
746+
By `Maximilian Roos <https://github.com/max-sixty>`_.
741747

742748
- :py:meth:`~DataArray.sel`, :py:meth:`~DataArray.isel` & :py:meth:`~DataArray.reindex`,
743749
(and their :py:class:`Dataset` counterparts) now support supplying a ``dict``
744750
as a first argument, as an alternative to the existing approach
745751
of supplying `kwargs`. This allows for more robust behavior
746752
of dimension names which conflict with other keyword names, or are
747753
not strings.
748-
By `Maximilian Roos <https://github.com/maxim-lian>`_.
754+
By `Maximilian Roos <https://github.com/max-sixty>`_.
749755

750756
- :py:meth:`~DataArray.rename` now supports supplying ``**kwargs``, as an
751757
alternative to the existing approach of supplying a ``dict`` as the
752758
first argument.
753-
By `Maximilian Roos <https://github.com/maxim-lian>`_.
759+
By `Maximilian Roos <https://github.com/max-sixty>`_.
754760

755761
- :py:meth:`~DataArray.cumsum` and :py:meth:`~DataArray.cumprod` now support
756762
aggregation over multiple dimensions at the same time. This is the default
@@ -915,7 +921,7 @@ Enhancements
915921
which test each value in the array for whether it is contained in the
916922
supplied list, returning a bool array. See :ref:`selecting values with isin`
917923
for full details. Similar to the ``np.isin`` function.
918-
By `Maximilian Roos <https://github.com/maxim-lian>`_.
924+
By `Maximilian Roos <https://github.com/max-sixty>`_.
919925
- Some speed improvement to construct :py:class:`~xarray.DataArrayRolling`
920926
object (:issue:`1993`)
921927
By `Keisuke Fujii <https://github.com/fujiisoup>`_.
@@ -2110,7 +2116,7 @@ Enhancements
21102116
~~~~~~~~~~~~
21112117

21122118
- New documentation on :ref:`panel transition`. By
2113-
`Maximilian Roos <https://github.com/maximilianr>`_.
2119+
`Maximilian Roos <https://github.com/max-sixty>`_.
21142120
- New ``Dataset`` and ``DataArray`` methods :py:meth:`~xarray.Dataset.to_dict`
21152121
and :py:meth:`~xarray.Dataset.from_dict` to allow easy conversion between
21162122
dictionaries and xarray objects (:issue:`432`). See
@@ -2131,9 +2137,9 @@ Bug fixes
21312137
(:issue:`953`). By `Stephan Hoyer <https://github.com/shoyer>`_.
21322138
- ``Dataset.__dir__()`` (i.e. the method python calls to get autocomplete
21332139
options) failed if one of the dataset's keys was not a string (:issue:`852`).
2134-
By `Maximilian Roos <https://github.com/maximilianr>`_.
2140+
By `Maximilian Roos <https://github.com/max-sixty>`_.
21352141
- ``Dataset`` constructor can now take arbitrary objects as values
2136-
(:issue:`647`). By `Maximilian Roos <https://github.com/maximilianr>`_.
2142+
(:issue:`647`). By `Maximilian Roos <https://github.com/max-sixty>`_.
21372143
- Clarified ``copy`` argument for :py:meth:`~xarray.DataArray.reindex` and
21382144
:py:func:`~xarray.align`, which now consistently always return new xarray
21392145
objects (:issue:`927`).

setup.cfg

Lines changed: 6 additions & 5 deletions
@@ -9,8 +9,11 @@ filterwarnings =
99
ignore:Using a non-tuple sequence for multidimensional indexing is deprecated:FutureWarning
1010
env =
1111
UVCDAT_ANONYMOUS_LOG=no
12+
markers =
13+
flaky: flaky tests
14+
network: tests requiring a network connection
15+
slow: slow tests
1216

13-
# This should be kept in sync with .pep8speaks.yml
1417
[flake8]
1518
max-line-length=79
1619
ignore=
@@ -23,10 +26,6 @@ ignore=
2326
F401
2427
exclude=
2528
doc
26-
markers =
27-
flaky: flaky tests
28-
network: tests requiring a network connection
29-
slow: slow tests
3029

3130
[isort]
3231
default_section=THIRDPARTY
@@ -62,6 +61,8 @@ ignore_missing_imports = True
6261
ignore_missing_imports = True
6362
[mypy-nc_time_axis.*]
6463
ignore_missing_imports = True
64+
[mypy-numbagg.*]
65+
ignore_missing_imports = True
6566
[mypy-numpy.*]
6667
ignore_missing_imports = True
6768
[mypy-netCDF4.*]

xarray/core/common.py

Lines changed: 49 additions & 6 deletions
@@ -12,6 +12,7 @@
1212
from .arithmetic import SupportsArithmetic
1313
from .options import _get_keep_attrs
1414
from .pycompat import dask_array_type
15+
from .rolling_exp import RollingExp
1516
from .utils import Frozen, ReprObject, SortedKeysDict, either_dict_or_kwargs
1617

1718
# Used as a sentinel value to indicate all dimensions
@@ -86,6 +87,7 @@ def wrapped_func(self, dim=None, **kwargs): # type: ignore
8687
class AbstractArray(ImplementsArrayReduce):
8788
"""Shared base class for DataArray and Variable.
8889
"""
90+
8991
def __bool__(self: Any) -> bool:
9092
return bool(self.values)
9193

@@ -249,6 +251,8 @@ def get_squeeze_dims(xarray_obj,
249251
class DataWithCoords(SupportsArithmetic, AttrAccessMixin):
250252
"""Shared base class for Dataset and DataArray."""
251253

254+
_rolling_exp_cls = RollingExp
255+
252256
def squeeze(self, dim: Union[Hashable, Iterable[Hashable], None] = None,
253257
drop: bool = False,
254258
axis: Union[int, Iterable[int], None] = None):
@@ -553,7 +557,7 @@ def groupby_bins(self, group, bins, right: bool = True, labels=None,
553557

554558
def rolling(self, dim: Optional[Mapping[Hashable, int]] = None,
555559
min_periods: Optional[int] = None, center: bool = False,
556-
**dim_kwargs: int):
560+
**window_kwargs: int):
557561
"""
558562
Rolling window object.
559563
@@ -568,9 +572,9 @@ def rolling(self, dim: Optional[Mapping[Hashable, int]] = None,
568572
setting min_periods equal to the size of the window.
569573
center : boolean, default False
570574
Set the labels at the center of the window.
571-
**dim_kwargs : optional
575+
**window_kwargs : optional
572576
The keyword arguments form of ``dim``.
573-
One of dim or dim_kwargs must be provided.
577+
One of dim or window_kwargs must be provided.
574578
575579
Returns
576580
-------
@@ -609,15 +613,54 @@ def rolling(self, dim: Optional[Mapping[Hashable, int]] = None,
609613
core.rolling.DataArrayRolling
610614
core.rolling.DatasetRolling
611615
""" # noqa
612-
dim = either_dict_or_kwargs(dim, dim_kwargs, 'rolling')
616+
dim = either_dict_or_kwargs(dim, window_kwargs, 'rolling')
613617
return self._rolling_cls(self, dim, min_periods=min_periods,
614618
center=center)
615619

620+
def rolling_exp(
621+
self,
622+
window: Optional[Mapping[Hashable, int]] = None,
623+
window_type: str = 'span',
624+
**window_kwargs
625+
):
626+
"""
627+
Exponentially-weighted moving window.
628+
Similar to EWM in pandas
629+
630+
Requires the optional Numbagg dependency.
631+
632+
Parameters
633+
----------
634+
window : A single mapping from a dimension name to window value,
635+
optional
636+
dim : str
637+
Name of the dimension to create the rolling exponential window
638+
along (e.g., `time`).
639+
window : int
640+
Size of the moving window. The type of this is specified in
641+
`window_type`
642+
window_type : str, one of ['span', 'com', 'halflife', 'alpha'],
643+
default 'span'
644+
The format of the previously supplied window. Each is a simple
645+
numerical transformation of the others. Described in detail:
646+
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html
647+
**window_kwargs : optional
648+
The keyword arguments form of ``window``.
649+
One of window or window_kwargs must be provided.
650+
651+
See Also
652+
--------
653+
core.rolling_exp.RollingExp
654+
"""
655+
window = either_dict_or_kwargs(window, window_kwargs, 'rolling_exp')
656+
657+
return self._rolling_exp_cls(self, window, window_type)
658+
616659
def coarsen(self, dim: Optional[Mapping[Hashable, int]] = None,
617660
boundary: str = 'exact',
618661
side: Union[str, Mapping[Hashable, str]] = 'left',
619662
coord_func: str = 'mean',
620-
**dim_kwargs: int):
663+
**window_kwargs: int):
621664
"""
622665
Coarsen object.
623666
@@ -671,7 +714,7 @@ def coarsen(self, dim: Optional[Mapping[Hashable, int]] = None,
671714
core.rolling.DataArrayCoarsen
672715
core.rolling.DatasetCoarsen
673716
"""
674-
dim = either_dict_or_kwargs(dim, dim_kwargs, 'coarsen')
717+
dim = either_dict_or_kwargs(dim, window_kwargs, 'coarsen')
675718
return self._coarsen_cls(
676719
self, dim, boundary=boundary, side=side,
677720
coord_func=coord_func)
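
Note on the ``window`` / ``**window_kwargs`` pair: ``rolling_exp`` accepts the window either as a mapping or as keyword arguments and resolves the two through ``either_dict_or_kwargs``. The sketch below illustrates that pattern with a stand-in helper. ``dict_or_kwargs`` and ``rolling_exp_args`` are hypothetical names, and the error raised for an empty call is an assumption of this sketch rather than documented xarray behavior.

```python
from collections.abc import Mapping


def dict_or_kwargs(positional, keywords, func_name):
    """Return whichever argument form was supplied (hypothetical helper)."""
    if positional is not None:
        if not isinstance(positional, Mapping):
            raise ValueError(
                f"the first argument to .{func_name} must be a mapping")
        if keywords:
            raise ValueError(
                f"cannot pass both a mapping and keyword arguments to .{func_name}")
        return dict(positional)
    if not keywords:
        # Assumption of this sketch: fail early rather than return an empty dict.
        raise ValueError(f".{func_name} needs a mapping or keyword arguments")
    return dict(keywords)


def rolling_exp_args(window=None, window_type="span", **window_kwargs):
    """Mimic only the call signature; returns (dim, window value, window_type)."""
    windows = dict_or_kwargs(window, window_kwargs, "rolling_exp")
    # Like RollingExp.__init__, take the first (dimension, window) pair.
    dim, value = next(iter(windows.items()))
    return dim, value, window_type


print(rolling_exp_args(x=2))
print(rolling_exp_args({"time": 5}, window_type="com"))
```

The mapping form is useful when a dimension name clashes with another keyword (for example a dimension literally named ``window_type``) or is not a string.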

xarray/core/rolling_exp.py

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
1+
import numpy as np
2+
3+
from .pycompat import dask_array_type
4+
5+
6+
def _get_alpha(com=None, span=None, halflife=None, alpha=None):
7+
# pandas defines in terms of com (converting to alpha in the algo)
8+
# so use its function to get a com and then convert to alpha
9+
10+
com = _get_center_of_mass(com, span, halflife, alpha)
11+
return 1 / (1 + com)
12+
13+
14+
def move_exp_nanmean(array, *, axis, alpha):
15+
if isinstance(array, dask_array_type):
16+
raise TypeError("rolling_exp is not currently supported for dask arrays")
17+
import numbagg
18+
if axis == ():
19+
return array.astype(np.float64)
20+
else:
21+
return numbagg.move_exp_nanmean(
22+
array, axis=axis, alpha=alpha)
23+
24+
25+
def _get_center_of_mass(comass, span, halflife, alpha):
26+
"""
27+
Vendored from pandas.core.window._get_center_of_mass
28+
29+
See licenses/PANDAS_LICENSE for the function's license
30+
"""
31+
from pandas.core import common as com
32+
valid_count = com.count_not_none(comass, span, halflife, alpha)
33+
if valid_count > 1:
34+
raise ValueError("comass, span, halflife, and alpha "
35+
"are mutually exclusive")
36+
37+
# Convert to center of mass; domain checks ensure 0 < alpha <= 1
38+
if comass is not None:
39+
if comass < 0:
40+
raise ValueError("comass must satisfy: comass >= 0")
41+
elif span is not None:
42+
if span < 1:
43+
raise ValueError("span must satisfy: span >= 1")
44+
comass = (span - 1) / 2.
45+
elif halflife is not None:
46+
if halflife <= 0:
47+
raise ValueError("halflife must satisfy: halflife > 0")
48+
decay = 1 - np.exp(np.log(0.5) / halflife)
49+
comass = 1 / decay - 1
50+
elif alpha is not None:
51+
if alpha <= 0 or alpha > 1:
52+
raise ValueError("alpha must satisfy: 0 < alpha <= 1")
53+
comass = (1.0 - alpha) / alpha
54+
else:
55+
raise ValueError("Must pass one of comass, span, halflife, or alpha")
56+
57+
return float(comass)
58+
59+
60+
class RollingExp:
61+
"""
62+
Exponentially-weighted moving window object.
63+
Similar to EWM in pandas
64+
65+
Parameters
66+
----------
67+
obj : Dataset or DataArray
68+
Object to window.
69+
windows : A single mapping from a single dimension name to window value
70+
dim : str
71+
Name of the dimension to create the rolling exponential window
72+
along (e.g., `time`).
73+
window : int
74+
Size of the moving window. The type of this is specified in
75+
`window_type`
76+
window_type : str, one of ['span', 'com', 'halflife', 'alpha'], default 'span'
77+
The format of the previously supplied window. Each is a simple
78+
numerical transformation of the others. Described in detail:
79+
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html
80+
81+
Returns
82+
-------
83+
RollingExp : type of input argument
84+
""" # noqa
85+
86+
def __init__(self, obj, windows, window_type='span'):
87+
self.obj = obj
88+
dim, window = next(iter(windows.items()))
89+
self.dim = dim
90+
self.alpha = _get_alpha(**{window_type: window})
91+
92+
def mean(self):
93+
"""
94+
Exponentially weighted moving average
95+
96+
Examples
97+
--------
98+
>>> da = xr.DataArray([1,1,2,2,2], dims='x')
99+
>>> da.rolling_exp(x=2, window_type='span').mean()
100+
<xarray.DataArray (x: 5)>
101+
array([1. , 1. , 1.692308, 1.9 , 1.966942])
102+
Dimensions without coordinates: x
103+
"""
104+
105+
return self.obj.reduce(
106+
move_exp_nanmean, dim=self.dim, alpha=self.alpha)
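
Note on the ``mean`` docstring example: the printed values can be reproduced by hand with a normalized exponentially weighted average, where each older observation is down-weighted by a factor of ``1 - alpha``. The pure-Python sketch below is a reference calculation only. The name ``ewm_mean`` is hypothetical, the function does not handle NaN values, and it makes no claim about how ``numbagg.move_exp_nanmean`` is implemented internally.

```python
def ewm_mean(values, alpha):
    """Normalized exponentially weighted mean of a 1-D sequence.

    Illustrative reference only (hypothetical helper, no NaN handling).
    """
    decay = 1.0 - alpha
    weighted_sum = 0.0   # running sum of weight * value
    weight_total = 0.0   # running sum of weights
    out = []
    for value in values:
        # Age every earlier observation by one step, then add the new one
        # with weight 1.
        weighted_sum = weighted_sum * decay + value
        weight_total = weight_total * decay + 1.0
        out.append(weighted_sum / weight_total)
    return out


span = 2
alpha = 2.0 / (span + 1.0)  # span -> alpha, as in _get_alpha
result = ewm_mean([1, 1, 2, 2, 2], alpha)
print([round(x, 6) for x in result])
# [1.0, 1.0, 1.692308, 1.9, 1.966942]
```

These match the array shown in the docstring for ``da.rolling_exp(x=2, window_type='span').mean()``.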

xarray/tests/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ def LooseVersion(vstring):
7474
has_np113, requires_np113 = _importorskip('numpy', minversion='1.13.0')
7575
has_iris, requires_iris = _importorskip('iris')
7676
has_cfgrib, requires_cfgrib = _importorskip('cfgrib')
77+
has_numbagg, requires_numbagg = _importorskip('numbagg')
7778

7879
# some special cases
7980
has_h5netcdf07, requires_h5netcdf07 = _importorskip('h5netcdf',
