Commit e02b1c3

dcherian, andersy005, pre-commit-ci[bot], Illviljan, and mathause committed
Enable flox in GroupBy and resample (#5734)
Closes #5734
Closes #4473
Closes #4498
Closes #659
Closes #2237
xref pangeo-data/pangeo#271

Co-authored-by: Anderson Banihirwe <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Illviljan <[email protected]>
Co-authored-by: Mathias Hauser <[email protected]>
Co-authored-by: Stephan Hoyer <[email protected]>
1 parent 9a62c2a commit e02b1c3

19 files changed: +1185 -355 lines changed

asv_bench/asv.conf.json

Lines changed: 2 additions & 0 deletions
@@ -67,6 +67,8 @@
         "bottleneck": [""],
         "dask": [""],
         "distributed": [""],
+        "flox": [""],
+        "numpy_groupies": [""],
         "sparse": [""]
     },

asv_bench/benchmarks/groupby.py

Lines changed: 6 additions & 4 deletions
@@ -13,6 +13,7 @@ def setup(self, *args, **kwargs):
             {
                 "a": xr.DataArray(np.r_[np.repeat(1, self.n), np.repeat(2, self.n)]),
                 "b": xr.DataArray(np.arange(2 * self.n)),
+                "c": xr.DataArray(np.arange(2 * self.n)),
             }
         )
         self.ds2d = self.ds1d.expand_dims(z=10)
@@ -50,10 +51,11 @@ class GroupByDask(GroupBy):
     def setup(self, *args, **kwargs):
         requires_dask()
         super().setup(**kwargs)
-        self.ds1d = self.ds1d.sel(dim_0=slice(None, None, 2)).chunk({"dim_0": 50})
-        self.ds2d = self.ds2d.sel(dim_0=slice(None, None, 2)).chunk(
-            {"dim_0": 50, "z": 5}
-        )
+
+        self.ds1d = self.ds1d.sel(dim_0=slice(None, None, 2))
+        self.ds1d["c"] = self.ds1d["c"].chunk({"dim_0": 50})
+        self.ds2d = self.ds2d.sel(dim_0=slice(None, None, 2))
+        self.ds2d["c"] = self.ds2d["c"].chunk({"dim_0": 50, "z": 5})
         self.ds1d_mean = self.ds1d.groupby("b").mean()
         self.ds2d_mean = self.ds2d.groupby("b").mean()

ci/install-upstream-wheels.sh

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@ conda uninstall -y --force \
     pint \
     bottleneck \
     sparse \
+    flox \
     h5netcdf \
     xarray
 # to limit the runtime of Upstream CI
@@ -47,4 +48,5 @@ python -m pip install \
     git+https://github.com/pydata/sparse \
     git+https://github.com/intake/filesystem_spec \
     git+https://github.com/SciTools/nc-time-axis \
+    git+https://github.com/dcherian/flox \
     git+https://github.com/h5netcdf/h5netcdf

ci/requirements/all-but-dask.yml

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ dependencies:
   - cfgrib
   - cftime
   - coveralls
+  - flox
   - h5netcdf
   - h5py
   - hdf5

ci/requirements/environment-windows.yml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ dependencies:
   - cftime
   - dask-core
   - distributed
+  - flox
   - fsspec!=2021.7.0
   - h5netcdf
   - h5py

ci/requirements/environment.yml

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ dependencies:
   - cftime
   - dask-core
   - distributed
+  - flox
   - fsspec!=2021.7.0
   - h5netcdf
   - h5py

ci/requirements/min-all-deps.yml

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ dependencies:
   - coveralls
   - dask-core=2021.04
   - distributed=2021.04
+  - flox=0.5
   - h5netcdf=0.11
   - h5py=3.1
   # hdf5 1.12 conflicts with h5py=3.1

doc/whats-new.rst

Lines changed: 5 additions & 0 deletions
@@ -141,6 +141,11 @@ Performance
 - GroupBy binary operations are now vectorized.
   Previously this involved looping over all groups. (:issue:`5804`,:pull:`6160`)
   By `Deepak Cherian <https://github.com/dcherian>`_.
+- Substantially improved GroupBy operations using `flox <https://flox.readthedocs.io/en/latest/>`_.
+  This is auto-enabled when ``flox`` is installed. Use ``xr.set_options(use_flox=False)`` to use
+  the old algorithm. (:issue:`4473`, :issue:`4498`, :issue:`659`, :issue:`2237`, :pull:`271`).
+  By `Deepak Cherian <https://github.com/dcherian>`_,`Anderson Banihirwe <https://github.com/andersy005>`_,
+  `Jimmy Westling <https://github.com/illviljan>`_.

 Internal Changes
 ~~~~~~~~~~~~~~~~
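
The whats-new entry in the hunk above says that flox is used automatically when installed and that ``xr.set_options(use_flox=False)`` restores the old algorithm. A minimal sketch of that switch follows. It assumes an xarray release that includes this commit, plus numpy; the dataset and the names `ds`, `default_mean`, and `legacy_mean` are illustrative and not part of the commit. Both code paths should give the same result, so the sketch only compares them.

```python
import numpy as np
import xarray as xr

# Illustrative dataset (not from the commit): group labels in "a", values in "b".
ds = xr.Dataset(
    {
        "a": ("x", np.r_[np.repeat(1, 3), np.repeat(2, 3)]),
        "b": ("x", np.arange(6.0)),
    }
)

# Default behaviour: per the changelog entry, flox is used when it is installed.
default_mean = ds.groupby("a").mean()

# Opt out for this block only and fall back to the previous implementation.
with xr.set_options(use_flox=False):
    legacy_mean = ds.groupby("a").mean()

# The two code paths are expected to agree numerically.
xr.testing.assert_allclose(default_mean, legacy_mean)
```

Using ``set_options`` as a context manager keeps the opt-out local, which is convenient when comparing timings or debugging a suspected difference between the two implementations.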

setup.cfg

Lines changed: 1 addition & 0 deletions
@@ -98,6 +98,7 @@ accel =
     scipy
     bottleneck
     numbagg
+    flox

 parallel =
     dask[complete]
