Commit df614b9

mathause and dcherian authored
Feature/weighted (#2922)
* weighted for DataArray
* remove some commented code
* pep8 and faulty import tests
* add weighted sum, replace 0s in sum_of_wgt
* weighted: overhaul tests
* weighted: pep8
* weighted: pep8 lines
* weighted update docs
* weighted: fix typo
* weighted: pep8
* undo changes to avoid merge conflict
* add weighted to dataarray again
* remove super
* overhaul core/weighted.py
* add DatasetWeighted class
* _maybe_get_all_dims return sorted tuple
* work on: test_weighted
* black and flake8
* Apply suggestions from code review (docs)
* restructure interim
* restructure classes
* update weighted.py
* black
* use map; add keep_attrs
* implement expected_weighted; update tests
* add whats new
* undo changes to whats-new
* F811: noqa where?
* api.rst
* add to computation
* small updates
* add example to gallery
* typo
* another typo
* correct docstring in core/common.py
* typos
* adjust review
* clean tests
* add test nonequal coords
* comment on use of dot
* fix erroneous merge
* update tests
* move example to notebook
* move whats-new entry to 15.1
* some doc updates
* dot to own function
* simplify some tests
* Doc updates
* very minor changes.
* fix & add references
* doc: return 0/NaN on 0 weights
* Update xarray/core/common.py

Co-authored-by: dcherian <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
1 parent 65a5bff, commit df614b9

10 files changed (+922, -1 lines)
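For orientation, a minimal Python sketch of the API this commit introduces (it assumes a build of xarray that already contains the change set), reusing the monthly precipitation values from the doc/computation.rst example below:

    import xarray as xr

    coords = dict(month=("month", [1, 2, 3]))
    prec = xr.DataArray([1.1, 1.0, 0.9], dims="month", coords=coords)
    weights = xr.DataArray([31, 28, 31], dims="month", coords=coords)

    weighted_prec = prec.weighted(weights)  # returns a DataArrayWeighted object
    weighted_prec.sum()                     # weighted sum over all dimensions
    weighted_prec.mean(dim="month")         # weighted mean along "month"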

doc/api.rst

Lines changed: 18 additions & 0 deletions

@@ -165,6 +165,7 @@ Computation
   Dataset.groupby_bins
   Dataset.rolling
   Dataset.rolling_exp
+  Dataset.weighted
   Dataset.coarsen
   Dataset.resample
   Dataset.diff
@@ -340,6 +341,7 @@ Computation
   DataArray.groupby_bins
   DataArray.rolling
   DataArray.rolling_exp
+  DataArray.weighted
   DataArray.coarsen
   DataArray.dt
   DataArray.resample
@@ -577,6 +579,22 @@ Rolling objects
   core.rolling.DatasetRolling.reduce
   core.rolling_exp.RollingExp

+Weighted objects
+================
+
+.. autosummary::
+   :toctree: generated/
+
+   core.weighted.DataArrayWeighted
+   core.weighted.DataArrayWeighted.mean
+   core.weighted.DataArrayWeighted.sum
+   core.weighted.DataArrayWeighted.sum_of_weights
+   core.weighted.DatasetWeighted
+   core.weighted.DatasetWeighted.mean
+   core.weighted.DatasetWeighted.sum
+   core.weighted.DatasetWeighted.sum_of_weights
+
+
Coarsen objects
===============
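The listing above is the public surface of the feature. A short sketch of how the ``Dataset`` variant and ``sum_of_weights`` are used (the small Dataset built here is hypothetical, purely for illustration):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"prec": ("month", [1.1, np.nan, 0.9])}, coords={"month": [1, 2, 3]})
    weights = xr.DataArray([31, 28, 31], dims="month", coords={"month": [1, 2, 3]})

    ds_weighted = ds.weighted(weights)   # core.weighted.DatasetWeighted
    ds_weighted.mean("month")            # weighted mean of each data variable

    # sum_of_weights only counts weights where the data is not NaN: 31 + 31 = 62
    ds.prec.weighted(weights).sum_of_weights()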

doc/computation.rst

Lines changed: 85 additions & 1 deletion

@@ -1,3 +1,5 @@
+.. currentmodule:: xarray
+
.. _comput:

###########
@@ -241,12 +243,94 @@ You can also use ``construct`` to compute a weighted rolling sum:
To avoid this, use ``skipna=False`` as the above example.


+.. _comput.weighted:
+
+Weighted array reductions
+=========================
+
+:py:class:`DataArray` and :py:class:`Dataset` objects include :py:meth:`DataArray.weighted`
+and :py:meth:`Dataset.weighted` array reduction methods. They currently
+support weighted ``sum`` and weighted ``mean``.
+
+.. ipython:: python
+
+    coords = dict(month=('month', [1, 2, 3]))
+
+    prec = xr.DataArray([1.1, 1.0, 0.9], dims=('month', ), coords=coords)
+    weights = xr.DataArray([31, 28, 31], dims=('month', ), coords=coords)
+
+Create a weighted object:
+
+.. ipython:: python
+
+    weighted_prec = prec.weighted(weights)
+    weighted_prec
+
+Calculate the weighted sum:
+
+.. ipython:: python
+
+    weighted_prec.sum()
+
+Calculate the weighted mean:
+
+.. ipython:: python
+
+    weighted_prec.mean(dim="month")
+
+The weighted sum corresponds to:
+
+.. ipython:: python
+
+    weighted_sum = (prec * weights).sum()
+    weighted_sum
+
+and the weighted mean to:
+
+.. ipython:: python
+
+    weighted_mean = weighted_sum / weights.sum()
+    weighted_mean
+
+However, the functions also take missing values in the data into account:
+
+.. ipython:: python
+
+    data = xr.DataArray([np.NaN, 2, 4])
+    weights = xr.DataArray([8, 1, 1])
+
+    data.weighted(weights).mean()
+
+Using ``(data * weights).sum() / weights.sum()`` would (incorrectly) result
+in 0.6.
+
+
+If the weights add up to 0, ``sum`` returns 0:
+
+.. ipython:: python
+
+    data = xr.DataArray([1.0, 1.0])
+    weights = xr.DataArray([-1.0, 1.0])
+
+    data.weighted(weights).sum()
+
+and ``mean`` returns ``NaN``:
+
+.. ipython:: python
+
+    data.weighted(weights).mean()
+
+
+.. note::
+    ``weights`` must be a :py:class:`DataArray` and cannot contain missing values.
+    Missing values can be replaced manually by ``weights.fillna(0)``.
+
.. _comput.coarsen:

Coarsen large arrays
====================

-``DataArray`` and ``Dataset`` objects include a
+:py:class:`DataArray` and :py:class:`Dataset` objects include a
:py:meth:`~xarray.DataArray.coarsen` and :py:meth:`~xarray.Dataset.coarsen`
methods. This supports the block aggregation along multiple dimensions,
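As a cross-check of the missing-value behaviour documented above, the following Python snippet reproduces by hand what ``weighted(...).mean()`` returns for the ``[NaN, 2, 4]`` example; it is a sketch of the documented semantics, not the library's internal implementation:

    import numpy as np
    import xarray as xr

    data = xr.DataArray([np.nan, 2, 4])
    weights = xr.DataArray([8, 1, 1])

    # count only the weights of valid (non-NaN) entries: [0, 1, 1]
    masked_weights = weights.where(data.notnull(), 0)

    manual_mean = (data * weights).sum() / masked_weights.sum()  # 6 / 2 = 3.0
    data.weighted(weights).mean()                                # also 3.0, not 0.6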

doc/examples.rst

Lines changed: 1 addition & 0 deletions

@@ -6,6 +6,7 @@ Examples

   examples/weather-data
   examples/monthly-means
+  examples/area_weighted_temperature
   examples/multidimensional-coords
   examples/visualization_gallery
   examples/ROMS_ocean_model
doc/examples/area_weighted_temperature.ipynb

Lines changed: 226 additions & 0 deletions (new file)

@@ -0,0 +1,226 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Compare-weighted-and-unweighted-mean-temperature\" data-toc-modified-id=\"Compare-weighted-and-unweighted-mean-temperature-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Compare weighted and unweighted mean temperature</a></span><ul class=\"toc-item\"><li><ul class=\"toc-item\"><li><span><a href=\"#Data\" data-toc-modified-id=\"Data-1.0.1\"><span class=\"toc-item-num\">1.0.1&nbsp;&nbsp;</span>Data</a></span></li><li><span><a href=\"#Creating-weights\" data-toc-modified-id=\"Creating-weights-1.0.2\"><span class=\"toc-item-num\">1.0.2&nbsp;&nbsp;</span>Creating weights</a></span></li><li><span><a href=\"#Weighted-mean\" data-toc-modified-id=\"Weighted-mean-1.0.3\"><span class=\"toc-item-num\">1.0.3&nbsp;&nbsp;</span>Weighted mean</a></span></li><li><span><a href=\"#Plot:-comparison-with-unweighted-mean\" data-toc-modified-id=\"Plot:-comparison-with-unweighted-mean-1.0.4\"><span class=\"toc-item-num\">1.0.4&nbsp;&nbsp;</span>Plot: comparison with unweighted mean</a></span></li></ul></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Compare weighted and unweighted mean temperature\n",
    "\n",
    "\n",
    "Author: [Mathias Hauser](https://github.com/mathause/)\n",
    "\n",
    "\n",
    "We use the `air_temperature` example dataset to calculate the area-weighted temperature over its domain. This dataset has a regular latitude/longitude grid, thus the grid cell area decreases towards the pole. For this grid we can use the cosine of the latitude as a proxy for the grid cell area.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-03-17T14:43:57.222351Z",
     "start_time": "2020-03-17T14:43:56.147541Z"
    }
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import cartopy.crs as ccrs\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "import xarray as xr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data\n",
    "\n",
    "Load the data, convert to Celsius, and resample to daily values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-03-17T14:43:57.831734Z",
     "start_time": "2020-03-17T14:43:57.651845Z"
    }
   },
   "outputs": [],
   "source": [
    "ds = xr.tutorial.load_dataset(\"air_temperature\")\n",
    "\n",
    "# to celsius\n",
    "air = ds.air - 273.15\n",
    "\n",
    "# resample from 6-hourly to daily values\n",
    "air = air.resample(time=\"D\").mean()\n",
    "\n",
    "air"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot the first timestep:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-03-17T14:43:59.887120Z",
     "start_time": "2020-03-17T14:43:59.582894Z"
    }
   },
   "outputs": [],
   "source": [
    "projection = ccrs.LambertConformal(central_longitude=-95, central_latitude=45)\n",
    "\n",
    "f, ax = plt.subplots(subplot_kw=dict(projection=projection))\n",
    "\n",
    "air.isel(time=0).plot(transform=ccrs.PlateCarree(), cbar_kwargs=dict(shrink=0.7))\n",
    "ax.coastlines()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating weights\n",
    "\n",
    "For a rectangular grid the cosine of the latitude is proportional to the grid cell area."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-03-17T14:44:18.777092Z",
     "start_time": "2020-03-17T14:44:18.736587Z"
    }
   },
   "outputs": [],
   "source": [
    "weights = np.cos(np.deg2rad(air.lat))\n",
    "weights.name = \"weights\"\n",
    "weights"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Weighted mean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-03-17T14:44:52.607120Z",
     "start_time": "2020-03-17T14:44:52.564674Z"
    }
   },
   "outputs": [],
   "source": [
    "air_weighted = air.weighted(weights)\n",
    "air_weighted"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-03-17T14:44:54.334279Z",
     "start_time": "2020-03-17T14:44:54.280022Z"
    }
   },
   "outputs": [],
   "source": [
    "weighted_mean = air_weighted.mean((\"lon\", \"lat\"))\n",
    "weighted_mean"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot: comparison with unweighted mean\n",
    "\n",
    "Note how the weighted mean temperature is higher than the unweighted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-03-17T14:45:08.877307Z",
     "start_time": "2020-03-17T14:45:08.673383Z"
    }
   },
   "outputs": [],
   "source": [
    "weighted_mean.plot(label=\"weighted\")\n",
    "air.mean((\"lon\", \"lat\")).plot(label=\"unweighted\")\n",
    "\n",
    "plt.legend()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
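For readers who prefer a plain script to the notebook JSON, the example above condenses to roughly the following Python (it assumes the ``air_temperature`` tutorial dataset can be downloaded; the cartopy plotting is omitted):

    import numpy as np
    import xarray as xr

    ds = xr.tutorial.load_dataset("air_temperature")

    # convert to Celsius and resample from 6-hourly to daily values
    air = (ds.air - 273.15).resample(time="D").mean()

    # on a regular lat/lon grid, cos(latitude) is proportional to grid cell area
    weights = np.cos(np.deg2rad(air.lat))
    weights.name = "weights"

    weighted_mean = air.weighted(weights).mean(("lon", "lat"))
    unweighted_mean = air.mean(("lon", "lat"))
    # the weighted mean is higher: low-latitude (warmer) cells get more weight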

doc/whats-new.rst

Lines changed: 3 additions & 0 deletions

@@ -25,6 +25,9 @@ Breaking changes
New Features
~~~~~~~~~~~~

+- Weighted array reductions are now supported via the new :py:meth:`DataArray.weighted`
+  and :py:meth:`Dataset.weighted` methods. See :ref:`comput.weighted`. (:issue:`422`, :pull:`2922`).
+  By `Mathias Hauser <https://github.com/mathause>`_
- Added support for :py:class:`pandas.DatetimeIndex`-style rounding of
  ``cftime.datetime`` objects directly via a :py:class:`CFTimeIndex` or via the
  :py:class:`~core.accessor_dt.DatetimeAccessor`.
