Error when using .apply_ufunc with .groupby_bins #1765
Comments
Upon some more testing, it seems the problem is related to how I pass the argument input_core_dims=[dims, dims, ['dummy']]. The bins are always going to be a 1D array that I want to pass identically for each call to |
You probably need to give bins an explicit We should update |
Ok I will give it a try soon, for now I shamelessly hardcoded it and it seems to work so far. |
I added the line Here is the full traceback for this time (this was when I tried to save a netCDF file). I could apply
|
I looked into this a little more. The fix is to make a copy of the bins array inside _func (note the np.array(bins) line):

    import numpy as np
    import xarray as xr

    def _func(data, bin_data, bins):
        """Group unlabeled array 'data' according to values in 'bin_data' using
        bins defined in 'bins' and sum all values"""
        # Copy the bins so the function works on its own array
        bins = np.array(bins)
        # Label each bin by its upper edge
        labels = bins[1:]
        da_data = xr.DataArray(data, name='data')
        da_bin_data = xr.DataArray(bin_data, name='bin_data')
        binned = da_data.groupby_bins(da_bin_data, bins, labels=labels,
                                      include_lowest=True).sum()
        return binned

The problem is that broadcasting (inside |
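One plausible reading of why the copy helps, offered as an assumption because the explanation above is incomplete: NumPy broadcasting returns read-only views, and some downstream routines expect a writeable array. The sketch below only demonstrates that NumPy behavior. It does not reproduce the original error, and the array values are made up.

```python
import numpy as np

bins = np.array([0.0, 1.0, 2.0, 3.0])

# Broadcasting returns a read-only view that shares memory with `bins`.
view = np.broadcast_to(bins, (5, bins.size))[0]
is_view_writeable = view.flags.writeable

# np.array copies by default, giving an independent, writeable array.
copy = np.array(view)
is_copy_writeable = copy.flags.writeable
```

If this is indeed the cause, copying once at the top of the function is a cheap fix as long as the bins array is small.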
Thank you very much! I will give that a try in the next days. |
Closing since upstream issues have been closed. |
I am trying to create a function that applies a .groupby_bins operation over specified dimensions of an xarray dataset. E.g. I want to be able to sum temperature, salinity and other values grouped by oceanic oxygen concentrations.
I want to be flexible about which dimensions I apply the groupby_bins operation over. For instance, I would like to apply it in every depth column (resulting in an array of (x, y, time)), but also over all spatial dimensions, resulting in a time series.
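The two use cases just described can be sketched with xr.apply_ufunc as follows. This is a minimal illustration, not the code from this issue: the names _binned_sum, binned_sum_over, temperature, oxygen and the 'bin' output dimension are hypothetical, and the binning uses np.histogram instead of .groupby_bins, so values lying exactly on a bin edge may be assigned differently.

```python
import numpy as np
import xarray as xr


def _binned_sum(data, bin_data, bins):
    """Sum `data` within bins of `bin_data`; all inputs are NumPy arrays."""
    totals, _ = np.histogram(bin_data.ravel(), bins=bins, weights=data.ravel())
    return totals


def binned_sum_over(data, bin_data, bins, dims):
    """Hypothetical wrapper: binned sum over `dims`, keeping all other dims."""
    # The bin edges get their own core dimension so they are passed whole.
    bins_da = xr.DataArray(np.asarray(bins, dtype=float), dims=["dummy"])
    return xr.apply_ufunc(
        _binned_sum,
        data,
        bin_data,
        bins_da,
        input_core_dims=[dims, dims, ["dummy"]],
        output_core_dims=[["bin"]],
        vectorize=True,  # loop over the non-core dimensions
    )


shape = (2, 3, 4, 5)  # (time, z, y, x), chosen arbitrarily
all_dims = ["time", "z", "y", "x"]
temperature = xr.DataArray(np.ones(shape), dims=all_dims)
oxygen = xr.DataArray(np.full(shape, 0.5), dims=all_dims)
bins = [0.0, 1.0, 2.0]

# One binned sum per depth column: remaining dims are (time, y, x).
per_column = binned_sum_over(temperature, oxygen, bins, ["z"])
# One binned sum per time step over all spatial dims: a time series.
timeseries = binned_sum_over(temperature, oxygen, bins, ["z", "y", "x"])
```

Only the list passed as dims changes between the two calls, which is the kind of flexibility asked for here.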
I currently run into a strange error when I try the following.
Code Sample, a copy-pastable example if possible
I am showing the problem here on a synthetic example, since my current working dataset is quite big.
The problem is exactly the same.
Now I define some bins and apply the private _func on the DataArray. This works as expected. Note that the array just contains ones, hence we see 3000 in the first bin. This would e.g. be an operation on a single time step.
But when I now try to apply the function over the full array (core dimensions are set to all available dimensions), I am getting a very strange error.
This error only gets triggered upon computation.
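The working single-time-step case described above can be sketched like this. It is an illustration under assumptions, not the original sample: the shape (10, 20, 15), the bin edges, and the constant bin_data are made up so that all 3000 ones land in the first bin, and _func follows the definition posted in the comments.

```python
import numpy as np
import xarray as xr


def _func(data, bin_data, bins):
    """Group unlabeled array 'data' by the values in 'bin_data' and sum."""
    bins = np.array(bins)  # work on a copy of the bin edges
    labels = bins[1:]      # label each bin by its upper edge
    da_data = xr.DataArray(data, name="data")
    da_bin_data = xr.DataArray(bin_data, name="bin_data")
    return da_data.groupby_bins(
        da_bin_data, bins, labels=labels, include_lowest=True
    ).sum()


# Hypothetical single time step: 10 * 20 * 15 = 3000 ones.
data = np.ones((10, 20, 15))
bin_data = np.full(data.shape, 0.5)  # every value falls into the first bin
bins = np.array([0.0, 1.0, 2.0, 3.0])

binned = _func(data, bin_data, bins)
total = float(binned.sum())  # sum over all bins
```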
Problem description
I am not sure if this is a bug or a user error on my side. I am still trying to get used to .apply_ufunc.
If anybody has an idea for a workaround, I would greatly appreciate it.
I am not sure if the rewrapping in xr.DataArrays in _func is actually necessary. I tried to find an equivalent function that operates directly on dask.arrays but was not successful.
Output of xr.show_versions()
xarray: 0.10.0rc1-9-gdbf7b01
pandas: 0.21.0
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.0
matplotlib: 2.1.0
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 38.2.3
pip: 9.0.1
conda: None
pytest: 3.3.1
IPython: 6.2.1
sphinx: 1.6.5
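On the question above of whether the rewrapping in xr.DataArrays inside _func is necessary: if only a binned sum is needed, it can be computed on plain NumPy arrays. The sketch below is one possible approach, not an xarray or dask API. The name binned_sum_np is hypothetical, the function is NumPy-only (not tested against dask arrays), and it is intended to mimic right-closed bins with include_lowest=True. Empty bins come out as 0 here, which may differ from how .groupby_bins reports them.

```python
import numpy as np


def binned_sum_np(data, bin_data, bins):
    """Sum `data` in right-closed bins of `bin_data` (lowest edge included)."""
    data = np.asarray(data, dtype=float).ravel()
    bin_data = np.asarray(bin_data, dtype=float).ravel()
    bins = np.asarray(bins, dtype=float)
    n_bins = bins.size - 1

    # digitize(right=True) returns i with bins[i-1] < x <= bins[i];
    # subtracting 1 turns that into a zero-based bin index.
    idx = np.digitize(bin_data, bins, right=True) - 1
    # Values equal to the lowest edge belong to the first bin.
    idx[bin_data == bins[0]] = 0
    # Drop values that fall outside the outermost edges.
    valid = (idx >= 0) & (idx < n_bins)
    return np.bincount(idx[valid], weights=data[valid], minlength=n_bins)


sums = binned_sum_np(
    data=[1, 2, 4, 8, 16, 32, 64],
    bin_data=[0.0, 0.5, 1.0, 1.5, 3.0, 3.5, -1.0],
    bins=[0, 1, 2, 3],
)
# sums is [7., 8., 16.]: 3.5 and -1.0 lie outside the bins and are ignored
```

A function of this shape takes and returns unlabeled arrays, so it could be passed to xr.apply_ufunc without building DataArrays inside it.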