Skip to content

Interpolation - Support extrapolation method "clip" #3962

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Illviljan opened this issue Apr 10, 2020 · 4 comments
Open

Interpolation - Support extrapolation method "clip" #3962

Illviljan opened this issue Apr 10, 2020 · 4 comments

Comments

@Illviljan
Copy link
Contributor

Hello,

I would like an option in da.interp() that instead of returning NaNs during extrapolation returns the data corresponding to the end of the breakpoint data set range.

One way to do this is to limit the new coordinates to the array coordinates minimum and maximum value, I did a simple example with this solution down below.
I think this is a rather safe way as we are just modifying the inputs to all the various interpolation classes that xarray is using at the moment.
But it does look a little weird when printing the extrapolated value, the coordinates shows the limited value instead of the requested coordinates. Maybe this can be handled elegantly somewhere in the source code?

MATLAB uses this quite frequently in their interpolation functions:

MCVE Code Sample

import numpy as np
import xarray as xr


def interp(da, coords, extrapolation='clip'):
    """
    Linear interpolation that clips the inputs to the coords min and max value.

    Parameters
    ----------
    da : DataArray
        DataArray to interpolate.
    coords : dict
        Coordinates for the interpolated value.
    """
    if extrapolation == 'clip':
        for k, v in da.coords.items():
            coords[k] = np.maximum(coords[k], np.min(v.values))
            coords[k] = np.minimum(coords[k], np.max(v.values))

    return da.interp(coords)


# Create coordinates:
x = np.linspace(1000, 6000, 4)
y = np.linspace(100, 1200, 3)

# Create data:
X = np.meshgrid(*[x, y], indexing='ij')
data = X[0] * X[1]

# Create DataArray:
da = xr.DataArray(data=data, coords=[('x', x), ('y', y)], name='data')

# Attempt to extrapolate:
datai = interp(da, {'x': 7000, 'y': 375})

Expected Output

print(datai)
<xarray.DataArray 'data' ()>
array(2250000.)
Coordinates:
    x        float64 6e+03
    y        float64 375.0

Versions

Output of `xr.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.7 (default, Mar 23 2020, 23:19:08) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.15.0
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 46.1.3.post20200330
pip: 20.0.2
conda: 4.8.3
pytest: 5.4.1
IPython: 7.13.0
sphinx: 2.4.4

@dcherian
Copy link
Contributor

Scipy will do this for you if you ask it to:

da.interp(x=7000, y=375, method="linear", kwargs=dict(fill_value=None))

See #3956 .

@Illviljan
Copy link
Contributor Author

Illviljan commented Apr 10, 2020

Nice find! The documentation doesn't explain that at all currently.

But that solution doesn't work for 1d DataArrays. You have to use this kwargs instead:

kwargs = dict(fill_value=(np.min(da.data), np.max(da.data)))

So it still isn't as convenient as I would like it to be. The wrapper function can now be simplified to this however:

import numpy as np
import xarray as xr


def interp(da, coords, extrapolation='clip'):
    """
    Linear interpolation that clips the inputs to the coords min and max value.

    Parameters
    ----------
    da : DataArray
        DataArray to interpolate.
    coords : dict
        Coordinates for the interpolated value.
    """
    if extrapolation == 'clip':
        if len(da.coords) > 1:
            kwargs = dict(fill_value=None)
        else:
            kwargs = dict(fill_value=(np.min(da.data), np.max(da.data)))
    return da.interp(coords, kwargs=kwargs)


# Create coordinates:
x = np.linspace(1000, 6000, 4)
y = np.linspace(100, 1200, 3)
z = np.linspace(1, 2, 2)

# Create 1D DataArray:
da1 = xr.DataArray(data=2*x, coords=[('x', x)])

# Create 2D DataArray:
X = np.meshgrid(*[x, y], indexing='ij')
data = X[0] * X[1]
da2 = xr.DataArray(data=data, coords=[('x', x), ('y', y)])

# Create 3D DataArray:
X = np.meshgrid(*[x, y, z], indexing='ij')
data = X[0] * X[1] * X[2]
da3 = xr.DataArray(data=data, coords=[('x', x), ('y', y), ('z', z)])

# Attempt to extrapolate:
print(interp(da1, {'x': 7000}))
print(interp(da2, {'x': 7000, 'y': 375}))
print(interp(da3, {'x': 7000, 'y': 375, 'z': 1}))

@dcherian
Copy link
Contributor

As noted in #3956 fill_value="extrapolate" is what we need for 1D.

xarray could use the same interpolation function for both 1d and nd.

Not sure if there's a big performance gain. @jhamman implemented these interpolators originally AFAIK. Perhaps he remembers why we use both.

@Illviljan Illviljan mentioned this issue Jan 1, 2021
4 tasks
@stale
Copy link

stale bot commented May 2, 2022

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants