Resample / upsample behavior diverges from pandas #1631
The key difference appears to be:
I think this is a bug in pandas, since the behavior is inconsistent with other resample methods like
More generally: (This does suggest that xarray could use a direct Another example:
It is useful that pandas's upsampling only repeats values within the previously valid range; otherwise it would likely interpolate over true data gaps. As another use case: suppose we have a temperature dataset with 3-hourly measurements, and we want to upsample it to 1-hour resolution. Occasionally, measurements are missing for day(s) at a time, which we mark with missing values (suppose the server running the model ran out of disk space). It is useful to be able to resample to a higher resolution without entirely unrealistic interpolation over data gaps.
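To make the use case above concrete, here is a minimal standard-library sketch of gap-aware upsampling. It is not how pandas or xarray implement resampling; the helper names (`upsample_within_gaps`, `_interp_at`) and the `max_gap` parameter are hypothetical, chosen only for illustration. The idea is to interpolate linearly between two valid samples only when they are no further apart than the original sampling interval, and to leave NaN everywhere else.

```python
import math
from datetime import datetime, timedelta


def _interp_at(t, valid, known, max_gap):
    """Value at time t, or NaN if t falls inside a gap wider than max_gap."""
    # Times that coincide with a valid sample are returned unchanged.
    if t in known:
        return known[t]
    # Otherwise find the pair of neighbouring valid samples that brackets t.
    for (t0, v0), (t1, v1) in zip(valid, valid[1:]):
        if t0 < t < t1:
            if t1 - t0 > max_gap:
                return math.nan  # true data gap: do not interpolate across it
            frac = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
            return v0 + frac * (v1 - v0)
    return math.nan  # t lies outside the range of valid samples


def upsample_within_gaps(times, values, step, max_gap):
    """Upsample onto a regular grid, interpolating only across short gaps."""
    valid = [(t, v) for t, v in zip(times, values) if not math.isnan(v)]
    known = dict(valid)
    grid, result = [], []
    t = times[0]
    while t <= times[-1]:
        grid.append(t)
        result.append(_interp_at(t, valid, known, max_gap))
        t += step
    return grid, result


# Hypothetical 3-hourly temperatures with two consecutive missing measurements.
start = datetime(2017, 1, 1)
times = [start + timedelta(hours=3 * i) for i in range(6)]
values = [10.0, 13.0, math.nan, math.nan, 22.0, 25.0]

grid, hourly = upsample_within_gaps(
    times, values, step=timedelta(hours=1), max_gap=timedelta(hours=3)
)
```

With these inputs, the hours between 00:00 and 03:00 and between 12:00 and 15:00 are filled by linear interpolation, while the hours strictly between 03:00 and 12:00 stay NaN because the bracketing valid samples are 9 hours apart.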
Thanks @shoyer. I always appreciated this feature in pandas, so I'm bummed to see it may not have been intentional. I need an xarray interpolate method that fills NaNs, so I'll give that a go. I suspect it will be a widely used feature for dealing with missing data.
Let's see where the pandas discussion ends up. If xarray had a method for interpolating to fill missing values, achieving your desired result would be as simple as chaining another interpolate call, e.g.,
Thanks for documenting this @jhamman. I think all the logic is in
Thanks for posting this @jhamman. It's really helping me understand what is going on with my data when I use xarray. My understanding of pandas is that it should not interpolate by default; however, I am downsampling, and this is stated only for upsampling (in Python for Data Analysis).
I've found a few issues where xarray's new resample / upsample functionality diverges from pandas. I think they mostly concern how NaNs are treated. Thoughts from @shoyer, @darothen, and others are welcome.
Gist with all the juicy details: https://gist.github.com/jhamman/354f0e5ff32a39550ffd25800e7214fc#file-xarray_resample-ipynb
xref: #1608, #1272