GroupBy like API for resample #1269

Closed
shoyer opened this issue Feb 14, 2017 · 6 comments · Fixed by #1272

Comments

@shoyer
Member

shoyer commented Feb 14, 2017

Since we wrote resample in xarray, pandas has updated resample to have a groupby-like API (e.g., df.resample('24H').mean() vs. the old df.resample('24H'), which used the mean by default).

It would be nice to redo the xarray resample API to match, e.g., ds.resample(time='24H').mean() vs. ds.resample('time', '24H'). This would address a few use cases, including grouped-resample arithmetic and iterating over groups, and would (mostly) take care of the need for pd.TimeGrouper support (#364). If we use **kwargs for matching dimension names, this could be done with a minimally painful deprecation cycle.
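To make the **kwargs idea concrete, here is a minimal sketch of how a single entry point could accept both calling conventions during a deprecation cycle. The helper name `parse_resample_args`, its parameter names, and the warning text are hypothetical illustrations, not xarray's actual implementation.

```python
import warnings


def parse_resample_args(dim=None, freq=None, **indexer):
    """Return (dim, freq) from either calling convention.

    Hypothetical helper for illustration only; not xarray code.
    Caveat: a dimension literally named "dim" or "freq" would collide
    with the legacy parameter names in this sketch.
    """
    if indexer:
        # New style: resample(time='24H')
        if dim is not None or freq is not None:
            raise TypeError("use either (dim, freq) or a single dim=freq keyword")
        if len(indexer) != 1:
            raise ValueError("resampling along exactly one dimension is supported")
        ((new_dim, new_freq),) = indexer.items()
        return new_dim, new_freq

    # Old style: resample('time', '24H')
    if dim is None or freq is None:
        raise TypeError("no resampling dimension and frequency given")
    warnings.warn(
        "resample(dim, freq) is deprecated; use resample(dim=freq) instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return dim, freq
```

Under these assumptions, `parse_resample_args(time='24H')` and `parse_resample_args('time', '24H')` both yield `('time', '24H')`, but only the second emits a `DeprecationWarning`.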

@darothen

Let me dig into this a bit right now. My analysis project for this afternoon was already going to require digging into pandas' resampling in more depth anyways.

@darothen

Assuming we want to stick with pd.TimeGrouper under the hood, the only sticking point I've come across so far is how to have the resulting Data{Array,set}GroupBy object "remember" the resampling dimension, e.g. if you have multi-dimensional data and want to compute time means you have to call

ds.resample(time='24H').mean('time')

or else mean will operate across all dimensions. Any thoughts, @shoyer?

@max-sixty
Collaborator

Would be great to test for these sorts of issues if we redo this: #1269

@max-sixty
Collaborator

the only sticking point I've come across so far is how to have the resulting Data{Array,set}GroupBy object "remember" the resampling dimension

I think an interface like ds.resample(time='24H').mean() would be much better. We could do that with a wrapper of pd.TimeGrouper that also had a dim field. Or inheritance 😨
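As a rough illustration of the "wrapper with a dim field" idea, the toy class below remembers the resampling dimension so that `.mean()` reduces only along it by default. It uses plain Python lists rather than pandas or xarray objects, and the class name `DimAwareGroups` and its data layout are assumptions made for this sketch.

```python
from statistics import mean


class DimAwareGroups:
    """Toy stand-in for a resample result that remembers its dimension.

    Hypothetical sketch: `groups` maps a bin label to a list of rows,
    where each row holds the values along a second dimension.
    """

    def __init__(self, dim, groups):
        self.dim = dim        # resampling dimension, e.g. "time"
        self.groups = groups  # bin label -> rows falling into that bin

    def mean(self, dim=None):
        # Default to the remembered dimension instead of reducing everything.
        dim = self.dim if dim is None else dim
        if dim != self.dim:
            raise NotImplementedError("this sketch only reduces along self.dim")
        # Average the rows within each bin, keeping the other dimension intact.
        return {
            label: [mean(column) for column in zip(*rows)]
            for label, rows in self.groups.items()
        }


groups = DimAwareGroups(
    "time",
    {"day1": [[1.0, 10.0], [3.0, 30.0]], "day2": [[5.0, 50.0]]},
)
daily_means = groups.mean()  # no explicit 'time' argument needed
```

Here `daily_means` keeps one value per position along the second dimension for each bin, which mirrors what `ds.resample(time='24H').mean()` would be expected to do on multi-dimensional data.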

@darothen

@MaximilianR Oh, the interface is easy enough to do, even maintaining backwards-compatibility (already have that working). I was considering going the route done with GroupBy and the classes that compose it, like DatasetGroupBy... basically, we just record the wanted resampling dimension and inject the grouping/resampling operations we want. Also adds the ability to specialize methods like .first() and .last(), which is done under the current implementation.

But.... if there's a simpler way, that might be preferable!

@shoyer
Member Author

shoyer commented Feb 15, 2017 via email
