
Change grid level #18


Open
benbovy opened this issue Nov 10, 2023 · 16 comments

Comments

@benbovy
Member

benbovy commented Nov 10, 2023

It would be useful to have utility methods to change the grid resolution, very much like this and this.

Proposed behavior

This would be pretty similar to Xarray .reindex(), but here DGGS-aware.

When downgrading the resolution, only the cell coordinate would change with child cell ids replaced by their parent cell id at the given resolution. The resulting coordinate has the same size but may have duplicate labels. Users could then perform aggregation with the method of their choice just by using Xarray's .groupby().
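A minimal sketch of this downgrade path, assuming HEALPix cell ids in the NESTED scheme (where the parent of cell i one level up is i // 4, i.e. i >> 2) and plain NumPy arrays instead of Xarray objects. parent_ids and coarsen_mean are hypothetical helpers, not xdggs API; the np.unique / np.bincount pair stands in for the .groupby("cell_ids").mean() step.

```python
import numpy as np


def parent_ids(cell_ids, from_level, to_level):
    """Map HEALPix NESTED cell ids at from_level to their ancestors at to_level.

    In the nested scheme each level multiplies the cell count by 4, so going
    up k levels is an integer division by 4**k == a right shift by 2*k bits.
    """
    if to_level > from_level:
        raise ValueError("to_level must not exceed from_level")
    return np.asarray(cell_ids) >> (2 * (from_level - to_level))


def coarsen_mean(cell_ids, values, from_level, to_level):
    """Relabel cells with their parent ids, then average values sharing a parent."""
    parents = parent_ids(cell_ids, from_level, to_level)
    # inverse[j] is the position of parents[j] in unique_parents (the group index)
    unique_parents, inverse = np.unique(parents, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return unique_parents, sums / counts
```

For example, level-1 cells 0 to 3 collapse onto level-0 cell 0 and cells 4 and 5 onto cell 1; the relabeled coordinate keeps its size until the aggregation step.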

When upgrading the resolution, the new cell coordinate has new labels (child cell ids) and the cell dimension may have an increased size, in which case the values of the data variables must be repeated according to the new cell ids along the cell dimension.
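The upgrade direction can be sketched the same way, again assuming HEALPix NESTED ids, where the descendants of cell i k levels down are the contiguous range i * 4**k to (i + 1) * 4**k - 1. refine_repeat is a hypothetical helper operating on NumPy arrays, not part of xdggs.

```python
import numpy as np


def refine_repeat(cell_ids, values, from_level, to_level):
    """Replace each cell by its 4**k descendants and repeat its value for each one."""
    if to_level < from_level:
        raise ValueError("to_level must not be lower than from_level")
    n_children = 4 ** (to_level - from_level)
    cell_ids = np.asarray(cell_ids)
    # one row of descendants per input cell, flattened in row-major order
    child_ids = (cell_ids[:, None] * n_children + np.arange(n_children)).ravel()
    # np.repeat yields each value n_children times in a row, matching the flattened ids
    child_values = np.repeat(np.asarray(values), n_children)
    return child_ids, child_values
```

The cell dimension grows by a factor of 4**k, which is exactly the case where the data variables must be repeated along it.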

These .change_resolution() utility functions might actually be just what we need in order to align, merge or do other operations with multiple Datasets / DataArrays on the same DGGS but at different resolutions. They are pretty simple and composable functions.

For simplicity, there would be no regridding or resampling involved here. There are two caveats, though:

  • extensive vs. intensive quantities: The behavior detailed above is correct for intensive quantities (i.e., independent of the cell area) but not for extensive quantities. For the latter, one generic solution could be to optionally output a "weights" coordinate (same dimension as cells) computed from the cell areas and by counting duplicate cell ids. This weights coordinate could then be used to update the values of certain data variables (simple arithmetic) after upgrading the resolution. Unfortunately, for resolution downgrading, weighted groupby is not yet supported in Xarray (see pydata/xarray#3937, "compose weighted with groupby, coarsen, resample, rolling etc.").
  • In some grid systems (like H3), the boundaries of the cells do not match exactly across different resolutions. We might need more advanced regridding in this case, although the solutions above may already provide a good enough first-order approximation.
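A minimal sketch of the "weights by counting duplicate cell ids" idea for an extensive quantity, assuming HEALPix NESTED ids. Because HEALPix cells at a given level all have the same area, the area factor drops out here; a grid without equal-area cells would also need an area ratio, which this sketch does not cover. duplicate_weights is a hypothetical helper.

```python
import numpy as np


def duplicate_weights(labels):
    """Return 1 / (number of occurrences of each label), aligned with labels."""
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    return 1.0 / counts[inverse]


# Refine an extensive quantity (e.g. a count per cell) from level 0 to level 1.
# Assumption: HEALPix NESTED ids, so cell i has children 4*i + 0..3.
parent = np.array([2])
totals = np.array([100.0])
child_ids = (parent[:, None] * 4 + np.arange(4)).ravel()
repeated = np.repeat(totals, 4)              # intensive-style refinement: copy the value
weights = duplicate_weights(child_ids // 4)  # each parent id occurs 4 times -> 0.25
split = repeated * weights                   # extensive: the total is shared among children
# (when coarsening an extensive quantity, the matching reduction is a sum, not a mean)
```

After the weighting step the children carry 25.0 each, so the total of 100.0 is conserved.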
@VeckoTheGecko

VeckoTheGecko commented May 14, 2025

Just had an in depth chat with @surgura about this. These are our thoughts:

When downgrading the resolution, only the cell coordinate would change with child cell ids replaced by their parent cell id at the given resolution. The resulting coordinate has the same size but may have duplicate labels. Users could then perform aggregation with the method of their choice just by using Xarray's .groupby().

  • This doesn't really make sense for us.
    • Why would users want a dataset with just the relabeling instead of directly doing the groupby and applying an aggregation function? If users want to know the parent IDs of cells, they can use a different method.
    • For all(?) use cases this would mean users have to call .dggs.change_resolution(level=2).groupby("cell_ids").mean() if they're downscaling but only .dggs.change_resolution(level=6) if they're upscaling. The asymmetry between the downscaling and upscaling calls seems unfriendly to the user.

Our proposal: ds.dggs.upscale/downscale/rescale

  • upscale(level: int) -> xr.DataArray | xr.Dataset
    • does the upscaling to the level of interest, duplicating data from the parent to the child cells; returns a dataset at the requested level
    • Errors out if level < grid.level
    • returns the input unchanged if level == grid.level
  • downscale(level: int, agg: npfunc = np.mean) -> xr.DataArray | xr.Dataset
    • does the downscaling to the level of interest with a non-weighted aggregation (user-provided, defaulting to mean) of the child cells into their parent cells; returns a dataset at the requested level
    • Errors if level > grid.level
    • returns the input unchanged if level == grid.level
  • rescale(level: int, downscale_agg: npfunc | None = None) -> xr.DataArray | xr.Dataset
    • Calls downscale if level < grid.level and upscale if level > grid.level. The aggregator defaults to that of downscale but is customisable. Returns a dataset at the requested level.
  • -> This API seems clearer, and the user gets the output dataset in the format that they expect. Personally I really like the upscale/downscale/rescale naming since it seems quite clear to me (not sure if there is a convention).
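A rough sketch of the proposed upscale / downscale / rescale semantics, including the error and no-op rules, on plain NumPy arrays rather than an xdggs accessor. It assumes HEALPix NESTED ids (4**k descendants per cell k levels down); the function names follow the proposal but the signatures here, taking cell_ids, values and grid_level explicitly, are illustrative assumptions.

```python
import numpy as np


def upscale(cell_ids, values, grid_level, level):
    """Refine to level by duplicating each parent's value into its children."""
    if level < grid_level:
        raise ValueError(f"cannot upscale from level {grid_level} to {level}")
    if level == grid_level:
        return np.asarray(cell_ids), np.asarray(values)
    n = 4 ** (level - grid_level)  # descendants per cell in the NESTED scheme
    ids = (np.asarray(cell_ids)[:, None] * n + np.arange(n)).ravel()
    return ids, np.repeat(np.asarray(values), n)


def downscale(cell_ids, values, grid_level, level, agg=np.mean):
    """Coarsen to level, reducing the children of each parent with agg (unweighted)."""
    if level > grid_level:
        raise ValueError(f"cannot downscale from level {grid_level} to {level}")
    if level == grid_level:
        return np.asarray(cell_ids), np.asarray(values)
    parents = np.asarray(cell_ids) >> (2 * (grid_level - level))
    unique_parents, inverse = np.unique(parents, return_inverse=True)
    values = np.asarray(values)
    # simple per-group loop so that any NumPy-style reduction can be passed as agg
    reduced = np.array([agg(values[inverse == i]) for i in range(len(unique_parents))])
    return unique_parents, reduced


def rescale(cell_ids, values, grid_level, level, downscale_agg=None):
    """Dispatch: downscale when level < grid_level, upscale otherwise."""
    if level < grid_level:
        agg = np.mean if downscale_agg is None else downscale_agg
        return downscale(cell_ids, values, grid_level, level, agg=agg)
    return upscale(cell_ids, values, grid_level, level)
```

The per-group loop in downscale is quadratic and only meant to keep the sketch short; a real implementation would sort or use a vectorized group reduction.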

Anticipating questions:

  • Why upscale/downscale and not just rescale
    • Having upscale and downscale also as part of the public API is nice sugar for users so that their code is more readable (after all, these are very different operations and allowing users to explicitly choose one execution path is nice)
  • What about non-HealPix grids? (or grids that aren't strictly hierarchical)
    • idk, NotImplementedError for now. I think this functionality and API are already super powerful for users and also quite easy to implement.
  • Why level?

Can I get cracking on this feature? Thoughts?

@tinaok
Contributor

tinaok commented May 14, 2025

@keewis @benbovy

@benbovy
Member Author

benbovy commented May 14, 2025

Can I get cracking on this feature? Thoughts?

Yes please go ahead!

I like your proposal very much.

@keewis
Collaborator

keewis commented May 14, 2025

I think there are two distinct (but interdependent) operations that would both make sense:

  1. change the level of the cell ids (in line with hierarchical operations on cell ids: parents #62)
  2. modify the data to change dimensions

(to be clear, I think 1 depends on 2, so I'd start with that)

The exact API will probably need some refinement (not everything that xarray provides has a numpy equivalent), but otherwise this looks fine. cdshealpix (rust) recently gained methods for hierarchical operations, and there's keewis/healpix-geo#24 which added low-level functions for 1 (I just didn't have time to make progress there recently)

@strobpr

strobpr commented May 15, 2025

Apologies for intervening here very briefly, but I find the terminology used above quite problematic:

'upscaling' and 'downscaling' are terms used in opposite ways by different geoscience user communities. Much clearer choices would be 'coarsening' and 'refining'.

'resolution' is simply the wrong term to use when it comes to 'grid spacing'. The resolution of gridded data is to some extent related to (and limited by) the grid spacing, but the grid spacing is no useful indication of resolution (and there is no such thing as a 'grid resolution'). E.g., if you interpolate data onto a finer grid (smaller spacing) you do NOT change the resolution!!! I see an increasing misuse of these terms, which undermines basic data understanding. A suitable neutral term would be re-gridding.

Please consider that words matter and help (or hamper) the correct understanding of concepts. If possible, stick as closely as you can to the DGGS terminology proposed in the standard. If unsure, please ask around before jumping on some jargon that might lead to further confusion and misunderstanding in our science.

Thanks!

@d70-t

d70-t commented May 15, 2025

I think this would be a good feature. However I also agree that naming matters, and up/down isn't obviously defined.
I'm also wondering whether we might want to support weighting? There are many cases where only some of the to-be-aggregated cells contain data, and especially when doing multiple levels of aggregation across partially covered cells, unweighted aggregation might return incorrect results.
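A small worked example of the multi-level problem, assuming HEALPix NESTED ids (parent = id // 4) and NaN marking cells without data. nan_groupby is a hypothetical helper. Averaging level-1 means without weights treats a parent backed by one sample the same as one backed by four; carrying the number of valid samples forward as the weight recovers the direct mean.

```python
import numpy as np


def nan_groupby(parents, values, weights):
    """Weighted mean of values per parent id, ignoring NaNs.

    Returns (parent ids, means, total weight per parent); parents with no
    valid child get NaN and weight 0.
    """
    unique_parents, inverse = np.unique(parents, return_inverse=True)
    n = len(unique_parents)
    valid = ~np.isnan(values)
    w = np.where(valid, weights, 0.0)
    total_w = np.bincount(inverse, weights=w, minlength=n)
    total = np.bincount(inverse, weights=np.where(valid, values, 0.0) * w, minlength=n)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = total / total_w  # 0 / 0 -> NaN for empty parents
    return unique_parents, means, total_w


# Level-2 cells 0..15 (all descendants of level-0 cell 0); only 5 samples are valid.
ids2 = np.arange(16)
vals2 = np.full(16, np.nan)
vals2[0:4] = 1.0  # level-1 cell 0 is fully covered
vals2[4] = 9.0    # level-1 cell 1 is covered by a single child

ids1, m1, n1 = nan_groupby(ids2 // 4, vals2, np.ones(16))
# unweighted second step: each level-1 mean counts once -> (1 + 9) / 2 = 5.0
_, unweighted, _ = nan_groupby(ids1 // 4, m1, np.ones(len(ids1)))
# weighted second step: weight by valid samples -> (4 * 1 + 1 * 9) / 5 = 2.6
_, weighted, _ = nan_groupby(ids1 // 4, m1, n1)
```

The weighted result matches np.nanmean over the original level-2 samples, while the unweighted one is biased toward the sparsely covered cell.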

@benbovy
Member Author

benbovy commented May 15, 2025

I also like coarsen and refine as method names. I'm wondering if we really need a rescale method?

(note about resolution: this issue is quite old and since then we changed it to level).

@keewis by API refinement, are you thinking about ds.dggs.coarsen(...).mean() vs. ds.dggs.coarsen(..., agg=np.mean)? What are the pros and cons of these options here, apart from the first one being closer to Xarray's groupby, coarsen, rolling, etc.?
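The two call styles can be compared with a hypothetical sketch on NumPy arrays (not xdggs API), assuming HEALPix NESTED ids. Style A returns a deferred object that computes the grouping once and exposes named reductions (like Xarray's groupby/coarsen objects); style B reduces immediately with an agg callable. Here style B is simply built on top of style A.

```python
import numpy as np


class CellCoarsen:
    """Hypothetical deferred object returned by coarsen(...) when no agg is given."""

    def __init__(self, cell_ids, values, by):
        # group by ancestor `by` levels up (NESTED scheme: shift by 2 bits per level)
        parents = np.asarray(cell_ids) >> (2 * by)
        self.parents, self._inverse = np.unique(parents, return_inverse=True)
        self._values = np.asarray(values, dtype=float)

    def reduce(self, func):
        """Apply any NumPy-style reduction to each group of children."""
        return np.array(
            [func(self._values[self._inverse == i]) for i in range(len(self.parents))]
        )

    def mean(self):
        return self.reduce(np.mean)

    def sum(self):
        return self.reduce(np.sum)


def coarsen(cell_ids, values, by, agg=None):
    """Style A when agg is None (deferred object), style B otherwise (immediate result)."""
    grouped = CellCoarsen(cell_ids, values, by)
    return grouped if agg is None else grouped.reduce(agg)
```

With ids 0..7 and values 1..8, coarsen(ids, vals, by=1).mean() and coarsen(ids, vals, by=1, agg=np.mean) give the same result. One possible argument for style A is that named methods can cover reductions that have no single NumPy function equivalent, while style B keeps the signature flat.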

Agreed for supporting weights, this will also be useful if we eventually support DGGSs where the cell structures are not congruent across refinement levels (e.g., ISEA3H). Probably we'd need it too for refine, along with a pluggable partition function?

@tinaok
Contributor

tinaok commented May 16, 2025

cc @allixender, in case you have comments on the naming issue from the OGC DGGS SWG point of view.

@VeckoTheGecko

I also like coarsen and refine as method names. I'm wondering if we really need a rescale method?

Agreed. Rescale is not particularly needed. I'll look to update my PR draft when I have more time (next weekend).

@benbovy
Member Author

benbovy commented May 16, 2025

What about a rescale or regrid method to remap data on a given absolute level value, while coarsen and refine both accept relative level values?

@VeckoTheGecko

VeckoTheGecko commented May 16, 2025

My proposal only covered absolute levels, but if we also want to support relative levels, I think the following API would be nice (instead of a new method):

def coarsen(*, by: int | None = None, level: int | None = None):
    """
    by: relative (number of levels to coarsen by)
    level: absolute (target refinement level)
    """

@benbovy
Member Author

benbovy commented May 16, 2025

nit: to_level seems even clearer than level.

@keewis
Collaborator

keewis commented May 16, 2025

what if we used to and by? Both are given in "level space" so I don't think people would be confused, and most importantly they read as "coarsen by {value}" and "coarsen to {value}", which feels pretty natural?
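A minimal sketch of how "coarsen to {value}" and "coarsen by {value}" could be resolved into one target refinement level, assuming exactly one of the two must be given and that the current level is known. resolve_target_level is a hypothetical helper; a refine counterpart would add by instead of subtracting it.

```python
def resolve_target_level(current_level, *, to=None, by=None):
    """Turn the absolute `to` or the relative `by` into a target level for coarsening."""
    if (to is None) == (by is None):
        raise TypeError("pass exactly one of `to` or `by`")
    target = current_level - by if by is not None else to
    # coarsening can only move to a level between 0 and the current one
    if not 0 <= target <= current_level:
        raise ValueError(f"cannot coarsen from level {current_level} to level {target}")
    return target
```

For example, resolve_target_level(8, by=2) and resolve_target_level(8, to=6) both return 6.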

@strobpr

strobpr commented May 16, 2025

'coarsen to' of course implies that you know which level you're at; otherwise it might be void.

BTW 'refinement level' is the correct term for a specific grid instance in a DGGS from the SWG perspective. A possible issue with shortening it to 'level' is that there are many 'levels' in EO terminology. E.g. changing the 'refinement level' is different from changing the 'processing level'.

I believe I remember a discussion on 'resolution' somewhere on Github, but the issue tends to be forgotten and the term is hard to kill now that it's so widely (and wrongly) used.

Regarding changing the gridding of data (aka 're-sampling'), that is always critical and requires certain conditions to be done safely. E.g., coarsening is only allowed if the coarser zone is populated in a representative way by finer samples (ideally it is continuously sampled). If some samples are missing, that is okay as long as the remainder is not biased (i.e., no longer representative). Refining is even more complex and would lead too far here...

Generally, please keep in mind that in the near future we will have to consider uncertainty in all operations we do with values (in fact 'data' always consists of 'value' and 'uncertainty', although the latter is often neglected). The effect of re-sampling on uncertainty is actually more important than its effect on the values, and even if the values change only slightly, the uncertainty always increases with each re-sampling. That's why it should be kept to a minimum and done only with utmost care.
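One simple, partial safeguard in that spirit is a coverage check: compute the fraction of each coarse zone populated by valid finer samples and mask zones below a threshold. This is a sketch only, assuming HEALPix NESTED ids (4**by descendants per zone); coarsen_with_coverage and min_coverage are hypothetical names. Coverage cannot detect a biased subset, so it is a necessary condition at best, not a guarantee of representativeness.

```python
import numpy as np


def coarsen_with_coverage(cell_ids, values, by, min_coverage=0.5):
    """Mean per coarse zone, set to NaN where too few fine cells hold valid data.

    Coverage = (valid fine cells present) / 4**by, so fine cells that are
    absent from cell_ids altogether also count as missing.
    """
    values = np.asarray(values, dtype=float)
    parents, inverse = np.unique(np.asarray(cell_ids) >> (2 * by), return_inverse=True)
    n = len(parents)
    valid = ~np.isnan(values)
    n_valid = np.bincount(inverse, weights=valid.astype(float), minlength=n)
    sums = np.bincount(inverse, weights=np.where(valid, values, 0.0), minlength=n)
    coverage = n_valid / 4**by
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / n_valid
    means[coverage < min_coverage] = np.nan  # not representative enough
    return parents, means, coverage
```

With one of four children valid, a zone gets coverage 0.25 and is masked under the default threshold; the threshold itself is a judgment call that depends on the data.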

@allixender
Contributor

cc @allixender, in case you have comments on the naming issue from the OGC DGGS SWG point of view.

@strobpr is also a trusted advisor/contributor/member in the OGC DGGS SWG; his comments on level, refinement ratio / refinement level and resolution are on point.

The term level implies a better logic, because we really don't change the original data; we are also not regridding in the traditional/conservative sense, we are mostly resampling and staying within the same DGGS grid logic. Thus, in a data-fidelity sense, a user is only moving/zooming up or down. My 2 cents.

@keewis
Collaborator

keewis commented May 16, 2025

'coarsen to' of course implies that you know which level you're at; otherwise it might be void.

xdggs has been built on the assumption that the level is known, so a lot of other things will fail if we don't know the level (and can't infer it from the cell ids)

I believe I remember a discussion on 'resolution' somewhere on Github

indeed, the discussion was mostly in #62, but also in #64 and #65. The conclusion was that in the context of a grid (and xdggs, at least, only deals with grids) the name level is unambiguous.

@benbovy benbovy changed the title from "Change grid resolution" to "Change grid level" on May 19, 2025
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants