Is your feature request related to a problem? Please describe.
I am retrieving files from AWS: https://registry.opendata.aws/wrf-se-alaska-snap/. An example:
import s3fs
import xarray as xr
s3 = s3fs.S3FileSystem(anon=True)
s3path = 's3://wrf-se-ak-ar5/gfdl/hist/daily/1980/WRFDS_1980-01-0[12].nc'
remote_files = s3.glob(s3path)
fileset = [s3.open(file) for file in remote_files]
ds = xr.open_mfdataset(fileset, concat_dim='Time', decode_cf=False)
ds
The data files for 1980 are missing the time coordinate, so the above code fails. The time could be obtained by parsing the file name; however, in the current implementation the source attribute is available only when the fileset consists of strings or Paths.
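A minimal sketch of parsing the time from such a file name (the WRFDS_YYYY-MM-DD.nc pattern is assumed from the glob above):
import re
from datetime import datetime

def time_from_name(path):
    # Assumed name pattern, e.g. '.../WRFDS_1980-01-01.nc': pull out the
    # date part and turn it into a datetime usable as a time coordinate.
    m = re.search(r'WRFDS_(\d{4}-\d{2}-\d{2})\.nc$', path)
    return datetime.strptime(m.group(1), '%Y-%m-%d')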
Describe the solution you'd like
I would suggest returning to the original suggestion in #2550: pass filename_or_object as an argument to the preprocess function, but with the necessary inspection. Here is my attempt (code in open_mfdataset):
open_kwargs = dict(
    engine=engine, chunks=chunks or {}, lock=lock, autoclose=autoclose, **kwargs
)

if preprocess is not None:
    # get the number of required (non-default) arguments of preprocess
    from inspect import signature

    parms = signature(preprocess).parameters
    num_preprocess_args = len([p for p in parms.values() if p.default == p.empty])
    if num_preprocess_args not in (1, 2):
        raise ValueError('preprocess accepts only 1 or 2 arguments')

if parallel:
    import dask

    # wrap the open_dataset, getattr, and preprocess with delayed
    open_ = dask.delayed(open_dataset)
    getattr_ = dask.delayed(getattr)
    if preprocess is not None:
        preprocess = dask.delayed(preprocess)
else:
    open_ = open_dataset
    getattr_ = getattr

datasets = [open_(p, **open_kwargs) for p in paths]
file_objs = [getattr_(ds, "_file_obj") for ds in datasets]
if preprocess is not None:
    if num_preprocess_args == 1:
        datasets = [preprocess(ds) for ds in datasets]
    else:
        # two required arguments: also pass the original filename_or_object
        datasets = [preprocess(ds, p) for (ds, p) in zip(datasets, paths)]
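With this, I can define a function fix as follows. This is a sketch: it reuses the hypothetical time_from_name helper above and assumes the s3fs file objects expose their key via a path attribute:
def fix(ds, source):
    # source is the element of the fileset that produced ds; strings pass
    # through unchanged, file-like objects contribute their path attribute.
    path = source if isinstance(source, str) else source.path
    return ds.assign_coords(Time=[time_from_name(path)])

ds = xr.open_mfdataset(fileset, concat_dim='Time', decode_cf=False, preprocess=fix)
This is backward compatible: preprocess can have any number of arguments as long as only one or two are required (without defaults), e.g. (names illustrative):
def fix_old(ds):           # one argument, called as preprocess(ds) as before
    return ds

def fix_new(ds, source):   # two arguments, also receives filename_or_object
    return ds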
Describe alternatives you've considered
The simple solution would be to make xarray s3fs-aware. IMHO this is not particularly elegant: either a check for an attribute or an import within a try/except block would be needed.
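For illustration, a sketch of what either variant could look like (hypothetical helpers, not existing xarray code):
# variant 1: import within a try/except block
try:
    import s3fs

    def _is_s3_file(obj):
        return isinstance(obj, s3fs.S3File)
except ImportError:
    def _is_s3_file(obj):
        return False

# variant 2: a plain attribute check, needing no import at all
def _source_of(filename_or_obj):
    if isinstance(filename_or_obj, str):
        return filename_or_obj
    return getattr(filename_or_obj, 'path', None)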
It is easy to parse the above fileset representation, but there is no guarantee that some other external file representation will be amenable to parsing.
If the fix is only for s3fs, getting the path attribute is more elegant; however, this would require xarray to be aware of the module.