
Can't re-save netCDF after opening it and modifying it? #2029


Closed
fasiha opened this issue Mar 30, 2018 · 4 comments

Comments


fasiha commented Mar 30, 2018

Code Sample, copy-pastable

import xarray as xr
import numpy as np
import pandas as pd

filename = 'foo.nc'

print('creating fresh file')
temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
ds = xr.Dataset(
    {
        'temperature': (['x', 'y', 'time'], temp),
        'precipitation': (['x', 'y', 'time'], precip)
    },
    coords={
        'lon': (['x', 'y'], lon),
        'lat': (['x', 'y'], lat),
        'time': pd.date_range('2014-09-06', periods=3),
        'reference_time': pd.Timestamp('2014-09-05')
    })
ds.to_netcdf(filename)

del ds

ds = xr.open_dataset(filename, autoclose=True)
print('opened file')
print(ds['temperature'])
ds['temperature'][0, 0, 0] += 1000

ds.to_netcdf(filename) ### Crashes

# import os
# ds.to_netcdf(filename + '2')
# os.rename(filename + '2', filename)

Problem description

<snipped very long stacktrace, can produce if desired>
OSError: [Errno -51] NetCDF: Unknown file format: b'/path/to/foo.nc'

I encountered this problem when opening a netCDF file, modifying it, and trying to save it back. This is with netCDF4==1.3.1 and scipy==1.0.1. (Potentially related: #2019?)

Expected Output

If, instead of letting to_netcdf overwrite the just-opened file, I write to a new file and then os.rename the new file to the original location (see the three commented lines above), all is well. ncdump reports that my change took.

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.3
conda: None
pytest: None
IPython: 6.2.1
sphinx: None

shoyer (Member) commented Mar 30, 2018

The problem is that when you open a dataset, xarray loads the data lazily from the source file. That lazy loading breaks when you overwrite the source file. As a user, the workaround is either to load files entirely into memory first, e.g. by calling .load(), or to avoid overwriting existing files.
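A minimal sketch of the .load() workaround (this creates its own small stand-in file rather than the dataset from the report, and assumes a netCDF backend such as scipy or netCDF4 is installed):

```python
import numpy as np
import xarray as xr

filename = 'foo.nc'

# Create a small file to operate on (a stand-in for the dataset above).
xr.Dataset({'temperature': (['x'], np.zeros(3))}).to_netcdf(filename)

# Workaround: load the dataset fully into memory, then close the file,
# so no lazy references into foo.nc remain when we overwrite it.
ds = xr.open_dataset(filename).load()
ds.close()

ds['temperature'][0] += 1000
ds.to_netcdf(filename)  # safe now: nothing is read lazily from foo.nc
```

The key difference from the report's code is that .load() pulls every variable into memory and .close() releases the file handle before to_netcdf() reopens the path for writing.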

I'm not quite sure how we should improve this, but it certainly does come up with some frequency, especially for new users. A friendlier warning/error would be nice, but I'm not sure how to detect this situation in general (the necessary information is not currently very accessible).

We could potentially always write to temporary files in to_netcdf() and then rename in a final step after writing the data. As a bonus, this results in atomic writes on most platforms.
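The temp-file-then-rename idea can be sketched in plain Python (atomic_write is a hypothetical helper for illustration, not part of xarray's API):

```python
import os
import tempfile

def atomic_write(path, data):
    """Hypothetical sketch: write to a temporary file in the destination's
    directory, then atomically rename it over the destination."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename never crosses
    # filesystems (cross-device renames are not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise

atomic_write('out.bin', b'hello')
```

Readers of the path see either the old contents or the new, never a partial write, which is what makes this pattern attractive as a default for to_netcdf().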

fasiha (Author) commented Mar 31, 2018

Thanks @shoyer! Question: I'd wondered whether, if I open_dataset a NetCDF file and then modify one of the data_vars, xarray might update the file on disk without me doing anything, but that wasn't the case, correct? So only the loading of data is lazy; modified data is modified only in memory? Thanks for the explanation!

fasiha closed this as completed Mar 31, 2018
shoyer (Member) commented Mar 31, 2018

That's right, xarray will never modify a file on disk unless you use to_netcdf().

DancingQuanta commented

I assumed that by using a with statement I had loaded the data into memory and closed the file.

with xr.open_dataset(path) as ds:
    data = ds

However, this does not load the data entirely into memory, so lazy loading is still in effect.
Using .load() does what I wanted:

with xr.open_dataset(path) as ds:
    data = ds.load()
