
Can't re-save netCDF after opening it and modifying it? #2029


Closed
fasiha opened this issue Mar 30, 2018 · 4 comments

Comments


fasiha commented Mar 30, 2018

Code Sample, copy-pastable

import xarray as xr
import numpy as np
import pandas as pd

filename = 'foo.nc'

print('creating fresh file')
temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
ds = xr.Dataset(
    {
        'temperature': (['x', 'y', 'time'], temp),
        'precipitation': (['x', 'y', 'time'], precip)
    },
    coords={
        'lon': (['x', 'y'], lon),
        'lat': (['x', 'y'], lat),
        'time': pd.date_range('2014-09-06', periods=3),
        'reference_time': pd.Timestamp('2014-09-05')
    })
ds.to_netcdf(filename)

del ds

ds = xr.open_dataset(filename, autoclose=True)
print('opened file')
print(ds['temperature'])
ds['temperature'][0, 0, 0] += 1000

ds.to_netcdf(filename) ### Crashes

# import os
# ds.to_netcdf(filename + '2')
# os.rename(filename + '2', filename)

Problem description

<snipped very long stacktrace, can produce if desired>
OSError: [Errno -51] NetCDF: Unknown file format: b'/path/to/foo.nc'

I encountered this problem when opening a netCDF file, modifying it, and trying to save it back. This is with netCDF4==1.3.1 and scipy==1.0.1. (Potentially related: #2019?)

Expected Output

If, instead of letting to_netcdf overwrite the just-opened file, I write to a new file and then os.rename the new file to the original location (see the three commented lines above), all is well. ncdump reports that my change took.

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.2
scipy: 1.0.1
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.3
conda: None
pytest: None
IPython: 6.2.1
sphinx: None

shoyer (Member) commented Mar 30, 2018

The problem is that when you open a dataset, xarray loads the data lazily from the source file. That lazy loading breaks when you overwrite the source file. As a user, the workaround is either to load files entirely into memory first, e.g. by calling .load(), or to avoid overwriting existing files.
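A minimal sketch of the .load() workaround (this creates its own small stand-in file rather than the dataset from the report, and assumes a netCDF backend such as scipy or netCDF4 is installed):

```python
import numpy as np
import xarray as xr

filename = 'foo.nc'

# Create a small file to operate on (a stand-in for the dataset above).
xr.Dataset({'temperature': (['x'], np.zeros(3))}).to_netcdf(filename)

# Workaround: load the dataset fully into memory, then close the file,
# so no lazy references into foo.nc remain when we overwrite it.
ds = xr.open_dataset(filename).load()
ds.close()

ds['temperature'][0] += 1000
ds.to_netcdf(filename)  # safe now: nothing is read lazily from foo.nc
```

The key difference from the report's code is that .load() pulls every variable into memory and .close() releases the file handle before to_netcdf() reopens the path for writing.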

I'm not quite sure how we should improve this, but it certainly does come up with some frequency, especially for new users. A friendlier warning/error would be nice, but I'm not sure how to detect this situation in general (the necessary information is not currently very accessible).

We could potentially always write to temporary files in to_netcdf() and then rename in a final step after writing the data. As a bonus, this results in atomic writes on most platforms.
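The temp-file-then-rename idea can be sketched in plain Python (atomic_write is a hypothetical helper for illustration, not part of xarray's API):

```python
import os
import tempfile

def atomic_write(path, data):
    """Hypothetical sketch: write to a temporary file in the destination's
    directory, then atomically rename it over the destination."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename never crosses
    # filesystems (cross-device renames are not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise

atomic_write('out.bin', b'hello')
```

Readers of the path see either the old contents or the new, never a partial write, which is what makes this pattern attractive as a default for to_netcdf().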

fasiha (Author) commented Mar 31, 2018

Thanks @shoyer! Question: I'd wondered whether, if I open_dataset a NetCDF file and then modify one of the data_vars, xarray might update the file on disk without me doing anything, but that wasn't the case, correct? So only the loading of data is lazy; modified data is modified only in memory? Thanks for the explanation!

fasiha closed this as completed Mar 31, 2018
shoyer (Member) commented Mar 31, 2018

That's right, xarray will never modify a file on disk unless you use to_netcdf().

DancingQuanta commented

I assumed that by using a with statement I had loaded the data into memory and closed the file.

with xr.open_dataset(path) as ds:
    data = ds

However, this does not load the data entirely into memory, so lazy loading is still in effect.
Using .load() does what I wanted:

with xr.open_dataset(path) as ds:
    data = ds.load()
