Description
After running the following code:
import xarray as xr

# !wget https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/dev/xarray-issue/source.nc
ds = xr.open_dataset('source.nc')
ds = ds.rename({'number': 'n'})
ds = ds.set_coords('valid_time')
print(ds)
ds.to_netcdf('out.nc')
ds = xr.open_dataset('out.nc')
print(ds)
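'valid_time' is no longer a coordinate after the netCDF file is reread: the coordinate has been turned into a data variable. The first print(ds) below shows the dataset before writing, the second shows it after rereading out.nc.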
<xarray.Dataset>
Dimensions: (heightAboveGround: 1, latitude: 121, longitude: 240, n: 2, step: 3, time: 1)
Coordinates:
* n (n) int64 0 1
* time (time) datetime64[ns] 2020-01-02
* step (step) timedelta64[ns] 1 days 2 days 3 days
* heightAboveGround (heightAboveGround) int64 2
* latitude (latitude) float64 90.0 88.5 87.0 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
valid_time (time, step) datetime64[ns] ...
Data variables:
t2m (n, time, step, heightAboveGround, latitude, longitude) float32 ...
Attributes:
GRIB_edition: 2
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2021-04-15T12:08:59 GRIB to CDM+CF via cfgrib-0....
<xarray.Dataset>
Dimensions: (heightAboveGround: 1, latitude: 121, longitude: 240, n: 2, step: 3, time: 1)
Coordinates:
* n (n) int64 0 1
* time (time) datetime64[ns] 2020-01-02
* step (step) timedelta64[ns] 1 days 2 days 3 days
* heightAboveGround (heightAboveGround) int64 2
* latitude (latitude) float64 90.0 88.5 87.0 ... -87.0 -88.5 -90.0
* longitude (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
Data variables:
valid_time (time, step) datetime64[ns] ...
t2m (n, time, step, heightAboveGround, latitude, longitude) float32 ...
Attributes:
GRIB_edition: 2
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2021-04-15T12:08:59 GRIB to CDM+CF via cfgrib-0....
What happened:
Output of the MCVE (note that 'valid_time' is missing when copy.2.nc is reread):
Coords written in copy.1.nc: ['number', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.1.nc: ['number', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Coords written in copy.2.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.2.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude']
Coords written in copy.4.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.4.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
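The discrepancy in the output above is confined to copy.2.nc. As a small illustration, the written and reread coordinate lists can be compared directly. The helper name lost_coords is hypothetical and not part of xarray; the lists are copied from the MCVE output.

```python
def lost_coords(written, reread):
    """Return coordinate names that were written but are missing after rereading."""
    reread_set = set(reread)
    return [name for name in written if name not in reread_set]


# Coordinate lists copied from the MCVE output for copy.2.nc
written_2 = ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
reread_2 = ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude']

print(lost_coords(written_2, reread_2))  # ['valid_time']
```

In the MCVE itself, the same check could be applied to list(ds.coords) before to_netcdf and list(xr.open_dataset(path).coords) afterwards.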
What you expected to happen:
Output of the MCVE should be:
Coords written in copy.1.nc: ['number', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.1.nc: ['number', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Coords written in copy.2.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.2.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Coords written in copy.4.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.4.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Minimal Complete Verifiable Example:
import xarray as xr
import numpy as np
xr.show_versions()
print(xr.__file__)
FILE1 = 'copy.1.nc'
FILE2 = 'copy.2.nc'
FILE3 = 'copy.3.nc'
FILE4 = 'copy.4.nc'
ds = xr.open_dataset('source.nc')
# Initial dataset is ok
print(f'Coords written in {FILE1}: {list(ds.coords)}')
ds.to_netcdf(FILE1)
print(f'Reread and check {FILE1}: {list(xr.open_dataset(FILE1).coords)}')
#print(xr.open_dataset(FILE1))
print()
# Without a round trip after the rename, 'valid_time' is lost as a coordinate on reread
ds = ds.rename({'number': 'n'})
ds = ds.set_coords('valid_time')
print(f'Coords written in {FILE2}: {list(ds.coords)}')
ds.to_netcdf(FILE2)
print(f'Reread and check {FILE2}: {list(xr.open_dataset(FILE2).coords)}')
#print(xr.open_dataset(FILE2))
print()
# Doing a round trip (write, then reopen) right after the rename solves the issue:
ds = xr.open_dataset('source.nc')
ds = ds.rename({'number': 'n'})
ds.to_netcdf(FILE3)
ds = xr.open_dataset(FILE3)
ds = ds.set_coords('valid_time')
print(f'Coords written in {FILE4}: {list(ds.coords)}')
ds.to_netcdf(FILE4)
print(f'Reread and check {FILE4}: {list(xr.open_dataset(FILE4).coords)}')
#print(xr.open_dataset(FILE4))
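One possible explanation, which is an assumption and not confirmed in this report, is that a data variable keeps a 'coordinates' entry in its .encoding from the original file that still names 'number' after the rename. The sketch below only looks for such stale references. The function name stale_coordinate_refs is hypothetical; it relies only on ds.variables and each variable's .encoding dict.

```python
def stale_coordinate_refs(ds):
    """Map variable name -> names listed in its 'coordinates' encoding
    that no longer exist in the dataset."""
    present = set(ds.variables)
    stale = {}
    for name, var in ds.variables.items():
        # The 'coordinates' encoding, when present, is a space-separated string
        listed = str(var.encoding.get("coordinates", "")).split()
        missing = [c for c in listed if c not in present]
        if missing:
            stale[name] = missing
    return stale


# Hypothetical usage with the dataset from the MCVE (needs xarray and source.nc):
#   ds = xr.open_dataset('source.nc').rename({'number': 'n'})
#   print(stale_coordinate_refs(ds))
# Possible workaround to try (untested here, an assumption):
#   for var in ds.variables.values():
#       var.encoding.pop('coordinates', None)
```

If the function reports 'number' for t2m after the rename, that would be consistent with the round trip fixing the problem, since reopening the file rebuilds the encoding from what was written.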
Anything else we need to know?:
The initial source.nc file can be downloaded with: wget https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/dev/xarray-issue/source.nc
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.15.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4
xarray: 0.17.0
pandas: 1.2.2
numpy: 1.19.5
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.6.1
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.5
iris: None
bottleneck: None
dask: 2021.02.0
distributed: 2021.02.0
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: 4.9.2
pytest: 5.3.1
IPython: 7.20.0
sphinx: 3.4.3