to_netcdf is not idempotent when stacking rename and set_coords #5170

Closed
@floriankrb

Description

After running the following:

import xarray as xr

# !wget https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/dev/xarray-issue/source.nc
ds = xr.open_dataset('source.nc') 
ds = ds.rename({'number': 'n'})
ds = ds.set_coords('valid_time')
print(ds)
ds.to_netcdf('out.nc')
ds = xr.open_dataset('out.nc')
print(ds)

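'valid_time' is no longer a coordinate in the netCDF file that is read back: the coordinate has been turned into a data variable. The first dataset below is printed before writing, the second after rereading out.nc.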

<xarray.Dataset>
Dimensions:            (heightAboveGround: 1, latitude: 121, longitude: 240, n: 2, step: 3, time: 1)
Coordinates:
  * n                  (n) int64 0 1
  * time               (time) datetime64[ns] 2020-01-02
  * step               (step) timedelta64[ns] 1 days 2 days 3 days
  * heightAboveGround  (heightAboveGround) int64 2
  * latitude           (latitude) float64 90.0 88.5 87.0 ... -87.0 -88.5 -90.0
  * longitude          (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
    valid_time         (time, step) datetime64[ns] ...
Data variables:
    t2m                (n, time, step, heightAboveGround, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2021-04-15T12:08:59 GRIB to CDM+CF via cfgrib-0....
<xarray.Dataset>
Dimensions:            (heightAboveGround: 1, latitude: 121, longitude: 240, n: 2, step: 3, time: 1)
Coordinates:
  * n                  (n) int64 0 1
  * time               (time) datetime64[ns] 2020-01-02
  * step               (step) timedelta64[ns] 1 days 2 days 3 days
  * heightAboveGround  (heightAboveGround) int64 2
  * latitude           (latitude) float64 90.0 88.5 87.0 ... -87.0 -88.5 -90.0
  * longitude          (longitude) float64 0.0 1.5 3.0 4.5 ... 355.5 357.0 358.5
Data variables:
    valid_time         (time, step) datetime64[ns] ...
    t2m                (n, time, step, heightAboveGround, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2021-04-15T12:08:59 GRIB to CDM+CF via cfgrib-0....

What happened:
Output of the MCVE is:
Coords written in copy.1.nc: ['number', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.1.nc: ['number', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']

Coords written in copy.2.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.2.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude']

Coords written in copy.4.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.4.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
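
To make the discrepancy easier to spot, the coordinate lists printed above can be compared mechanically. This is only an illustrative sketch: the lists are copied from the output above, and `lost_coords` is a hypothetical helper name, not part of xarray.

```python
# Coordinate lists copied verbatim from the MCVE output above.
written = {
    "copy.1.nc": ["number", "time", "step", "heightAboveGround", "latitude", "longitude", "valid_time"],
    "copy.2.nc": ["n", "time", "step", "heightAboveGround", "latitude", "longitude", "valid_time"],
    "copy.4.nc": ["n", "time", "step", "heightAboveGround", "latitude", "longitude", "valid_time"],
}
reread = {
    "copy.1.nc": ["number", "time", "step", "heightAboveGround", "latitude", "longitude", "valid_time"],
    "copy.2.nc": ["n", "time", "step", "heightAboveGround", "latitude", "longitude"],
    "copy.4.nc": ["n", "time", "step", "heightAboveGround", "latitude", "longitude", "valid_time"],
}


def lost_coords(written_names, reread_names):
    """Return the names that were written as coordinates but are missing after rereading."""
    reread_set = set(reread_names)
    return [name for name in written_names if name not in reread_set]


# Hypothetical summary: one entry per file, listing the coordinates that were lost.
report = {fname: lost_coords(written[fname], reread[fname]) for fname in written}
for fname, lost in report.items():
    print(f"{fname}: lost {lost}")
```

Only copy.2.nc (rename followed directly by set_coords) loses a coordinate, namely valid_time.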

What you expected to happen:
Output of the MCVE should be:

Coords written in copy.1.nc: ['number', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.1.nc: ['number', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']

Coords written in copy.2.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.2.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']

Coords written in copy.4.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']
Reread and check copy.4.nc: ['n', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude', 'valid_time']

Minimal Complete Verifiable Example:


import xarray as xr

xr.show_versions()
print(xr.__file__)

FILE1 = 'copy.1.nc'
FILE2 = 'copy.2.nc'
FILE3 = 'copy.3.nc'
FILE4 = 'copy.4.nc'

ds = xr.open_dataset('source.nc')

# Initial dataset is ok
print(f'Coords  written in {FILE1}: {list(ds.coords)}')
ds.to_netcdf(FILE1)
print(f'Reread and check   {FILE1}: {list(xr.open_dataset(FILE1).coords)}')
#print(xr.open_dataset(FILE1))

print()

# rename + set_coords without an intermediate round trip: valid_time is lost on reread
ds = ds.rename({'number': 'n'})
ds = ds.set_coords('valid_time')
print(f'Coords  written in {FILE2}: {list(ds.coords)}')
ds.to_netcdf(FILE2)
print(f'Reread and check   {FILE2}: {list(xr.open_dataset(FILE2).coords)}')
#print(xr.open_dataset(FILE2))

print()

# Writing and rereading the file between rename and set_coords avoids the issue:
ds = xr.open_dataset('source.nc')
ds = ds.rename({'number': 'n'})
ds.to_netcdf(FILE3)
ds = xr.open_dataset(FILE3)
ds = ds.set_coords('valid_time')
print(f'Coords  written in {FILE4}: {list(ds.coords)}')
ds.to_netcdf(FILE4)
print(f'Reread and check   {FILE4}: {list(xr.open_dataset(FILE4).coords)}')
#print(xr.open_dataset(FILE4))
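
The write-then-reread pattern repeated in the MCVE could be factored into a small check. The sketch below is an assumption-laden illustration, not xarray API: `coords_lost_in_roundtrip` is a hypothetical helper, and the optional `open_dataset` parameter exists only so another opener can be substituted. By default it uses `xr.open_dataset`, as in the MCVE.

```python
def coords_lost_in_roundtrip(ds, path, open_dataset=None):
    """Write ds to path, reopen the file, and return coordinate names missing after rereading.

    Hypothetical helper: it only relies on ds.coords, ds.to_netcdf(path) and an
    opener returning an object that exposes .coords.
    """
    if open_dataset is None:
        # Imported lazily so the helper can be defined even without xarray installed.
        import xarray as xr
        open_dataset = xr.open_dataset

    before = list(ds.coords)  # coordinate names as held in memory
    ds.to_netcdf(path)
    reread = open_dataset(path)
    try:
        after = set(reread.coords)  # coordinate names found in the reread file
    finally:
        close = getattr(reread, "close", None)
        if callable(close):
            close()  # release the file handle if the opener provides one
    return [name for name in before if name not in after]
```

With the datasets from the MCVE, a call such as `coords_lost_in_roundtrip(ds, FILE2)` would be expected to return an empty list once the round trip preserves coordinates; based on the output reported above, it would presumably return `['valid_time']` for the rename + set_coords case.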

Anything else we need to know?:
The initial source.nc file can be downloaded with: wget https://storage.ecmwf.europeanweather.cloud/s2s-ai-challenge/dev/xarray-issue/source.nc

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.15.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.17.0
pandas: 1.2.2
numpy: 1.19.5
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.6.1
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.5
iris: None
bottleneck: None
dask: 2021.02.0
distributed: 2021.02.0
matplotlib: 3.3.4
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: 4.9.2
pytest: 5.3.1
IPython: 7.20.0
sphinx: 3.4.3

This may be related to, or a duplicate of, #4512, and linked to #4108.
