Skip to content

Multi-dimensional coordinate mixup when writing to netCDF #1763

Closed
@neishm

Description

@neishm

Problem description

Under certain conditions, the netCDF files produced by Dataset.to_netcdf() have the wrong coordinates attributed to the variables. This seems to happen if there are multiple multi-dimensional coordinates, which share some (but not all) dimensions.

Test Dataset

Some sample code to generate a problematic Dataset:

import xarray as xr
import numpy as np

zeros1 = np.zeros((5,3))
zeros2 = np.zeros((6,3))
zeros3 = np.zeros((5,4))
d = xr.Dataset({
    'lon1': (['x1','y1'], zeros1, {}),
    'lon2': (['x2','y1'], zeros2, {}),
    'lon3': (['x1','y2'], zeros3, {}),
    'lat1': (['x1','y1'], zeros1, {}),
    'lat2': (['x2','y1'], zeros2, {}),
    'lat3': (['x1','y2'], zeros3, {}),
    'foo1': (['x1','y1'], zeros1, {'coordinates': 'lon1 lat1'}),
    'foo2': (['x2','y1'], zeros2, {'coordinates': 'lon2 lat2'}),
    'foo3': (['x1','y2'], zeros3, {'coordinates': 'lon3 lat3'}),
})
d = xr.conventions.decode_cf(d)

Here, the coordinates lat1,lat2,lat3 (and lon1,lon2,lon3) share one dimension with each other. The Dataset itself gets created properly:

>>> print(d)

<xarray.Dataset>
Dimensions:  (x1: 5, x2: 6, y1: 3, y2: 4)
Coordinates:
    lat1     (x1, y1) float64 ...
    lat3     (x1, y2) float64 ...
    lat2     (x2, y1) float64 ...
    lon1     (x1, y1) float64 ...
    lon3     (x1, y2) float64 ...
    lon2     (x2, y1) float64 ...
Dimensions without coordinates: x1, x2, y1, y2
Data variables:
    foo1     (x1, y1) float64 ...
    foo2     (x2, y1) float64 ...
    foo3     (x1, y2) float64 ...

and each DataArray does have the right coordinates associated with them:

>>> print (d.foo1)

<xarray.DataArray 'foo1' (x1: 5, y1: 3)>
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Coordinates:
    lat1     (x1, y1) float64 ...
    lon1     (x1, y1) float64 ...
Dimensions without coordinates: x1, y1

>>> print (d.foo2)

<xarray.DataArray 'foo2' (x2: 6, y1: 3)>
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Coordinates:
    lat2     (x2, y1) float64 ...
    lon2     (x2, y1) float64 ...
Dimensions without coordinates: x2, y1

>>> print (d.foo3)

<xarray.DataArray 'foo3' (x1: 5, y2: 4)>
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
Coordinates:
    lat3     (x1, y2) float64 ...
    lon3     (x1, y2) float64 ...
Dimensions without coordinates: x1, y2

The problem

The problem happens when I try to write this to netCDF (using either the netCDF4 or scipy engines):

d.to_netcdf("test.nc")

The resulting file has extra coordinates on the variables:

~$ ncdump -h test.nc
netcdf test {
dimensions:
	x1 = 5 ;
	y1 = 3 ;
	y2 = 4 ;
	x2 = 6 ;
variables:
	double lat1(x1, y1) ;
		lat1:_FillValue = NaN ;
	double lat3(x1, y2) ;
		lat3:_FillValue = NaN ;
	double lat2(x2, y1) ;
		lat2:_FillValue = NaN ;
	double lon1(x1, y1) ;
		lon1:_FillValue = NaN ;
	double lon3(x1, y2) ;
		lon3:_FillValue = NaN ;
	double lon2(x2, y1) ;
		lon2:_FillValue = NaN ;
	double foo1(x1, y1) ;
		foo1:_FillValue = NaN ;
		foo1:coordinates = "lat1 lat3 lat2 lon1 lon3 lon2" ;
	double foo2(x2, y1) ;
		foo2:_FillValue = NaN ;
		foo2:coordinates = "lon1 lon2 lat1 lat2" ;
	double foo3(x1, y2) ;
		foo3:_FillValue = NaN ;
		foo3:coordinates = "lon1 lon3 lat1 lat3" ;

// global attributes:
		:_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
}

Here, foo1, foo2, and foo3 have extra coordinates associated with them. Interestingly, if I re-open this netCDF file with xarray.open_dataset, I get the correct coordinates back for each DataArray. However, other netCDF utilities may not be so forgiving.

Expected Output

I would expect the netCDF file to have a single pair of lat/lon for each variable:

        ...
	double foo1(x1, y1) ;
		foo1:_FillValue = NaN ;
		foo1:coordinates = "lat1 lon1" ;
	double foo2(x2, y1) ;
		foo2:_FillValue = NaN ;
		foo2:coordinates = "lon2 lat2" ;
	double foo3(x1, y2) ;
		foo3:_FillValue = NaN ;
		foo3:coordinates = "lon3 lat3" ;
        ...
}

Output of xr.show_versions()

INSTALLED VERSIONS

commit: c2b205f
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-101-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
xarray: 0.10.0-5-gc2b205f
pandas: 0.21.0
numpy: 1.13.3
scipy: None
netCDF4: 1.3.1
h5netcdf: None
Nio: None
bottleneck: None
cyordereddict: None
dask: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 38.2.4
pip: 9.0.1
conda: None
pytest: 3.3.1
IPython: None
sphinx: None

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions