
Add NetCDF3 dtype coercion for unsigned integer types #4014

Closed
@blsqr

Description

xr.Dataset.to_netcdf does not seem to support writing data with unsigned integer dtypes such as uint16, uint32 and uint64. This appears to be the case for both scipy-based output formats, NETCDF3_CLASSIC and NETCDF3_64BIT.

For NetCDF3, the xarray.netcdf3 module already coerces int64 and bool data automatically (to int32 and int8, respectively).
Shouldn't data of unsigned dtypes then also be coerced, i.e. to their signed equivalents?
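For illustration, here is a minimal standalone sketch of what the extended coercion could look like. It is modelled on my reading of the existing approach in xarray.netcdf3 (a lookup table plus an equality check after casting), but the names NC3_DTYPE_COERCIONS and coerce_nc3_dtype_sketch are hypothetical and this is not xarray's actual code. The unsigned entries map to same-width signed types, and mapping uint64 directly to int32 is an assumption based on NetCDF3 having no 64-bit integer type.

```python
import numpy as np

# Hypothetical coercion table: the first two entries mirror the existing
# int64/bool coercions; the unsigned entries are the proposed additions.
NC3_DTYPE_COERCIONS = {
    "int64": "int32",
    "bool": "int8",
    "uint8": "int8",
    "uint16": "int16",
    "uint32": "int32",
    "uint64": "int32",  # assumption: NetCDF3 has no 64-bit integer type
}


def coerce_nc3_dtype_sketch(arr: np.ndarray) -> np.ndarray:
    """Cast arr to a NetCDF3-compatible dtype, refusing lossy casts."""
    dtype = str(arr.dtype)
    if dtype not in NC3_DTYPE_COERCIONS:
        return arr  # already supported, or not handled by this sketch
    new_dtype = NC3_DTYPE_COERCIONS[dtype]
    cast_arr = arr.astype(new_dtype)
    # Out-of-range values wrap around during astype, so comparing the
    # cast array with the original detects any loss of information.
    if not (cast_arr == arr).all():
        raise ValueError(
            f"could not safely cast array from {dtype} to {new_dtype}"
        )
    return cast_arr


print(coerce_nc3_dtype_sketch(np.array([1, 2, 3], dtype="uint64")).dtype)  # int32
```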

MCVE Code Sample

import numpy as np
import xarray as xr

da = xr.DataArray(np.array([1,2,3], dtype='uint64'))

# The following all fail:
da.to_netcdf("foo")  # default format without netCDF4: scipy NETCDF3_64BIT
da.to_netcdf("bar", format='NETCDF3_CLASSIC')
da.astype('uint32').to_netcdf("baz")
da.astype('uint16').to_netcdf("spam")

# This works:
da.astype('int64').to_netcdf("working64")  # coerced to int32
da.astype('int32').to_netcdf("working32")  # works natively

Importantly, this is with the netCDF4 Python package not installed; if it were installed, xarray would use it for writing instead of scipy's netcdf module.
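A quick way to check which situation applies is to test whether netCDF4 is importable. This sketch uses only the standard library's importlib.util.find_spec; the helper name netcdf4_installed is hypothetical, and it assumes, as described above, that xarray prefers netCDF4 for writing whenever it can be imported and otherwise falls back to scipy.

```python
import importlib.util


def netcdf4_installed() -> bool:
    """Return True if the netCDF4 package can be imported."""
    # find_spec returns None when no module with this name can be found.
    return importlib.util.find_spec("netCDF4") is not None


if netcdf4_installed():
    print("netCDF4 is available: xarray would likely write with it")
else:
    print("netCDF4 is missing: xarray falls back to scipy (NetCDF3 only)")
```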

Expected Output

A NetCDF3 file is written with the data coerced to an appropriate dtype, as is already done for int64.

Alternatively, writing fails for all dtypes that NetCDF3 does not support natively, including int64 and bool.

Problem Description

Given that the coercion infrastructure is already in place, it seems more consistent to me to apply coercion in every case where it would make a to_netcdf call succeed rather than fail, not only for int64 and bool.

Ideally, coercion would target another unsigned integer type.
However, writing uint32 does not seem to be possible either, so this is not merely a 64-bit vs. 32-bit issue.
While the NetCDF format specification lists NON_NEG as its only unsigned integer type, which I presume to be equivalent to uint32, scipy's NetCDF3 writer does not seem able to write unsigned integers at all. Thus, as far as I can see, the only viable coercion would be to the signed equivalents.

I see no new cast-safety implications here, because the existing coerce_nc3_dtype function already checks that the original and cast arrays compare equal.
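To illustrate why that equality check is sufficient, here is a small NumPy-only demonstration (is_lossless_cast is a hypothetical helper, not part of xarray): astype silently wraps values that do not fit in the target type, and comparing the cast array with the original exposes exactly those cases.

```python
import numpy as np


def is_lossless_cast(arr: np.ndarray, dtype: str) -> bool:
    """Return True if casting arr to dtype preserves every value."""
    cast_arr = arr.astype(dtype)  # unsafe cast: out-of-range values wrap
    return bool((cast_arr == arr).all())


small = np.array([1, 2, 3], dtype="uint32")
large = np.array([1, 2**32 - 1], dtype="uint32")

print(is_lossless_cast(small, "int32"))  # True
print(large.astype("int32"))             # [ 1 -1]
print(is_lossless_cast(large, "int32"))  # False
```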

Versions

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.6 (default, Dec 30 2019, 19:38:26)
[Clang 11.0.0 (clang-1100.0.33.16)]
python-bits: 64
OS: Darwin
OS-release: 19.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.15.1
pandas: 1.0.1
numpy: 1.18.3
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.4.1
