
Dataset/DataArray to_dataframe() dimensions order mismatch. #2346

Closed
@Thomas-Z

Description


Code Sample

import xarray as xr
import numpy as np

# Shape (3, 2) matches dims ('y', 'x'): y has 3 entries, x has the 2 labels 'a' and 'b'
data = xr.DataArray(np.random.randn(3, 2), coords={'x': ['a', 'b']}, dims=('y', 'x'))
ds = xr.Dataset({'foo': data})

# Applied to the Dataset
ds.to_dataframe()

#          foo
#x y          
#a 0  0.348519
#  1 -0.322634
#  2 -0.683181
#b 0  0.197501
#  1  0.504810
#  2 -1.871626

# Applied to the DataArray
ds['foo'].to_dataframe()

#          foo
#y x          
#0 a  0.348519
#  b  0.197501
#1 a -0.322634
#  b  0.504810
#2 a -0.683181
#  b -1.871626

Problem description

When applied to a DataArray, to_dataframe() respects the order of the dimensions, whereas the same method applied to a Dataset sorts the dimensions alphabetically.
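Until the two methods agree, one possible workaround is to reorder the index levels of the Dataset's DataFrame after the fact. The sketch below uses only pandas and builds a stand-in for the output of ds.to_dataframe() (assumption: an (x, y) MultiIndex as shown above); with real data you would use df = ds.to_dataframe() and dim_order = list(ds['foo'].dims).

```python
import numpy as np
import pandas as pd

# Stand-in for ds.to_dataframe(): index levels in alphabetical order (x, y).
index = pd.MultiIndex.from_product([['a', 'b'], [0, 1, 2]], names=['x', 'y'])
df_sorted = pd.DataFrame({'foo': np.arange(6.0)}, index=index)

# Desired order; in the example above this would be list(ds['foo'].dims) == ['y', 'x'].
dim_order = ['y', 'x']

# Swap the index levels, then sort so the rows follow the new level order.
df_reordered = df_sorted.reorder_levels(dim_order).sort_index()
print(df_reordered)
```

Note that sort_index() orders rows by coordinate value, so this only reproduces the DataArray's row order when each coordinate is already sorted (as 'a', 'b' and 0, 1, 2 are here).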

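In both cases, to_dataframe() delegates to _to_dataframe(), passing a mapping of dimensions whose iteration order determines the order of the index levels.
The DataArray passes an OrderedDict built from its own dims, which preserves their declared order, but the Dataset passes self.dims, which is a SortedKeysDict and therefore iterates over the dimension names alphabetically.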
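The difference comes down to iteration order. A minimal stdlib-only sketch, assuming the DataFrame rows follow the Cartesian product of the coordinates taken in the order the dims mapping iterates (consistent with the outputs shown above). SortedKeysMapping and index_rows are hypothetical stand-ins, not xarray internals.

```python
from collections import OrderedDict
from collections.abc import Mapping
from itertools import product


class SortedKeysMapping(Mapping):
    """Hypothetical stand-in for a SortedKeysDict: iterates keys in sorted order."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(sorted(self._data))

    def __len__(self):
        return len(self._data)


def index_rows(ordered_dims, coords):
    """Hypothetical helper: index level names and row labels in iteration order."""
    names = list(ordered_dims)
    rows = list(product(*(coords[d] for d in names)))
    return names, rows


# 'y' has no coordinate in the example, so its labels are the positions 0..2.
coords = {'y': [0, 1, 2], 'x': ['a', 'b']}
array_dims = OrderedDict([('y', 3), ('x', 2)])  # DataArray: declared order
dataset_dims = SortedKeysMapping(array_dims)    # Dataset: sorted by name

print(index_rows(array_dims, coords))    # levels ['y', 'x']
print(index_rows(dataset_dims, coords))  # levels ['x', 'y']
```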

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-23-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

xarray: 0.10.8
pandas: 0.23.4
numpy: 1.14.5
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: 2.2.0
bottleneck: None
cyordereddict: None
dask: 0.18.2
distributed: 1.22.1
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 40.0.0
pip: 18.0
conda: None
pytest: 3.7.1
IPython: 6.5.0
sphinx: None
