Bug in legend of dataset.plot.scatter #4126

@yohai

When using Dataset.plot.scatter with hue set to a variable of string dtype, the legend labels are wrong.

MCVE Code Sample

import xarray as xr
import numpy as np
dd = xr.Dataset({'y': (['x'], np.arange(8)),
                 'label': (['x'], list('AABBCCDD'))},
                coords={'x': np.linspace(0,1,8)})
dd.plot.scatter(x='x', y='y', hue='label')

Output is (note the legend):
Figure_1

After some experimenting, it seems that the legend always uses the first 4 values of the hue variable as its labels, rather than the 4 distinct values (note that the colors of the points themselves are correct):

import xarray as xr
import numpy as np
dd = xr.Dataset({'y': (['x'], np.arange(8)),
                 'label': (['x'], list('ABBACDDC'))},
                coords={'x': np.linspace(0,1,8)})
dd.plot.scatter(x='x', y='y', hue='label')

Figure_2

If there are only 3 distinct labels in total, it uses the first 3 values:

import xarray as xr
import numpy as np
dd = xr.Dataset({'y': (['x'], np.arange(6)),
                 'label': (['x'], list('ABBACC'))},
                coords={'x': np.linspace(0,1,6)})
dd.plot.scatter(x='x', y='y', hue='label')

Figure_1
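
The pattern described in the three examples can be summarised in a few lines of NumPy. This is only a sketch of the reported symptom, not xarray's actual legend code; the helper names first_n_values and unique_values are hypothetical and exist only for illustration.

```python
import numpy as np


def unique_values(labels):
    """Distinct labels in sorted order (what the legend would be expected to show)."""
    values = np.asarray(list(labels))
    return np.unique(values).tolist()


def first_n_values(labels):
    """First n raw values, n = number of distinct labels (the symptom as reported)."""
    values = np.asarray(list(labels))
    n = len(np.unique(values))
    return values[:n].tolist()


# The three hue variables used in the report.
for labels in ("AABBCCDD", "ABBACDDC", "ABBACC"):
    print(labels, "reported:", first_n_values(labels), "distinct:", unique_values(labels))
```

For every input the two lists differ as soon as a label repeats within the first n positions, which matches the description above.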

Expected Output

The legend in the first two plots should read A, B, C, D, and in the last plot A, B, C.
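
One possible way to get a legend with one entry per distinct label in the meantime is to bypass the hue handling and issue one matplotlib scatter call per label. A minimal sketch, assuming plain NumPy arrays as input; scatter_by_label is a hypothetical helper and not part of xarray.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def scatter_by_label(x, y, labels, ax=None):
    """Hypothetical helper: draw one scatter group per distinct label."""
    x, y, labels = np.asarray(x), np.asarray(y), np.asarray(labels)
    if ax is None:
        _, ax = plt.subplots()
    for name in np.unique(labels):
        mask = labels == name  # boolean mask selecting the points of this group
        ax.scatter(x[mask], y[mask], label=str(name))
    ax.legend(title="label")
    return ax


# Same data as the second example in the report.
ax = scatter_by_label(np.linspace(0, 1, 8), np.arange(8), list("ABBACDDC"))
legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_texts)
```

With the dataset from the report, the arrays could be passed as dd['x'].values, dd['y'].values and dd['label'].values.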

Versions

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.15.1
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.8.1
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
