Description
What happened?
The DataTree HTML repr is very slow: about 37 s for the example below, compared with about 2.6 s for the text repr.
What did you expect to happen?
Calling dt should not take longer than a second.
Minimal Complete Verifiable Example
import numpy as np
import xarray as xr
number_of_files = 700
number_of_groups = 5
number_of_variables = 10
datasets = {}
for f in range(number_of_files):
    for g in range(number_of_groups):
        # Create synthetic data
        time = np.linspace(0, 50 + f, 1 + 1000 * g)
        y = f * time + g
        # Create dataset:
        ds = xr.Dataset(
            data_vars={
                f"temperature_{g}{i}": ("time", y)
                for i in range(number_of_variables // number_of_groups)
            },
            coords={"time": ("time", time)},
        ).chunk()
        # Prepare for xr.DataTree:
        name = f"file_{f}/group_{g}"
        datasets[name] = ds
dt = xr.DataTree.from_dict(datasets)
%timeit dt._repr_html_()
# 37.4 s ± 5.37 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit dt.__repr__()
# 2.58 s ± 182 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
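The `%timeit` lines only work inside IPython or Jupyter. As a rough sketch for reproducing the comparison in a plain Python script, the standard-library `timeit` module can be used instead. The helper name `time_reprs` is hypothetical and not part of xarray; it works on any object, and only times the HTML repr if the object defines `_repr_html_`.

```python
import timeit


def time_reprs(obj, repeat=3):
    """Return best-of-`repeat` wall times (seconds) for the text and HTML reprs.

    `time_reprs` is an illustrative helper, not an xarray function.
    """
    # Time the plain-text repr; each measurement is a single call.
    timings = {"text": min(timeit.repeat(lambda: repr(obj), number=1, repeat=repeat))}

    # Time the HTML repr only if the object provides one.
    html = getattr(obj, "_repr_html_", None)
    if callable(html):
        timings["html"] = min(timeit.repeat(html, number=1, repeat=repeat))
    return timings


# Usage with the tree built in the example:
# print(time_reprs(dt))
```

Taking the minimum over a few repeats reduces noise from other processes; the absolute numbers will differ from the `%timeit` means reported above.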
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
A decent workaround is print(dt) or dt.__repr__(), but both are noticeably harder to type than dt.
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:04:44) [MSC v.1940 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('Swedish_Sweden', '1252')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.7.1.dev363+g99426cbb.d20240904
pandas: 2.2.2
numpy: 2.2.1
scipy: 1.14.1
netCDF4: 1.7.1
pydap: 3.5
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: 3.9.0
bottleneck: 1.4.0
dask: 2024.11.2
distributed: 2024.11.2
matplotlib: 3.9.2
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.6.1
cupy: None
pint: None
sparse: None
flox: 0.9.10
numpy_groupies: 0.11.2
setuptools: 73.0.1
pip: 24.2
conda: None
pytest: 8.3.2
mypy: 1.14.1
IPython: 8.27.0
sphinx: 8.0.2