
Slow performance when opening a zarr file with numpy>2.0.0 #9545

Closed
@renaudjester

Description


What happened?

Hi!

I want to open a zarr dataset lazily.
On my computer:
  • with numpy==1.26.4 it takes around 1.5 s
  • with numpy==2.1.1 it takes around 5 s

It is also slow on an Ubuntu machine.

Unfortunately, I don't have the time to dig into the issue and pinpoint exactly which piece of code takes much longer than before. From the little testing I did, the slowdown doesn't seem to come from the HTTP calls.
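One way to narrow down which code path got slower is to profile the same call under each numpy version and compare the most expensive functions. The sketch below uses only the standard-library `cProfile` and `pstats` modules; `profile_call` is a hypothetical helper name (not part of xarray), and the commented usage assumes network access to the zarr store from the example.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, sort_key="cumulative", **kwargs):
    """Run func(*args, **kwargs) under cProfile (hypothetical helper).

    Returns (result, report), where report is the text of the `top`
    most expensive entries sorted by `sort_key`.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(top)
    return result, buffer.getvalue()


# Assumed usage, run once per numpy version and compare the two reports:
# import xarray
# ds, report = profile_call(xarray.open_dataset, url, engine="zarr")
# print(report)
```

Sorting by cumulative time tends to surface the high-level xarray/zarr functions whose total cost grew, which is usually a better starting point than raw per-function time.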

What did you expect to happen?

I expect the time to lazily open the dataset to be the same regardless of the numpy version.

Minimal Complete Verifiable Example

import xarray
import time

top = time.time()
dataset = xarray.open_dataset(
    "https://s3.waw3-1.cloudferro.com/mdl-arco-time-035/arco/MEDSEA_MULTIYEAR_PHY_006_004/med-cmcc-cur-rean-h_202012/timeChunked.zarr",
    engine="zarr",
)
print(f"Took: {time.time() - top}s")
# with numpy==1.26.4: ~1s
# with numpy==2.1.1: ~5s
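A single `time.time()` measurement can be noisy, for example when the first call pays for imports or cold network connections. A minimal sketch, assuming it is acceptable to open the dataset several times in a row: `time_call` and `summarize` are hypothetical helpers that repeat the call with `time.perf_counter()` and report the minimum and median duration. Note that later calls may benefit from caching, so the minimum can underestimate a cold open.

```python
import statistics
import time


def time_call(func, *args, repeat=5, **kwargs):
    """Call func(*args, **kwargs) `repeat` times and return each duration in seconds."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (minimum, median) of the measured durations."""
    return min(durations), statistics.median(durations)


# Assumed usage with the example above:
# import xarray
# durations = time_call(xarray.open_dataset, url, engine="zarr", repeat=3)
# print("min=%.2fs median=%.2fs" % summarize(durations))
```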

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

# numpy==2.1.1
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.3 (main, Sep 23 2024, 17:37:36) [Clang 15.0.0 (clang-1500.3.9.4)]
python-bits: 64
OS: Darwin
OS-release: 23.6.0
machine: arm64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.9.0
pandas: 2.2.3
numpy: 2.1.1
scipy: None
netCDF4: 1.7.1.post2
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.9.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.1.0
pip: 24.0
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.27.0
sphinx: None

# numpy==1.26.4
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.3 (main, Sep 23 2024, 17:37:36) [Clang 15.0.0 (clang-1500.3.9.4)]
python-bits: 64
OS: Darwin
OS-release: 23.6.0
machine: arm64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.9.0
pandas: 2.2.3
numpy: 1.26.4
scipy: None
netCDF4: 1.7.1.post2
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.9.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.1.0
pip: 24.0
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.27.0
sphinx: None
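Because the two environments are meant to differ only in the numpy version, a quick sanity check is to diff the two reports. The sketch below parses `name: value` lines from `xarray.show_versions()`-style output; `parse_versions` and `diff_versions` are hypothetical helpers, not part of xarray. Comparing the package lines of the two reports in this issue, the only entry that differs is `numpy` (2.1.1 vs 1.26.4), so that is what the diff should report.

```python
def parse_versions(report):
    """Parse 'name: value' lines from a show_versions-style report into a dict."""
    versions = {}
    for line in report.splitlines():
        # Split on the first colon only, so values like timestamps stay intact.
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip headers, separators and any "name" that contains spaces.
        if sep and name and " " not in name:
            versions[name] = value.strip()
    return versions


def diff_versions(a, b):
    """Return {name: (value_in_a, value_in_b)} for entries that differ or are missing on one side."""
    va, vb = parse_versions(a), parse_versions(b)
    return {
        name: (va.get(name), vb.get(name))
        for name in sorted(va.keys() | vb.keys())
        if va.get(name) != vb.get(name)
    }


# Assumed usage: paste each report into a string and compare.
# print(diff_versions(report_numpy2, report_numpy1))
```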
