Flox seems much slower in some cases? #281

Closed
@max-sixty

Description

Not sure if this belongs in xarray or here, and it's probably caused by me doing something wrong (or I've messed up something in numbagg!).

It's motivated by a larger example that runs on dask, where the difference is 128s vs. 8s.

Here's a smaller MCVE:

import xarray as xr

ds = xr.tutorial.load_dataset('air_temperature')


[ins] In [1]: xr.set_options(use_flox=True)
Out[1]: <xarray.core.options.set_options at 0x15022dbb0>

[nav] In [2]: %timeit -n 1 -r 1 ds.groupby('lon').count(...)
723 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

[ins] In [3]: xr.set_options(use_flox=False)
Out[3]: <xarray.core.options.set_options at 0x15022d8e0>

[ins] In [4]: %timeit -n 1 -r 1 ds.groupby('lon').count(...)
27.7 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
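The larger dask-backed example isn't included above; as a rough sketch of how that comparison could be set up (chunking the same tutorial dataset is an assumption here, not the original workload), something like:

import xarray as xr

# Hypothetical dask-backed variant of the MCVE above; the real workload is
# larger, so this only illustrates the shape of the comparison.
ds = xr.tutorial.load_dataset('air_temperature').chunk({'time': 100})

with xr.set_options(use_flox=True):
    ds.groupby('lon').count(...).compute()   # slow path in the report

with xr.set_options(use_flox=False):
    ds.groupby('lon').count(...).compute()   # fast path in the report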

INSTALLED VERSIONS

commit: 37325848799e75d7b1a73a0e45e2cf4bd376ccaa
python: 3.9.18 (main, Aug 24 2023, 21:19:58)
[Clang 14.0.3 (clang-1403.0.22.14.1)]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2023.9.1.dev10+gff14c6ba.d20230930
pandas: 2.1.1
numpy: 1.26.1
scipy: 1.11.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.16.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.5.0
distributed: 2023.5.0
matplotlib: 3.6.0
cartopy: None
seaborn: 0.12.2
numbagg: 0.6.0
fsspec: 2022.8.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: 0.8.1
numpy_groupies: 0.9.22
setuptools: 68.1.2
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: 1.6.1
IPython: 8.14.0
sphinx: 5.2.1

Labels: bug (Something isn't working)