
Improve performance with numpy_groupies #222

Open
@dcherian

IMO our main bottleneck now is how numpy_groupies converts nD problems to a 1D problem before using bincount, ufunc.at, etc. (ml31415/numpy-groupies#46), e.g. when grouping an nD array by a 1D array such as time.month and reducing along the 1D time axis.
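To make the nD-to-1D conversion concrete, here is a minimal sketch of the general idea for the axis == -1 case. It is not numpy_groupies' actual code; the function name `grouped_sum_last_axis` and the offset scheme are assumptions made for illustration. Each "row" of leading dimensions gets its own block of bins, so a single `np.bincount` call over the flattened data handles every row at once.

```python
import numpy as np


def grouped_sum_last_axis(array, labels, ngroups):
    """Sum `array` over its last axis, grouped by the 1D integer `labels`.

    Hypothetical helper for illustration only.
    """
    # Number of independent "rows" formed by all leading dimensions.
    nrows = int(np.prod(array.shape[:-1], dtype=int))
    # Shift the labels of each row by a multiple of ngroups so that every
    # (row, group) pair maps to a unique bin of one flat index.
    offsets = np.arange(nrows).reshape(-1, 1) * ngroups
    flat_idx = (offsets + labels).ravel()  # same size as `array`
    flat_sum = np.bincount(
        flat_idx, weights=array.ravel(), minlength=nrows * ngroups
    )
    return flat_sum.reshape(array.shape[:-1] + (ngroups,))


array = np.arange(24, dtype=float).reshape(2, 3, 4)
labels = np.array([0, 1, 0, 1])  # e.g. integer codes for time.month
result = grouped_sum_last_axis(array, labels, ngroups=2)
```

Note that `flat_idx` is as large as the data itself, which is why building it can dominate the runtime for big arrays.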

I tried to fix this, but the fix had to be reverted because it doesn't generalize to axis != -1.

  1. We could just use it in numpy-groupies when axis == -1 and fall back to the standard path otherwise. This would be good, I think (see "Use faster group_idx creation when axis == -1", ml31415/numpy-groupies#77).
  2. flox still has the problem that for reductions like mean we compute two reductions for dask arrays: sum and count. This means we incur the conversion cost twice. To avoid this, either numpy-groupies would have to support multiple reductions (which its maintainers don't want to do), or we make the transformation to a 1D problem ourselves. This is annoying but doable.
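As a rough sketch of the second option, the flat 1D index could be built once and shared between the sum and count reductions. This is only an illustration under the assumption that axis == -1 and that labels are non-negative integer codes; `flat_group_index` and `grouped_mean_last_axis` are hypothetical names, not flox or numpy_groupies APIs.

```python
import numpy as np


def flat_group_index(shape, labels, ngroups):
    """Build the flat (row, group) bin index once. Hypothetical helper."""
    nrows = int(np.prod(shape[:-1], dtype=int))
    offsets = np.arange(nrows).reshape(-1, 1) * ngroups
    return (offsets + labels).ravel(), nrows * ngroups


def grouped_mean_last_axis(array, labels, ngroups):
    """Grouped mean over the last axis, reusing one index for sum and count."""
    flat_idx, size = flat_group_index(array.shape, labels, ngroups)
    # Both intermediate reductions reuse the same precomputed index.
    sums = np.bincount(flat_idx, weights=array.ravel(), minlength=size)
    counts = np.bincount(flat_idx, minlength=size)
    # Empty groups give 0 / 0, which becomes NaN here.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts
    return mean.reshape(array.shape[:-1] + (ngroups,))


data = np.arange(24, dtype=float).reshape(2, 3, 4)
codes = np.array([0, 1, 0, 1])
mean = grouped_mean_last_axis(data, codes, ngroups=2)
```

In this sketch the index construction is paid for once per call rather than once per intermediate reduction.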

PS: We could avoid all of this by building out numbagg's groupby, which IIRC is stuck on implementing a proper fill_value that is not the identity element for reductions.

cc @Illviljan @TomNicholas
