IMO our main bottleneck now is how `numpy_groupies` converts nD problems to a 1D problem before using `bincount`, `ufunc.at`, etc. (ml31415/numpy-groupies#46), e.g. when grouping an nD array by a 1D array `time.month` and reducing along 1D `time`.
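
For context, here is a minimal sketch of what such an nD-to-1D conversion can look like when the reduction is along the last axis. This is plain NumPy, not the actual `numpy_groupies` code; the helper name `grouped_sum_last_axis` and the toy data are made up for illustration.

```python
import numpy as np


def grouped_sum_last_axis(array, codes, ngroups):
    """Grouped sum over the last axis via a single 1D bincount (hypothetical helper)."""
    array = np.asarray(array, dtype=float)
    codes = np.asarray(codes)
    lead_shape = array.shape[:-1]
    nrows = int(np.prod(lead_shape, dtype=int))
    # Give every "row" of the leading dimensions its own block of group labels,
    # so one flat bincount call can handle all rows at once.
    offsets = np.arange(nrows)[:, None] * ngroups
    flat_idx = (offsets + codes[None, :]).ravel()
    out = np.bincount(flat_idx, weights=array.reshape(-1), minlength=nrows * ngroups)
    return out.reshape(lead_shape + (ngroups,))


# Toy stand-in for "group by time.month, reduce along time":
# 2 spatial points, 12 time steps, 4 groups of 3 steps each.
data = np.arange(24.0).reshape(2, 12)
month = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
result = grouped_sum_last_axis(data, month, ngroups=4)
```

The flat index has one entry per element of the full array, so building it is part of the cost being discussed here.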
I tried to fix this but it had to be reverted because it doesn't generalize for `axis != -1`.
We could just use it in `numpy-groupies` when `axis == -1` and use the standard path for other cases (see "Use faster group_idx creation when axis == -1", ml31415/numpy-groupies#77). This would be good I think.

`flox` still has the problem that for reductions like `mean` we compute 2 reductions for dask arrays: `sum` and `count`. This means we incur the cost twice. To avoid this, `numpy-groupies` would have to support multiple reductions (which they don't want to do), or we make the transformation to a 1D problem ourselves. This is annoying but doable.
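
A rough sketch of the "do the transformation ourselves" option, assuming reduction along the last axis only: build the flat 1D index once and reuse it for both `sum` and `count`. The function names `flat_group_index` and `sum_and_count` are hypothetical, and this is plain NumPy rather than anything `flox` or `numpy-groupies` currently does.

```python
import numpy as np


def flat_group_index(shape, codes, ngroups):
    """Build the flat 1D group index once (hypothetical helper, last axis only)."""
    nrows = int(np.prod(shape[:-1], dtype=int))
    offsets = np.arange(nrows)[:, None] * ngroups
    flat_idx = (offsets + np.asarray(codes)[None, :]).ravel()
    return flat_idx, nrows * ngroups


def sum_and_count(array, codes, ngroups):
    """Compute grouped sum and count while paying for the index only once."""
    array = np.asarray(array, dtype=float)
    idx, size = flat_group_index(array.shape, codes, ngroups)
    out_shape = array.shape[:-1] + (ngroups,)
    total = np.bincount(idx, weights=array.ravel(), minlength=size).reshape(out_shape)
    count = np.bincount(idx, minlength=size).reshape(out_shape)
    return total, count


data = np.arange(24.0).reshape(2, 12)
month = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
total, count = sum_and_count(data, month, ngroups=4)
mean = total / count  # every group is non-empty in this toy example
```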
PS: We could totally avoid all this by building out `numbagg`'s groupby, which IIRC is stuck on implementing a proper `fill_value` that is not the identity element for reductions.
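
To illustrate the `fill_value` point: a grouped sum naturally returns the identity element (0) for empty groups, so using a different `fill_value` requires knowing which groups received no data. The sketch below does this with an extra count in plain NumPy; it is an assumption-laden illustration, not `numbagg`'s API, and `grouped_sum_with_fill` is a made-up name.

```python
import numpy as np


def grouped_sum_with_fill(values, codes, ngroups, fill_value=np.nan):
    """Grouped sum where empty groups get fill_value instead of 0 (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    total = np.bincount(codes, weights=values, minlength=ngroups)
    # The count tells us which groups are empty; those get fill_value.
    count = np.bincount(codes, minlength=ngroups)
    return np.where(count > 0, total, fill_value)


# Group 1 has no members, so it is filled with NaN rather than 0.
filled = grouped_sum_with_fill([1.0, 2.0, 4.0], [0, 0, 2], ngroups=3)
```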