Oh, my branch was way behind; let me rebase...
One catch is that …
No idea at all! I tested this very lightly, and it might not be worth it. In particular, I'm not sure this was really the slow part in the larger datasets I was processing; maybe something else was going on. Sometimes NNDescent takes longer than I would expect for mysterious reasons. I can look into it more in a bit; there's no reason to rush it in without more investigation.
It looks promising, in that it's an easy change that could have benefits. Let's just leave it pending for now until you've dug a little deeper.
It definitely seems like there's considerable overhead when calling with a single query. It's hard to measure because it depends on the parameters and the data, but it's consistent. Of course, with a batch query the parallel version can be faster. So maybe this isn't a good idea. I'm not sure why I thought this was a bottleneck; I would probably need to run a huge array through it to see any effect. It would be nice if there were some way to dispatch based on array shape, but I don't think that's possible.
One possibility, which I haven't quite figured out but which seems promising, is to use …, but I need to do more debugging and testing: I'm not familiar with writing these signatures, and I think I'm doing something wrong with in-place modification. One question: the existing code returns the output as …
These errors are confusing; it seems like numba doesn't know what to do with its own ufunc objects. Maybe I need an additional annotation somewhere. Apparently this is a long-standing issue, although as far as I can tell it isn't documented. Functions made with …
I'm not sure what to make of the errors either. Thinking about this, it might be easiest to do something along the lines of the following:

    import numba

    def deheap_sort_base(heaps):
        ...
        for i in numba.prange(indices.shape[0]):
            ...

    # One compiled variant with a parallel outer loop (for large batches)...
    deheap_sort_bulk = numba.njit(parallel=True)(deheap_sort_base)
    # ...and one serial variant of the same function (for small queries).
    deheap_sort_small = numba.njit(parallel=False)(deheap_sort_base)

and then you can call …
I was going to do that, but I think I have a slightly better option, which matches the design of …

Edit: this solution is basically the same as what you suggested, except that the user decides whether to turn this on for queries. I'm not sure what makes for the cleanest code; if you have a preference, I can change it.
Yes, I like that option a lot. It makes it all pretty clean from the user-control perspective.
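A minimal sketch of that pattern, under stated assumptions: one Python function is compiled twice (serial and parallel), and a caller-facing switch picks between them. The names `sort_rows`, `sort_rows_bulk`, `sort_rows_small`, `_compile` and the `parallel` keyword are hypothetical, a row-wise `np.sort` stands in for the real `deheap_sort` body, and the code falls back to plain Python if numba isn't installed.

```python
import numpy as np

try:
    from numba import njit, prange
    HAVE_NUMBA = True
except ImportError:  # assumption: without numba, run as plain (serial) Python
    HAVE_NUMBA = False
    prange = range


def _sort_rows_base(data):
    # Stand-in for deheap_sort: every row is processed independently,
    # so the outer loop can be a prange.
    for i in prange(data.shape[0]):
        data[i, :] = np.sort(data[i, :])
    return data


def _compile(func, parallel):
    # Compile the same Python function with different settings
    # (hypothetical helper; returns the function unchanged without numba).
    return njit(parallel=parallel)(func) if HAVE_NUMBA else func


sort_rows_bulk = _compile(_sort_rows_base, parallel=True)
sort_rows_small = _compile(_sort_rows_base, parallel=False)


def sort_rows(data, parallel=False):
    # Hypothetical user-facing switch: callers opt into the parallel
    # variant for large batches; the default stays serial.
    return sort_rows_bulk(data) if parallel else sort_rows_small(data)
```

Keeping the serial variant as the default means a single query presumably avoids the thread start-up overhead discussed earlier, while batch callers opt in explicitly.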
This is a very simple change: I added `parallel=True` to `utils.deheap_sort` and used a `numba.prange` in the top loop.

Sometimes, when I'm watching CPU usage for NNDescent on very large arrays, I've seen it spend a fair amount of time in a single thread near the end of the operation. Looking at the code, I suspect it's just dealing with this final `deheap_sort` call.

Because the function operates on each row independently, it's pretty simple to wrap it in a parallel loop and let numba figure it out. It seems to work, but I haven't tested it rigorously yet. Hoping Travis can do that for me.

As an additional tweak, I added Python 3.9 to the test matrix here, to see what happens. Edit: obsolete given recent updates.
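For reference, a self-contained sketch of the kind of change described above: each row holds a max-heap of neighbor distances (largest at the root), and a heap sort run per row inside a `prange` loop leaves each row in ascending order. This is not pynndescent's actual `utils.deheap_sort`. The `(indices, distances)` signature, this `siftdown`, and the `build_heaps` demo helper are assumptions for illustration, and the numba import falls back to plain Python when numba is missing.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # assumption: without numba, run the same code as plain Python
    prange = range

    def njit(*args, **kwargs):
        # Accept both @njit and @njit(parallel=True); leave functions unchanged.
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit
def siftdown(dists, inds, elt):
    # Restore the max-heap property (largest distance at the root) below
    # position `elt`, moving the neighbor indices in lockstep.
    n = dists.shape[0]
    while 2 * elt + 1 < n:
        left = 2 * elt + 1
        right = left + 1
        swap = elt
        if dists[swap] < dists[left]:
            swap = left
        if right < n and dists[swap] < dists[right]:
            swap = right
        if swap == elt:
            break
        dists[elt], dists[swap] = dists[swap], dists[elt]
        inds[elt], inds[swap] = inds[swap], inds[elt]
        elt = swap


@njit(parallel=True)
def deheap_sort(indices, distances):
    # Each row is an independent heap, so rows can be sorted in parallel.
    for i in prange(indices.shape[0]):
        # Move the current maximum to the end of the unsorted prefix,
        # then re-heapify the shrunken prefix.
        for j in range(indices.shape[1] - 1, 0, -1):
            indices[i, 0], indices[i, j] = indices[i, j], indices[i, 0]
            distances[i, 0], distances[i, j] = distances[i, j], distances[i, 0]
            siftdown(distances[i, :j], indices[i, :j], 0)
    return indices, distances


def build_heaps(distances):
    # Hypothetical demo helper: heapify every row of an (n_rows, k) array,
    # tracking each value's original column index.
    distances = distances.copy()
    indices = np.tile(np.arange(distances.shape[1]), (distances.shape[0], 1))
    for i in range(distances.shape[0]):
        for start in range(distances.shape[1] // 2 - 1, -1, -1):
            siftdown(distances[i], indices[i], start)
    return indices, distances
```

Because no two iterations of the outer loop touch the same row, the `prange` introduces no data races; the sort within each row stays sequential.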