phofl
Member

@phofl phofl commented Sep 11, 2022

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

fast_unique_multiple was never used with more than 2 arrays, so there is no need to keep that implementation around. Returning indices from the Cython level allows us to operate on the initial object and hence keep the dtypes.
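To make the idea concrete, here is a minimal pure-Python sketch, not the Cython code from this PR. The helper names `positions_missing_from_left` and `union_via_positions` are hypothetical, and a typed `array.array` stands in for a pandas object with a dtype. The sketch assumes `right` contains no duplicates. The point is only that a helper returning positions, rather than a new array of values, lets the caller index the original object so its type survives.

```python
from array import array


def positions_missing_from_left(left, right):
    """Return the positions in `right` whose values do not occur in `left`.

    Hypothetical helper for illustration; assumes `right` has no duplicates.
    """
    seen = set(left)
    return [pos for pos, value in enumerate(right) if value not in seen]


def union_via_positions(left, right):
    """Append to `left` the elements of `right` that `left` lacks."""
    idx = positions_missing_from_left(left, right)
    # Index the original typed container instead of rebuilding values from
    # generic Python objects, so the element type (typecode) is preserved.
    extra = array(right.typecode, (right[i] for i in idx))
    return left + extra


left = array("h", [3, 1, 2])
right = array("h", [2, 5, 1, 4])
result = union_via_positions(left, right)
print(result.typecode, list(result))
```

In pandas terms, the analogous step would be taking the returned positions from the original right-hand object rather than from a converted object array.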

This also gives us a nice performance boost:

       before           after         ratio
     [fe9e5d02]       [e690752b]
     <midx_union_no_na~4>       <midx_union_no_na>
-      59.6±0.3ms       53.0±0.8ms     0.89  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'union')
-      44.2±0.2ms       25.7±0.2ms     0.58  multiindex_object.SetOperations.time_operation('non_monotonic', 'int', 'union')
-        44.5±1ms       25.7±0.5ms     0.58  multiindex_object.SetOperations.time_operation('monotonic', 'int', 'union')
-        115±10ms       49.1±0.5ms     0.43  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'union')
-       114±0.3ms       48.0±0.7ms     0.42  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'union')

cc @jorisvandenbossche

As a follow-up, we could improve the Cython implementation to handle duplicates in right too.

@phofl phofl added the Performance (memory or execution speed performance), MultiIndex, and NA - MaskedArrays (related to pd.NA and nullable extension arrays) labels Sep 11, 2022
# Conflicts:
#	doc/source/whatsnew/v1.6.0.rst
#	pandas/core/indexes/multi.py
@phofl phofl merged commit aea824f into pandas-dev:main Sep 13, 2022
@phofl phofl added this to the 1.6 milestone Sep 13, 2022
@phofl phofl deleted the midx_union_non_na branch September 13, 2022 17:24
@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022