Performance regression of reduction operations #2311

@YarShev

Describe the problem

After this commit, several operations show a performance regression. An example for mean is below:

Source code / logs

from timeit import default_timer as timer

import numpy as np
import modin.pandas as pd

NCOLS = 2 ** 10
NROWS = 2 ** 10
RAND_LOW = 0
RAND_HIGH = 100
random_state = np.random.RandomState(seed=42)


def measure(df):
    # Time a single column-wise mean and print the elapsed seconds.
    start = timer()
    df.mean(axis=0)
    end = timer()
    print(end - start)


# Float columns in which some entries are replaced by NaN in a fixed pattern.
float_nan_data = {
    "col{}".format(int((i - NCOLS / 2) % NCOLS + 1)): [
        x if (j % 4 == 0 and i > NCOLS // 2) or (j != i and i <= NCOLS // 2) else np.nan
        for j, x in enumerate(random_state.uniform(RAND_LOW, RAND_HIGH, size=NROWS))
    ]
    for i in range(NCOLS)
}

df = pd.DataFrame(float_nan_data)
measure(df)

Before the commit: 0.6 s
After the commit: 1.4 s
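
The numbers above come from a single timed call, so they can include one-off costs and run-to-run noise. A minimal sketch of a repeated measurement is given below, assuming a warm-up call and a handful of repeats are acceptable for this benchmark. The helper names `time_repeated` and `summarize` are hypothetical and not part of Modin or of the original script; the workload is a stand-in so the sketch runs on its own.

```python
import statistics
from timeit import default_timer as timer


def time_repeated(func, repeats=5, warmup=1):
    """Call func several times and return the wall-clock durations in seconds."""
    for _ in range(warmup):
        func()  # untimed warm-up call to absorb one-off start-up costs
    durations = []
    for _ in range(repeats):
        start = timer()
        func()
        durations.append(timer() - start)
    return durations


def summarize(durations):
    """Return (best, median) of the measured durations."""
    return min(durations), statistics.median(durations)


# Stand-in workload (an assumption for illustration, not the Modin call).
durations = time_repeated(lambda: sum(range(100_000)), repeats=5)
best, median = summarize(durations)
print(f"best {best:.6f} s, median {median:.6f} s")
```

With the `df` from the script above, the same helper could be called as `time_repeated(lambda: df.mean(axis=0))`, and the best and median times compared before and after the commit.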

Labels: Performance 🚀 (Performance related issues and pull requests.)
