Closed
Labels
Performance 🚀 (performance related issues and pull requests)
Description
Describe the problem
After this commit, several operations show a performance regression. An example using mean
(df.mean(axis=0) on a 1024 x 1024 float DataFrame containing NaNs) is below:
Source code / logs
from timeit import default_timer as timer

import numpy as np
import modin.pandas as pd

NCOLS = 2 ** 10
NROWS = 2 ** 10
RAND_LOW = 0
RAND_HIGH = 100
random_state = np.random.RandomState(seed=42)


def measure(df):
    # Time a single column-wise mean and print the elapsed seconds.
    start = timer()
    df.mean(axis=0)
    end = timer()
    print(end - start)


# Float data with NaNs: for i <= NCOLS // 2 only the entry at j == i is NaN;
# for i > NCOLS // 2 every entry is NaN except those where j % 4 == 0.
float_nan_data = {
    "col{}".format(int((i - NCOLS / 2) % NCOLS + 1)): [
        x if (j % 4 == 0 and i > NCOLS // 2) or (j != i and i <= NCOLS // 2) else np.nan
        for j, x in enumerate(random_state.uniform(RAND_LOW, RAND_HIGH, size=(NROWS)))
    ]
    for i in range(NCOLS)
}

df = pd.DataFrame(float_nan_data)
measure(df)
Before the commit: 0.6 s
After the commit: 1.4 s
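The numbers above come from a single timed call, which can be noisy. As a rough sketch only (not part of the original report), a helper like the one below could repeat the measurement and keep the fastest run. The name `best_of` and the `repeats` parameter are hypothetical, and the stand-in workload is only there so the snippet runs without Modin installed.

```python
from timeit import default_timer as timer


def best_of(func, repeats=5):
    """Call func() `repeats` times and return the fastest wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = timer()
        func()
        timings.append(timer() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)


if __name__ == "__main__":
    # Stand-in workload (an assumption); in the script above this would be
    # best_of(lambda: df.mean(axis=0)).
    print(best_of(lambda: sum(range(100_000))))
```

With the script from this issue, the call would be `best_of(lambda: df.mean(axis=0))`, run once on the revision before the commit and once on the revision after it.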