Closed
Labels
bug 🦗 Something isn't working
Description
Reproducer:
import modin.pandas as pd
import numpy as np
nrows = 256
ncols = 128
data = {
    f"col{i}": np.random.randint(0, 100, nrows)
    for i in np.arange(ncols)
}
agg_fn = {"max": ("col1", np.max), "min": ("col127", np.min)}
df = pd.DataFrame(data)
res = df.groupby("col0").agg(**agg_fn) # KeyError: 'col127' does not exist
print(res)
Describe the problem
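This happens because the current implementation assumes that every column named in the aggregation dict exists in every partition, which is not true. A frame with 128 columns is split into several column partitions, so "col1" and "col127" can end up in different ones. We should probably check that the partition we are applying the function to contains all columns from the dict, and otherwise drop the missing columns from the dict before applying it.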
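A minimal sketch of the proposed check, assuming plain pandas frames stand in for column partitions. The helper `filter_agg_for_partition` and the two hand-made partitions are hypothetical and only illustrate the idea; this is not Modin's actual internal code. It also assumes the `by` column is available in every partition.

```python
import pandas as pd


def filter_agg_for_partition(partition, agg_fn):
    """Hypothetical helper: keep only the named aggregations whose
    source column is actually present in this partition."""
    return {
        out_name: (col, func)
        for out_name, (col, func) in agg_fn.items()
        if col in partition.columns
    }


df = pd.DataFrame(
    {
        "col0": [0, 1, 0, 1],
        "col1": [5, 3, 8, 1],
        "col3": [4, 7, 2, 9],
    }
)
agg_fn = {"max": ("col1", "max"), "min": ("col3", "min")}

# Assumption: each column partition carries the `by` column plus a
# disjoint subset of the remaining columns.
partitions = [df[["col0", "col1"]], df[["col0", "col3"]]]

results = []
for part in partitions:
    kept = filter_agg_for_partition(part, agg_fn)
    if not kept:
        # Nothing to aggregate here; calling agg() with no functions would fail.
        continue
    results.append(part.groupby("col0").agg(**kept))

# Stitch the per-partition results back together column-wise.
combined = pd.concat(results, axis=1)
expected = df.groupby("col0").agg(**agg_fn)
print(combined)
```

Without the filtering step, applying the full `agg_fn` to the first partition raises a `KeyError` for `col3`, which mirrors the failure in the reproducer.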