Closed
Labels
P1 (Important tasks that we should complete soon), bug 🦗 (Something isn't working)
Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Any
- Modin version (modin.__version__): 0.8.1.1
- Python version: 3.7.5
- Code we can use to reproduce:
if __name__ == "__main__":
    import pandas
    import modin.pandas as pd

    data = {"a": [1]}
    pd_df = pandas.DataFrame(data)
    md_df = pd.DataFrame(data)

    pd_ser = pandas.Series([5], index=["a"], name=0)
    md_ser = pd.Series([5], index=["a"], name=0)

    pd_df.loc[0] = pd_ser
    md_df.loc[0] = md_ser

    print(f"Pandas result:\n{pd_df}")
    print(f"\nModin result:\n{md_df}")
Output
Pandas result:
   a
0  5

Modin result:
Empty DataFrame
Columns: [a]
Index: []
Describe the problem
The problem is in how we determine the axis along which to assign the new item in the loc/iloc indexers:
modin/modin/pandas/indexing.py
Lines 318 to 333 in a571e10
# This is True when we dealing with assignment of a full column. This case
# should be handled in a fastpath with `df[col] = item`.
if (
    len(row_lookup) == len(self.qc.index)
    and len(col_lookup) == 1
    and hasattr(self.df, "columns")
):
    self.df[self.df.columns[col_lookup][0]] = item
# This is True when we are assigning to a full row. We want to reuse the setitem
# mechanism to operate along only one axis for performance reasons.
elif len(col_lookup) == len(self.qc.columns) and len(row_lookup) == 1:
    if hasattr(item, "_query_compiler"):
        item = item._query_compiler
    new_qc = self.qc.setitem(1, self.qc.index[row_lookup[0]], item)
    self.df._create_or_update_from_compiler(new_qc, inplace=True)
# Assignment to both axes.
The row and column lookups produced by an assignment like the one in the reproducer above satisfy the conditions for both types of assignment:
df.shape == (1, 1)
df.loc[0] -> row_lookup = [0]
             col_lookup = [0]
len(row_lookup) == len(df.index) and len(col_lookup) == 1: satisfies the axis=0 assignment condition
len(col_lookup) == len(df.columns) and len(row_lookup) == 1: satisfies the axis=1 assignment condition
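So in half of these cases we pick the wrong if/elif branch: the full-column check comes first, so a full-row assignment to a single-cell frame is handled as a column assignment.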
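The ambiguity can be shown without Modin at all. The sketch below is only an illustration: `matching_conditions` and `pick_branch` are hypothetical helpers, not part of Modin, and they reduce the quoted checks to plain length comparisons (the `hasattr(self.df, "columns")` guard is left out, assuming a DataFrame).

```python
def matching_conditions(row_lookup, col_lookup, n_rows, n_cols):
    """Return every assignment condition the lookups satisfy (hypothetical helper)."""
    matches = []
    # Full-column condition: all rows selected, exactly one column.
    if len(row_lookup) == n_rows and len(col_lookup) == 1:
        matches.append("full-column")
    # Full-row condition: all columns selected, exactly one row.
    if len(col_lookup) == n_cols and len(row_lookup) == 1:
        matches.append("full-row")
    return matches


def pick_branch(row_lookup, col_lookup, n_rows, n_cols):
    """Mimic the if/elif ordering: the first satisfied condition wins."""
    matches = matching_conditions(row_lookup, col_lookup, n_rows, n_cols)
    return matches[0] if matches else "both-axes"


# 1x1 frame: the same lookups satisfy both conditions at once.
print(matching_conditions([0], [0], n_rows=1, n_cols=1))
# The column branch is checked first, so it is the one taken.
print(pick_branch([0], [0], n_rows=1, n_cols=1))
# 2x3 frame: a single-row assignment is unambiguous.
print(pick_branch([0], [0, 1, 2], n_rows=2, n_cols=3))
```

On any frame with more than one row or more than one column, at most one of the two conditions can hold for a given pair of lookups, which is why the wrong branch only shows up on single-row or single-column frames such as the one in the reproducer.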