df.loc produces empty frame in case of (1, 1) shape frame #2253

@dchigarev

Description

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Any
  • Modin version (modin.__version__): 0.8.1.1
  • Python version: 3.7.5
  • Code we can use to reproduce:
if __name__ == "__main__":
    import pandas
    import modin.pandas as pd

    data = {"a": [1]}

    pd_df = pandas.DataFrame(data)
    md_df = pd.DataFrame(data)

    pd_ser = pandas.Series([5], index=["a"], name=0)
    md_ser = pd.Series([5], index=["a"], name=0)

    pd_df.loc[0] = pd_ser
    md_df.loc[0] = md_ser

    print(f"Pandas result:\n{pd_df}")
    print(f"\nModin result:\n{md_df}")
Output
Pandas result:
   a
0  5

Modin result:
Empty DataFrame
Columns: [a]
Index: []

Describe the problem

The problem is in how the loc/iloc indexers determine the axis along which a new item is assigned:

# This is True when we dealing with assignment of a full column. This case
# should be handled in a fastpath with `df[col] = item`.
if (
    len(row_lookup) == len(self.qc.index)
    and len(col_lookup) == 1
    and hasattr(self.df, "columns")
):
    self.df[self.df.columns[col_lookup][0]] = item
# This is True when we are assigning to a full row. We want to reuse the setitem
# mechanism to operate along only one axis for performance reasons.
elif len(col_lookup) == len(self.qc.columns) and len(row_lookup) == 1:
    if hasattr(item, "_query_compiler"):
        item = item._query_compiler
    new_qc = self.qc.setitem(1, self.qc.index[row_lookup[0]], item)
    self.df._create_or_update_from_compiler(new_qc, inplace=True)
# Assignment to both axes.

The row and column lookups produced by an assignment like the one in the reproducer above satisfy the conditions for both types of assignment:

df.shape == (1, 1)
df.loc[0] -> row_lookup = [0]
             col_lookup = [0]

len(row_lookup) == len(df.index) and len(col_lookup) == 1: satisfies axis=0 assignment condition
len(col_lookup) == len(df.columns) and len(row_lookup) == 1: satisfies axis=1 assignment condition

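Because the column condition is checked first, it always wins for a (1, 1) frame, so in half of the cases (row assignments, as in the reproducer) we pick the wrong if-else branch.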
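
The ambiguity can be checked without Modin. The following is a minimal sketch, not Modin code: `matching_branches` is a hypothetical helper that re-evaluates the two length conditions from the snippet above on plain lists, and it assumes the `hasattr(self.df, "columns")` check is true, as it would be for a DataFrame.

```python
def matching_branches(row_lookup, col_lookup, n_rows, n_cols):
    """Return the names of the fast-path conditions that the lookups satisfy.

    Hypothetical helper for illustration only; it mirrors the length checks
    from the indexer snippet and ignores the `hasattr` check.
    """
    matches = []
    # Condition of the full-column fast path (the `if` branch, checked first).
    if len(row_lookup) == n_rows and len(col_lookup) == 1:
        matches.append("column")
    # Condition of the full-row fast path (the `elif` branch).
    if len(col_lookup) == n_cols and len(row_lookup) == 1:
        matches.append("row")
    return matches


# (1, 1) frame: both conditions hold, so the lookups alone cannot tell
# a row assignment from a column assignment.
print(matching_branches([0], [0], n_rows=1, n_cols=1))

# (2, 2) frame: a full-row lookup and a full-column lookup are unambiguous.
print(matching_branches([0], [0, 1], n_rows=2, n_cols=2))
print(matching_branches([0, 1], [0], n_rows=2, n_cols=2))
```

Only the (1, 1) case returns both names, which suggests that the lengths of the lookups are not enough on their own to choose the axis for such a frame.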

Labels

P1 (Important tasks that we should complete soon), bug 🦗 (Something isn't working)
