
Performance issue with nulls in pandas dataframe with multi-index validation #1449

@mattB1989

Description


Describe the bug
This is very similar to #1403, but for pandas DataFrames. If I define a column as nullable, pandera still checks whether the column contains nulls regardless (see this code), even though the result of that check is never used (as far as I can tell).
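To illustrate the point, here is a minimal sketch in plain pandas of a null check that short-circuits for nullable columns, so the O(n) `isna()` scan is only paid when its result can actually produce a failure. `null_failure_mask` is a hypothetical helper for illustration, not pandera's actual internal function.

```python
import pandas as pd


def null_failure_mask(series: pd.Series, nullable: bool):
    """Hypothetical helper: return a boolean mask of rows that violate the
    not-null constraint, or None when the column is declared nullable.

    Returning early means ``series.isna()`` is never evaluated for
    nullable columns, so no work is spent on a result that is discarded.
    """
    if nullable:
        # Nulls are allowed: nothing can fail, so skip the scan entirely.
        return None
    # Only non-nullable columns pay for the full-column null scan.
    return series.isna()


s = pd.Series([1.0, None, 3.0])
print(null_failure_mask(s, nullable=True))            # None
print(null_failure_mask(s, nullable=False).tolist())  # [False, True, False]
```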

The problem is made worse when using a MultiIndex, because the MultiIndex gets converted into a column during the check; I am not sure why this behaviour is desired. The problematic code is here.

I believe the check should be skipped if the column is declared as nullable. And if it is not nullable, why does the error formatter cast the MultiIndex to strings? That conversion could be done more efficiently, but why not keep the index as is?
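A minimal sketch, in plain pandas, of what "keeping the index as is" could look like: the failure cases are selected with a boolean mask, so the result keeps the original MultiIndex and only the failing rows are touched. If a string form is still wanted for the error message, it is built from that small subset rather than the whole index. `failure_cases` is a hypothetical helper, not pandera's real error formatter.

```python
import numpy as np
import pandas as pd


def failure_cases(series: pd.Series, mask: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: collect the failing values of ``series``.

    The returned frame keeps the original (Multi)Index of the failing rows
    instead of converting the whole index to strings; only rows where
    ``mask`` is True are selected.
    """
    return series[mask].to_frame(name="failure_case")


idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["key", "n"])
s = pd.Series([1.0, np.nan, 3.0, np.nan], index=idx, name="value")
cases = failure_cases(s, s.isna())
print(cases)

# If a string representation is needed for the error message, build it
# from the failing subset only, not from every row of the index.
labels = [str(t) for t in cases.index]
print(labels)
```

With this approach the cost of formatting scales with the number of failures rather than with the size of the DataFrame.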

  • [x] I have checked that this issue has not already been reported.
  • [x] I have confirmed this bug exists on the latest version of pandera.
  • [x] (optional) I have confirmed this bug exists on the master branch of pandera.

Metadata

Labels: bug (Something isn't working)
