handle dataframe-level failure cases: convert row to dict #2050

cosmicBboy · 2025-06-25T04:19:52Z

hey @deepyaman, here's a potential implementation to handle dataframe-level checks to prevent pivoting failure cases to long-form data, as discussed here: #2041 (comment)

It just operates on the failure cases converted into a pandas dataframe.

A related question: are you concerned that the failure cases dataframe won't fit into memory? What is the risk of this?

Signed-off-by: cosmicBboy <[email protected]>

deepyaman

LGTM

Minor question:

    schema_context        column         check  check_number      failure_case  index
0  DataFrameSchema  failure_case  custom_check             0  {'a': 0, 'b': 1}      0
1  DataFrameSchema  failure_case  custom_check             0  {'a': 1, 'b': 0}      1
2  DataFrameSchema  failure_case  custom_check             0  {'a': 1, 'b': 1}      2
3  DataFrameSchema  failure_case      <lambda>             1  {'a': 0, 'b': 1}      0
4  DataFrameSchema  failure_case      <lambda>             1  {'a': 1, 'b': 0}      1
5  DataFrameSchema  failure_case      <lambda>             1  {'a': 1, 'b': 1}      2

What does index mean here? Is it basically like an index for the result, or do people expect to be able to use it to access data in the original dataframe? (Which wouldn't make sense for Ibis)

A related question: are you concerned that the failure cases dataframe won't fit into memory? What is the risk of this?

Do any of the other backends have some capability for limiting output here? It sounds useful even if for pandas or Polars, if you have 1000 failures cases it wouldn't show them all. If so, could .limit() the result from Ibis in the same place.

cosmicBboy · 2025-06-25T15:01:17Z

What does index mean here? Is it basically like an index for the result, or do people expect to be able to use it to access data in the original dataframe? (Which wouldn't make sense for Ibis)

Good call out! this is just the pandas index created from calling to_pandas. If it's confusing we can remove it

cosmicBboy · 2025-06-25T15:05:08Z

re: limiting failure cases, Checks already has an n_failure_cases parameter:

pandera/pandera/api/checks.py

Line 35 in 2a50548

n_failure_cases: Optional[int] = None,

, which defaults to none. Maybe this can be configured at a global level or backend level so that we have some reasonable default (1000?)

deepyaman · 2025-06-25T15:32:56Z

What does index mean here? Is it basically like an index for the result, or do people expect to be able to use it to access data in the original dataframe? (Which wouldn't make sense for Ibis)

Good call out! this is just the pandas index created from calling to_pandas. If it's confusing we can remove it

I guess it makes sense to remove it.

deepyaman · 2025-06-25T15:35:00Z

re: limiting failure cases, Checks already has an n_failure_cases parameter:

pandera/pandera/api/checks.py

Line 35 in 2a50548

n_failure_cases: Optional[int] = None,

, which defaults to none. Maybe this can be configured at a global level or backend level so that we have some reasonable default (1000?)

I think that would be fine. But probably more importantly, given the option is there, I don't see it implemented on Ibis (or Polars)? I'll have to check, but if it's not, I can try and put up a PR.

cosmicBboy · 2025-06-25T16:29:50Z

I don't see it implemented on Ibis (or Polars)

Yeah, now that I revisit the code, I think this only touches column-level checks in the pandas API. Lemme take a stab implementing them across the 3 backends, I'll tap you if I have any q's about ibis.

cosmicBboy · 2025-06-25T16:30:08Z

will merge this PR now.

handle dataframe-level failure cases: convert row to dict

02d6a3d

Signed-off-by: cosmicBboy <[email protected]>

cosmicBboy requested a review from deepyaman June 25, 2025 13:48

deepyaman approved these changes Jun 25, 2025

View reviewed changes

cosmicBboy merged commit 7a81965 into main Jun 25, 2025
200 checks passed

cosmicBboy deleted the nielsb/ibis-failure-cases branch June 26, 2025 00:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

handle dataframe-level failure cases: convert row to dict #2050

handle dataframe-level failure cases: convert row to dict #2050

Uh oh!

cosmicBboy commented Jun 25, 2025 •

edited

Loading

Uh oh!

deepyaman left a comment

Uh oh!

cosmicBboy commented Jun 25, 2025

Uh oh!

cosmicBboy commented Jun 25, 2025

Uh oh!

deepyaman commented Jun 25, 2025

Uh oh!

deepyaman commented Jun 25, 2025

Uh oh!

cosmicBboy commented Jun 25, 2025

Uh oh!

cosmicBboy commented Jun 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

handle dataframe-level failure cases: convert row to dict #2050

handle dataframe-level failure cases: convert row to dict #2050

Uh oh!

Conversation

cosmicBboy commented Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deepyaman left a comment

Choose a reason for hiding this comment

Uh oh!

cosmicBboy commented Jun 25, 2025

Uh oh!

cosmicBboy commented Jun 25, 2025

Uh oh!

deepyaman commented Jun 25, 2025

Uh oh!

deepyaman commented Jun 25, 2025

Uh oh!

cosmicBboy commented Jun 25, 2025

Uh oh!

cosmicBboy commented Jun 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cosmicBboy commented Jun 25, 2025 •

edited

Loading