handle dataframe-level failure cases: convert row to dict #2050
Signed-off-by: cosmicBboy <[email protected]>
deepyaman
left a comment
LGTM
Minor question:
   schema_context   column        check         check_number  failure_case      index
0  DataFrameSchema  failure_case  custom_check  0             {'a': 0, 'b': 1}  0
1  DataFrameSchema  failure_case  custom_check  0             {'a': 1, 'b': 0}  1
2  DataFrameSchema  failure_case  custom_check  0             {'a': 1, 'b': 1}  2
3  DataFrameSchema  failure_case  <lambda>      1             {'a': 0, 'b': 1}  0
4  DataFrameSchema  failure_case  <lambda>      1             {'a': 1, 'b': 0}  1
5  DataFrameSchema  failure_case  <lambda>      1             {'a': 1, 'b': 1}  2
What does index mean here? Is it basically like an index for the result, or do people expect to be able to use it to access data in the original dataframe? (Which wouldn't make sense for Ibis)
A related question: are you concerned that the failure cases dataframe won't fit into memory? What is the risk of this?
Do any of the other backends have some capability for limiting output here? It sounds useful even for pandas or Polars: if you have 1000 failure cases, it wouldn't show them all. If so, we could .limit() the result from Ibis in the same place.
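To make the limiting idea concrete, here is a minimal pure-Python sketch, not pandera's or Ibis's actual implementation. It assumes failure cases can be produced lazily, so a cap can be applied before anything is materialized. The names `iter_failure_cases` and `max_failure_cases` are hypothetical and exist only for this illustration.

```python
from itertools import islice


def iter_failure_cases(rows, check):
    """Lazily yield (index, row) for every row that fails `check`."""
    for i, row in enumerate(rows):
        if not check(row):
            yield i, row


rows = [{"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]

# Hypothetical cap on how many failure cases to report.
max_failure_cases = 2

# Hypothetical check: a row passes only when both values are 0,
# so every row in this toy input fails.
limited = list(
    islice(
        iter_failure_cases(rows, lambda r: r["a"] + r["b"] == 0),
        max_failure_cases,
    )
)
```

Because `islice` stops pulling from the generator once the cap is reached, the third failing row is never collected. That is the same effect a backend-side limit would have on a lazily evaluated result.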
Good call out! This is just the pandas index created from calling
re: limiting failure cases, Checks already has an Line 35 in 2a50548
I guess it makes sense to remove it.
I think that would be fine. But probably more importantly, given the option is there, I don't see it implemented on Ibis (or Polars)? I'll have to check, but if it's not, I can try and put up a PR.
Yeah, now that I revisit the code, I think this only touches column-level checks in the pandas API. Lemme take a stab at implementing them across the 3 backends, I'll tap you if I have any q's about ibis.
will merge this PR now.
hey @deepyaman, here's a potential implementation to handle dataframe-level checks to prevent pivoting failure cases to long-form data, as discussed here: #2041 (comment)
It just operates on the failure cases converted into a pandas dataframe.
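As a rough illustration of the row-to-dict idea (a sketch in plain pandas, not the code in this PR), each failing row can be kept as a single dict-valued failure case rather than being pivoted into one record per column value. The check below is hypothetical and chosen so that every row fails.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 1], "b": [1, 0, 1]})

# Hypothetical dataframe-level check: a row passes only when both
# columns are 0, so all three rows fail here.
passed = (df["a"] == 0) & (df["b"] == 0)
failing_rows = df.loc[~passed]

# One failure case per failing row, stored as a dict, instead of a
# long-form layout with one record per (column, value) pair.
failure_cases = pd.DataFrame(
    {
        "failure_case": failing_rows.to_dict(orient="records"),
        "index": failing_rows.index.to_list(),
    }
)
```

Here `failure_cases` has one row per failing input row, and the `index` column carries the original pandas index labels.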