I am trying to find a workaround for the behavior of `drop_invalid_rows=True` when coercing a column with mixed string and integer values to a dtype.
```python
import pandas as pd
from pandera import Column, DataFrameSchema

df = pd.DataFrame({"counter": ["a", "2", "3"]})

schema = DataFrameSchema(
    {
        "counter": Column(
            int,
            checks=None,
            coerce=True,
        )
    },
    drop_invalid_rows=True,  # drop_invalid_rows set at the schema level
)

df_val = schema.validate(df, lazy=True)
```
When `drop_invalid_rows` is set at the `DataFrameSchema` level, the validated `df_val` drops the row with the value "a", but the `counter` column keeps the `object` dtype.
Setting `drop_invalid_rows=True` at the `Column` level instead raises a `SchemaErrors` exception that reports "a" as a failure case:
```python
import pandas as pd
from pandera import Column, DataFrameSchema

df = pd.DataFrame({"counter": ["a", "2", "3"]})

schema = DataFrameSchema(
    {
        "counter": Column(
            int,
            checks=None,
            coerce=True,
            drop_invalid_rows=True,  # drop_invalid_rows set on the Column instead
        )
    },
)

df_val = schema.validate(df, lazy=True)
```
```
    raise SchemaErrors(
pandera.errors.SchemaErrors: Schema None: A total of 1 schema errors were found.

Error Counts
------------
- SchemaErrorReason.SCHEMA_COMPONENT_CHECK: 1

Schema Error Summary
--------------------
                                            failure_cases  n_failure_cases
schema_context column  check
Column         counter coerce_dtype('int64')          [a]                1
```
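The failure case comes from the coercion step itself: the string "a" has no integer representation. As a plain-pandas illustration (an assumption about the kind of conversion involved, not pandera's actual code), casting the whole column fails on "a", while an element-wise parse identifies the value that ends up in `failure_cases`:

```python
import pandas as pd

# Same mixed string/int data as in the question.
s = pd.Series(["a", "2", "3"], name="counter")

try:
    # Casting the whole column to int64 fails because of "a".
    s.astype("int64")
except ValueError as exc:
    print(f"coercion failed: {exc}")

# Element-wise parse: unparsable strings become NaN instead of raising.
parsable = pd.to_numeric(s, errors="coerce").notna()

# Values that cannot be coerced to an integer.
print(s[~parsable].tolist())  # ['a']
```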
Why do these two similar schemas behave differently?
The first result makes sense. However, is there a workaround that drops the invalid row and still coerces the column to an integer dtype instead of leaving it as `object`?
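One possible workaround, sketched with pandas only and not an official pandera feature: parse the column before validation so that unparsable values become NaN, drop those rows, and cast to `int64`. A second option is to cast the `object` column after the schema-level run; in the sketch, `df_val` is a hand-built stand-in for that result (assumed to keep the original strings "2" and "3" at index 1 and 2), not actual pandera output.

```python
import pandas as pd

df = pd.DataFrame({"counter": ["a", "2", "3"]})

# Option 1: preprocess before validation.
# Unparsable strings become NaN, those rows are dropped, then the column is cast.
# Caveat: numeric strings such as "2.5" would survive and be truncated by astype.
df_clean = (
    df.assign(counter=pd.to_numeric(df["counter"], errors="coerce"))
    .dropna(subset=["counter"])
    .astype({"counter": "int64"})
)

# Option 2: cast after validation.
# Assumption: this frame mimics the object-dtype df_val returned when
# drop_invalid_rows=True is set at the schema level.
df_val = pd.DataFrame({"counter": ["2", "3"]}, index=[1, 2], dtype="object")
df_val = df_val.astype({"counter": "int64"})

print(df_clean.dtypes["counter"], df_val.dtypes["counter"])
```

Either frame should then pass the original schema with `coerce=True`, since no value is left that fails the integer coercion.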