Describe the bug
It is not possible to define PySpark schemas with strict="filter" and coerce=True at the same time.
Using both flags results in "TypeError: schema arg must be a DataFrameSchema, found <class 'NoneType'>".
This happens because the check_obj.pandera.schema attribute is lost after the strict-filter step drops the undeclared columns (see the sketch after the checklist below):
https://github.com/unionai-oss/pandera/blob/main/pandera/backends/pyspark/container.py#L394
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandera.
- (optional) I have confirmed this bug exists on the main branch of pandera.
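For illustration, here is a minimal hypothetical sketch (not pandera's actual implementation) of the pattern described above: select() returns a new DataFrame, and the pandera accessor on that new object no longer carries the schema that the later coercion step reads.

# Hypothetical sketch of the failing pattern; the function name and structure
# are illustrative only and do not mirror pandera/backends/pyspark/container.py.
from pyspark.sql import DataFrame

def strict_filter(check_obj: DataFrame, schema_columns: list) -> DataFrame:
    # Dropping the undeclared columns creates a brand-new DataFrame object.
    filtered = check_obj.select(*schema_columns)
    # The schema attached to check_obj.pandera is not carried over to the new
    # object, so a later coercion step that reads filtered.pandera.schema sees
    # None and raises:
    # TypeError: schema arg must be a DataFrameSchema, found <class 'NoneType'>
    return filtered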
Code Sample, a copy-pastable example
from pandera.api.base.schema import BaseSchema
from pandera.pyspark import DataFrameModel
from pyspark.sql import SparkSession


class Model(DataFrameModel):
    class Config(BaseSchema):
        strict = "filter"
        coerce = True

    x: str


spark = SparkSession.builder.getOrCreate()

# Raises: TypeError: schema arg must be a DataFrameSchema, found <class 'NoneType'>
Model.validate(spark.createDataFrame([(1, 2), (3, 4)], ["x", "y"]))

Expected behavior
No error: strict="filter" should drop the undeclared column y, and coerce=True should cast x to string.
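As an interim workaround, one option (a sketch under the assumption that each setting works on its own and only the combination fails) is to drop the undeclared columns manually and validate with coerce=True alone:

# Possible workaround sketch (assumption, not a verified fix): pre-select the
# declared columns so strict="filter" is not needed, then coerce as usual.
from pandera.pyspark import DataFrameModel
from pyspark.sql import SparkSession

class CoerceOnlyModel(DataFrameModel):
    class Config:
        coerce = True

    x: str

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2), (3, 4)], ["x", "y"])
declared = list(CoerceOnlyModel.to_schema().columns.keys())  # ["x"]
validated = CoerceOnlyModel.validate(df.select(*declared))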