Commit f9b0af5

AndrejIring, cosmicBboy, deepyaman, Copilot, and Jarek-Rolski authored
bugfix: fix format_vectorized_error_message to properly format nested pyarrow failed cases (#2036)
* resolve 2035
* change astype to apply
* bugfix: PANDERA_VALIDATION_ENABLED=False should disable validation (#2028)
  * bugfix: PANDERA_VALIDATION_ENABLED=False should disable validation
  * fix tests
* Expect Python slice index errors after Python 3.10 (#2033)
  * Expect Python slice index errors with pandas again
    Seen in https://github.com/unionai-oss/pandera/actions/runs/15526426636/job/43706977542?pr=2030
  * Expect Python slice index errors after Python 3.10
  * Fix the version check to not include Python 3.10.x
* resolve failing pyspark tests
* add new test
* format code using black
* Add `DataFrameModel`, `DataFrameSchema` for `ibis`
* Refactor Ibis `DataFrameSchema` to extend pandas's
* Add code for basic `Column`, and stub more modules
* Add Ibis's parsing, validation, and error backends
* Implement stub to validate schema component checks
* Implement validation for floating types
* Inherit getter, only override setter, for `.dtype`
* implement column, add basic unit tests for Ibis data types
* Add `ibis` extra and regenerate requirements files
* Re-enable Python equivalents for `int` and `float`
* Do not test Ibis below py3.9, add DuckDB for tests (#1773)
  * Restore accidentally-deleted use of "breakpoint()"
  * Swap `types-pkg_resources` with `types-setuptools` (#1779)
    * Swap `types-pkg_resources` with `types-setuptools`
    * Update the expected error message to name variable
    * Fix expected outputs for pandas-stubs 2.2.2.240807
    * Update CI configuration to filter right pandas ver
  * Do not test Ibis below py3.9, add DuckDB for tests
* Support specifying and validating optional columns (#1762)
  * Test model schema equivalency with optional dtypes
  * Support specifying and validating optional columns
  * Add DuckDB for testing and regenerate requirements
* Register string datatype and update existing tests (#1766)
* Add missing `type` field for the `Int32` data type (#1771)
* Implement minimal `coerce` and corresponding tests (#1772)
  * Implement minimal `coerce` and corresponding tests
  * Do not exclude 3.9 in Ibis CI, and regenerate reqs
* Ibis check backend (#1831)
  * [wip] add minimal ibis check backend implementation
  * support scalar, column, and table check output types
  * support scalar, column, and table check output types
  * Ibis check backend suggestions (#1855)
    * Apply suggestions from code review
    * Fix row-order-dependent order by adding table cols
  * fix lint
  * fix unit tests
* Remove Ibis backend's use of `scalar_failure_case`
* Update signature of `run_schema_components_checks`
* Fix violations in `pandera/backends//container.py`
* Remove use of multimethod library and align Polars
* fix backend registration
* Implement minimal built-in checks for Ibis backend (#1885)
  * Implement minimal built-in checks for Ibis backend
  * Implement `Column` validation for the Ibis backend
  * Promote check object to table during preprocessing
  * Remove extraneous fixture for backend registration
  * Resolve lint (unused imports, undefined variables)
  * Partially standardize docstrings of builtin checks
  * Fix the `preprocess` docstrings copied from pandas
  * Format pandera/backends/ibis/checks.py using Black
* Update test to reflect scalar output since Ibis 10 (#1907)
* Use `Union` type for `PysparkObject` to fix typing
* Add Ibis with the DuckDB extra as a dev dependency
* Update/align noxfile.py and generated requirements
* Supress unused argument check for `data_container`
* Include `polars` extra dep for testing Ibis dtypes
* Update check example to correctly process IbisData
* Execute validated schema to get the desired result
* Implement "ne" built-in check for the Ibis backend
* Implement int/uint/float types, except for float16
* Implement timestamp type, and test built-in checks
* Support built-in checks for interval-typed columns
* Blacken pandera/engines/ibis_engine.py module code
* Implement `dt.Date` type, and test built-in checks
* Implement `dt.Time` type, and test built-in checks
* Implement `gt` and `ge` check for the Ibis backend
* Standardize docstrings, don't say "data container"
* Implement `lt` and `le` check for the Ibis backend
* Fix "form" to "from", and align docstring for test
* Apply suggestion from Copilot to fix error message
* Support table-level checks, including for built-in
* Implement `is_in_range` check for the Ibis backend
* Implement `isin()` built-in check for Ibis backend
* Refactor duplicated code into `_across()` function
* Implement `notin()` builtin check for Ibis backend
* Implement the `str_matches` built-in check on Ibis
* Implement `str_contains` check on the Ibis backend
* Implement `str_startswith()` check on Ibis backend
* Implement `str_endswith()` built-in check for Ibis
* Implement the `str_length` built-in check for Ibis
* Implement the unique_values_eq built-in Ibis check
* Add entry for `ibis` to the backend support matrix
* Write detailed documentation on using Ibis backend
* Replace `ibis.expr.types` imports (where possible)
* Implement `typing` module and decorators for Ibis
* Duplicate tests for Ibis decorators implementation
* Use `code-cell` over `testcode` to build Ibis docs
* Add Ibis docs to the supported libraries `toctree`
* Create element-wise checks automatically for users (#2043)
  * Create element-wise checks automatically for users
  * Mark unsupported scenario with element-wise checks
* Remove `to_format` methods that don't work on Ibis (#2044)
* Execute dataframe-level checks on the Ibis backend (#2041)
  * Execute dataframe-level checks on the Ibis backend
  * Remove unnecessary, un-Pythonic array length check
  * Remove bit on separately running data-level checks
* Mark unsupported scenario with element-wise checks (#2046)
* handle dataframe-level failure cases: convert row to dict
* bugfix/1927 (#2019)
  * fix generating pydantic core schema for generic types
  * test pytest for typed generic DataFrame
  * fix issue with docs gitactions test
  * fix pytest test_typed_generic_dataframe
  * fix pytest test_typed_generic_dataframe
  * improve changes
  * bug fix
  * fix pydantic tests for pydantic<2
  * add pytest test_typed_dataframe_model_json_schema
* resolve failing pylint tests

Signed-off-by: Andrej Iring <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>
Signed-off-by: Jarek-Rolski <[email protected]>
Co-authored-by: Niels Bantilan <[email protected]>
Co-authored-by: Deepyaman Datta <[email protected]>
Co-authored-by: cosmicBboy <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Jarek-Rolski <[email protected]>
1 parent 2e355f7 commit f9b0af5

2 files changed: +105 -3 lines changed

pandera/backends/pandas/error_formatters.py

Lines changed: 2 additions & 2 deletions
@@ -56,10 +56,10 @@ def format_vectorized_error_message(
         "pyspark.pandas"
     ):
         failure_cases = reshaped_failure_cases.failure_case.to_numpy()
+        failure_cases_string = ", ".join(failure_cases.astype(str))
     else:
         failure_cases = reshaped_failure_cases.failure_case
-
-    failure_cases_string = ", ".join(failure_cases.astype(str))
+        failure_cases_string = ", ".join(failure_cases.apply(str))
 
     return (
         f"{parent_schema.__class__.__name__} '{parent_schema.name}' failed "
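
The change above swaps `astype(str)` for `apply(str)` on the non-pyspark branch, so each failure case is stringified one element at a time with Python's built-in `str()`. A minimal sketch of that formatting step follows. It assumes an object-dtype Series of nested lists and dicts as a stand-in for the pyarrow-backed `failure_case` column, and the sample values are borrowed from the test data in this commit rather than from pandera itself.

```python
import pandas as pd

# Stand-in for the `failure_case` column. Assumption: each element of a
# nested pyarrow list<struct> column looks like a list of dicts.
failure_cases = pd.Series(
    [
        [{"field1": 0.0, "field2": "Test"}],
        [{"field1": 2.0, "field2": "Test"}],
    ]
)

# Series.apply(str) calls the built-in str() on every element in turn,
# so nested containers are rendered with their normal Python repr.
failure_cases_string = ", ".join(failure_cases.apply(str))
print(failure_cases_string)
```

Because `apply(str)` only relies on each element being convertible by `str()`, it does not depend on the column's dtype supporting a cast to string.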

tests/pandas/test_pandas_engine.py

Lines changed: 103 additions & 1 deletion
@@ -13,7 +13,7 @@
 import pytz
 from hypothesis import given
 
-from pandera.pandas import Field, DataFrameModel, errors
+from pandera.pandas import Field, DataFrameModel, errors, check
 from pandera.engines import pandas_engine
 from pandera.errors import ParserError, SchemaError
 
@@ -513,3 +513,105 @@ def test_pandas_arrow_dtype_error(data, dtype):
     ):
         coerced_data = dtype.coerce(data)
         assert coerced_data.dtype == dtype.type
+
+
+def generate_test_cases_pandas_arrow_struct() -> (
+    List[Tuple[pd.DataFrame, pd.DataFrame]]
+):
+    """
+    Generate test parameter combinations for pandas arrow struct dtype.
+
+    Returns:
+        List of tuples:
+        - DataFrame with input struct data
+        - DataFrame with expected output
+    """
+    valid_data = pd.DataFrame(
+        {
+            "column": [
+                [
+                    {"field1": 1.0, "field2": "a"},
+                    {"field1": 2.0, "field2": "b"},
+                ],
+                [{"field1": 3.0, "field2": "c"}],
+            ]
+        }
+    )
+
+    invalid_data = pd.DataFrame(
+        {
+            "column": [
+                [{"field1": 0.0, "field2": "Test"}],
+                [{"field1": 2.0, "field2": "Test"}],
+            ]
+        }
+    )
+    invalid_data_expected = pd.DataFrame(
+        {
+            "index": [0, 1],
+            "failure_case": [
+                [{"field1": 0.0, "field2": "Test"}],
+                [{"field1": 2.0, "field2": "Test"}],
+            ],
+        }
+    )
+
+    mixed_data = pd.DataFrame(
+        {
+            "column": [
+                [{"field1": 4.0, "field2": "d"}],
+                [{"field1": None, "field2": "Test"}],
+            ]
+        }
+    )
+    mixed_data_expected = pd.DataFrame(
+        {"index": [1], "failure_case": [[{"field1": None, "field2": "Test"}]]}
+    )
+
+    return [
+        (valid_data, pd.DataFrame()),
+        (invalid_data, invalid_data_expected),
+        (mixed_data, mixed_data_expected),
+    ]
+
+
+@pytest.mark.parametrize(
+    ("data", "expected_output"), generate_test_cases_pandas_arrow_struct()
+)
+def test_pandas_arrow_struct_dtype(data, expected_output):
+    """Test pyarrow struct cases."""
+    if not (
+        pandas_engine.PYARROW_INSTALLED and pandas_engine.PANDAS_2_0_0_PLUS
+    ):
+        pytest.skip("Support of pandas 2.0.0+ with pyarrow only")
+
+    class SimpleSchema(DataFrameModel):
+        # pylint: disable=unexpected-keyword-arg,no-value-for-parameter
+        column: pd.ArrowDtype = Field(
+            coerce=True,
+            dtype_kwargs={
+                "pyarrow_dtype": pyarrow.list_(
+                    pyarrow.struct(
+                        {
+                            "field1": pyarrow.float32(),
+                            "field2": pyarrow.string(),
+                        }
+                    )
+                )
+            },
+        )
+
+        @check("column", element_wise=True)
+        @classmethod
+        def check_column(cls, element):
+            return all(e["field2"] != "Test" for e in element)
+
+    try:
+        SimpleSchema.validate(data)
+    except SchemaError as exc:
+        for (_, failure_case), (_, expected_value) in zip(
+            exc.failure_cases.iterrows(), expected_output.iterrows()
+        ):
+            assert (
+                failure_case["failure_case"] == expected_value["failure_case"]
+            )
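
To see why the expected failure cases in the test above carry indices `[0, 1]` for the invalid data and `[1]` for the mixed data, the element-wise predicate can be applied on its own. This is a rough sketch using plain Python lists in place of the pandera schema and the pyarrow-backed column; `failing_indices` is a hypothetical helper introduced here for illustration and is not part of pandera or of this commit.

```python
def check_column(element):
    # Same predicate as the element-wise check in the test: a row passes
    # only if no struct in its list has field2 equal to "Test".
    return all(e["field2"] != "Test" for e in element)


def failing_indices(rows):
    # Hypothetical helper: positions of rows that the predicate rejects.
    return [i for i, element in enumerate(rows) if not check_column(element)]


valid_rows = [
    [{"field1": 1.0, "field2": "a"}, {"field1": 2.0, "field2": "b"}],
    [{"field1": 3.0, "field2": "c"}],
]
invalid_rows = [
    [{"field1": 0.0, "field2": "Test"}],
    [{"field1": 2.0, "field2": "Test"}],
]
mixed_rows = [
    [{"field1": 4.0, "field2": "d"}],
    [{"field1": None, "field2": "Test"}],
]

print(failing_indices(valid_rows))
print(failing_indices(invalid_rows))
print(failing_indices(mixed_rows))
```

The valid data produces no failing rows, which matches the empty `pd.DataFrame()` used as its expected output in the parametrized cases.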
