
Commit 3670cf8

halicki and cosmicBboy authored

Add Polars pydantic integration with format support and native JSON schema generation (#1979)
* Add Polars pydantic integration with format support and native JSON schema generation
  - Add pydantic validation for Polars DataFrames and LazyFrames
  - Implement DataFrame type conversion from various formats (dict, CSV, JSON, Parquet, Feather)
  - Replace pandas dependency with native Polars JSON schema generation
  - Support both Pydantic v1 and v2 with appropriate validators
  - Add comprehensive test suite for the integration

* Blacken the code

* fix pylint issues

* Add comprehensive test coverage for typing/polars.py
  - Add new test file test_polars_typing.py with complete coverage for the polars typing module
  - Test DataFrame.from_format and to_format with various data formats
  - Cover both success and error paths
  - Add tests for Pydantic integration
  - Add pragmas to conditionally exclude import-time and version-specific code from coverage

  Achieves 100% test coverage for the module.

* Remove pragma no cover directives from typing/polars.py

  Tests now provide sufficient coverage without the need for pragma directives.

* Add test coverage for typing/polars.py with pragmas
  - Created comprehensive test suite for typing/polars.py
  - Added tests for DataFrame, LazyFrame, and Series classes
  - Added tests for format conversion methods
  - Added tests for Pydantic integration (v1 and v2)
  - Added pragma no cover to hard-to-test code paths
  - Achieved 100% test coverage

* Blacken the code again

* Cleanup

* test(polars): Add tests for optional columns in Polars DataFrameModel

  This commit adds test cases that demonstrate how to use optional columns with Polars DataFrameModels when integrating with Pydantic. The tests show:
  - Using Optional[Series[type]] annotation to make a column optional
  - Validating DataFrames with and without the optional column
  - Ensuring type validation still works on optional columns when present
  - Verifying that required columns still must be present

  These tests help document the supported patterns for optional columns in Pandera's Polars integration.

* Blacken the code

* Import specific pandera impl

---------

Signed-off-by: Arkadiusz Halicki <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>
Co-authored-by: cosmicBboy <[email protected]>
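Per the commit message, the integration converts input data in several formats (dict, CSV, JSON, Parquet, Feather) into DataFrames. The core idea can be sketched as a dispatch table keyed by format name. This is a minimal stdlib-only illustration, not pandera's actual code: `from_format` and its readers here are hypothetical stand-ins for the polars readers (e.g. `pl.read_csv`, `pl.read_json`) that the real integration dispatches to.

```python
import csv
import io
import json


def from_format(data, data_format):
    """Convert raw input into a list-of-dicts 'frame' based on its format.

    Hypothetical sketch: each reader is a stdlib stand-in for the
    corresponding polars reader the real integration would call.
    """
    readers = {
        # column-oriented dict -> row-oriented records
        "dict": lambda d: [dict(zip(d, row)) for row in zip(*d.values())],
        # CSV text -> records (all values remain strings here)
        "csv": lambda s: list(csv.DictReader(io.StringIO(s))),
        # JSON text -> parsed records
        "json": json.loads,
    }
    try:
        return readers[data_format](data)
    except KeyError as exc:
        raise ValueError(f"unsupported format: {data_format}") from exc
```

In the real integration, unsupported or binary formats (Parquet, Feather) would similarly dispatch to the matching polars reader rather than raising.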
1 parent 88bb609 commit 3670cf8

File tree

4 files changed: +1633 −21 lines changed

pandera/api/polars/model.py

Lines changed: 60 additions & 20 deletions
```diff
@@ -169,35 +169,75 @@ def to_json_schema(cls):
         This function is currently does not fully specify a pandera schema,
         and is primarily used internally to render OpenAPI docs via the
         FastAPI integration.
-
-        :raises ImportError: if ``pandas`` is not installed.
         """
-        try:
-            import pandas as pd
-        except ImportError as exc:
-            raise ImportError(
-                "pandas is required to serialize polars schema to json-schema"
-            ) from exc
-
         schema = cls.to_schema()
-        empty = pl.DataFrame(
-            schema={k: v.type for k, v in schema.dtypes.items()}
-        ).to_pandas()
-        table_schema = pd.io.json.build_table_schema(empty)
 
-        def _field_json_schema(field):
-            return {
+        # Define a mapping from Polars data types to JSON schema types
+        # This is more robust than string parsing
+        POLARS_TO_JSON_TYPE_MAP = {
+            # Integer types
+            pl.Int8: "integer",
+            pl.Int16: "integer",
+            pl.Int32: "integer",
+            pl.Int64: "integer",
+            pl.UInt8: "integer",
+            pl.UInt16: "integer",
+            pl.UInt32: "integer",
+            pl.UInt64: "integer",
+            # Float types
+            pl.Float32: "number",
+            pl.Float64: "number",
+            # Boolean type
+            pl.Boolean: "boolean",
+            # String types
+            pl.Utf8: "string",
+            pl.String: "string",
+            # Date/Time types
+            pl.Date: "datetime",
+            pl.Datetime: "datetime",
+            pl.Time: "datetime",
+            pl.Duration: "datetime",
+        }
+
+        def map_dtype_to_json_type(dtype):
+            """
+            Map a Polars data type to a JSON schema type.
+
+            Args:
+                dtype: Polars data type
+
+            Returns:
+                str: JSON schema type string
+            """
+            # First try the direct mapping
+            if dtype.__class__ in POLARS_TO_JSON_TYPE_MAP:
+                return POLARS_TO_JSON_TYPE_MAP[dtype.__class__]
+
+            # Fallback to string representation check for edge cases
+            dtype_str = str(dtype).lower()
+            if "float" in dtype_str:
+                return "number"
+            elif "int" in dtype_str:
+                return "integer"
+            elif "bool" in dtype_str:
+                return "boolean"
+            elif any(t in dtype_str for t in ["date", "time", "datetime"]):
+                return "datetime"
+            else:
+                return "string"
+
+        properties = {}
+        for col_name, col_schema in schema.dtypes.items():
+            json_type = map_dtype_to_json_type(col_schema.type)
+            properties[col_name] = {
                 "type": "array",
-                "items": {"type": field["type"]},
+                "items": {"type": json_type},
             }
 
         return {
             "title": schema.name or "pandera.DataFrameSchema",
             "type": "object",
-            "properties": {
-                field["name"]: _field_json_schema(field)
-                for field in table_schema["fields"]
-            },
+            "properties": properties,
         }
 
     @classmethod
```
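The string-matching fallback and the resulting JSON-schema shape can be illustrated in isolation. This is a stdlib-only sketch so it runs without polars installed; `json_type_from_dtype_name` and `build_schema` are hypothetical stand-ins that mirror the fallback branch of `map_dtype_to_json_type` and the dict returned by the new `to_json_schema`.

```python
def json_type_from_dtype_name(name: str) -> str:
    """Mirror the fallback branch: classify a dtype by its string name."""
    name = name.lower()
    if "float" in name:
        return "number"
    if "int" in name:
        return "integer"
    if "bool" in name:
        return "boolean"
    if any(t in name for t in ("date", "time", "datetime")):
        return "datetime"
    return "string"  # default for unrecognized dtypes


def build_schema(title: str, dtypes: dict) -> dict:
    """Assemble the same JSON-schema shape the diff's to_json_schema emits:
    each column becomes an array property whose items carry the JSON type."""
    return {
        "title": title,
        "type": "object",
        "properties": {
            col: {"type": "array", "items": {"type": json_type_from_dtype_name(dt)}}
            for col, dt in dtypes.items()
        },
    }
```

Note that order matters in the fallback chain: checking `"float"` before `"int"` and class-based lookup before string matching (as the diff does for `pl.Duration`, whose name matches none of the substrings) keeps ambiguous names from being misclassified.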
