Open
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)
Description
Is your feature request related to a problem? Please describe.
Currently, strategies are limited by the hypothesis.extra.pandas convention for defining a dataframe: the strategies that generate data values operate at the element level. This makes it hard to create strategies for a whole column, or strategies that model the dependencies between columns.
For previous context on the problem with strategies, see #1605, #1220, #1275.
Describe the solution you'd like
We need a re-write! 🔥
As described in #1605, the requirements for a pandera pandas strategy rewrite are:
- Strategies that work for all pandera schemas (this is a really high bar, but I think possible), with reasonable escape hatches when pandera cannot automatically figure out how to generate a df.
- Generating entire columns instead of individual elements
- Incorporating cross-column dependencies
- A user-friendly way of overriding strategies (from pre-existing Checks) or custom strategies
- Columns with multiple checks should not chain strategies with filter; instead, each additional check should probably override the generated data with the new constraint.
- ... (others?)
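To make the "entire columns" and "cross-column dependencies" requirements concrete, here is a minimal sketch using only public Hypothesis APIs (`st.composite`, `st.lists`, `st.integers`). The `start_end_frames` strategy, its `start`/`end` columns, and the numeric bounds are hypothetical names and values chosen for illustration; this is not pandera's API or a proposed design.

```python
import pandas as pd
from hypothesis import given, settings, strategies as st


@st.composite
def start_end_frames(draw, max_rows=5):
    """Hypothetical column-level strategy in which 'end' depends on 'start'."""
    n_rows = draw(st.integers(min_value=0, max_value=max_rows))
    # Draw the whole 'start' column in one step instead of element by element.
    start = draw(
        st.lists(
            st.integers(min_value=0, max_value=100),
            min_size=n_rows,
            max_size=n_rows,
        )
    )
    # Draw non-negative offsets so that end >= start holds by construction,
    # with no filter needed to enforce the cross-column constraint.
    offsets = draw(
        st.lists(
            st.integers(min_value=0, max_value=50),
            min_size=n_rows,
            max_size=n_rows,
        )
    )
    end = [s + o for s, o in zip(start, offsets)]
    return pd.DataFrame({"start": start, "end": end}, dtype="int64")


@settings(max_examples=25, deadline=None)
@given(start_end_frames())
def check_dependency(df):
    # The dependency is satisfied for every generated dataframe.
    assert list(df.columns) == ["start", "end"]
    assert (df["end"] >= df["start"]).all()


check_dependency()
```

Because the dependent column is built from the first one, no generated example is ever rejected, which is the property the rewrite is after.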
More context on the current state
At a high level, this is how pandera currently translates a schema to a hypothesis strategy:
- For each column, index, obtain the following metadata:
  - Column name, datatype, and checks
  - If the column name is a regex expression, generate column names based on the regex
- Define a hypothesis column. This contains the datatypes, elements, and other properties of the column.
- Based on the pa.Column dtype, properties (e.g. unique), and first check in the list of checks, forward them to the hypothesis column. This creates an element strategy for a single value in that column.
- For any subsequent Check in the list, get their check stats (constraint values) and chain them to the element strategy with filter (this really sucks, i.e. slows down performance).
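The steps above can be sketched roughly as follows, using `column` and `data_frames` from `hypothesis.extra.pandas`. The column name `x` and the two constraints (a lower bound of 0 from a first check, an upper bound of 10 from a second one) are assumptions for illustration, and the code only mimics the shape of the translation; it is not pandera's actual implementation.

```python
from hypothesis import given, settings, strategies as st
from hypothesis.extra.pandas import column, data_frames

# Shape of the current approach: the first check sets up the element
# strategy, and each later check is appended as a filter on top of it.
# (Defined here for comparison only; it is not executed below.)
filtered = st.integers(min_value=0, max_value=1_000).filter(lambda x: x <= 10)

# Alternative shape: fold the constraint values of both checks into a
# single bounded element strategy, so no filter is involved.
merged = st.integers(min_value=0, max_value=10)

# The element strategy is forwarded to a hypothesis column, which
# data_frames then uses to build the dataframe value by value.
frames = data_frames(columns=[column("x", elements=merged, dtype="int64")])


@settings(max_examples=25, deadline=None)
@given(frames)
def check_bounds(df):
    # Every generated value respects both constraints.
    assert df["x"].between(0, 10).all()


check_bounds()
```

Note that even the merged version is still an element-level strategy: `column` takes a strategy for a single value, which is the limitation described in this issue.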