Support for multiple argument dispatch functions based on schema validation #1545
billyvinning started this conversation in Ideas
Hi,
Firstly, I think that `pandera` is a great tool for writing production-ready `pandas` projects. Where I think there is room for extension in `pandera` is its integration with data pipelines. One nice feature to have when defining data pipelines, I think, would be multiple dispatch functions.

There are a few ways of achieving dispatch on traditional Python types, these being `functools.singledispatch` (which dispatches on the type of the first argument only), `multipledispatch` and `multimethod`. I am proposing analogous support in `pandera`, where the function to dispatch to is selected by validating the data passed in. The ultimate use case is for the user to be able to write generic data pipelines that accept multiple schemas and validate the inputs and outputs of each step. As a minimal example, given that we have some schemas defined, `A`, `B` and `C`, we want to be able to define a single interface that a) transforms both `A -> C` and `B -> C`, and b) validates the inputs and outputs.

Has this idea ever been considered for `pandera`? If not, is there any interest in this kind of thing?

I was able to achieve a dirty minimal prototype by extending `multimethod.multimethod`; please see below for the code. The basic premise of this snippet is that my implementation of `multischema.register` converts occurrences of `pa.typing.DataFrame[...]` in the type hints of registered functions into `multimethod.parametric` types. My usage of `multimethod.parametric` defines a custom type tied to each schema whose `isinstance` check is based on whether `DataFrameModel.validate(df)` raises an exception or not. I recognise that we would have to consider what should happen when some given data validates against more than one of the schemas registered with the dispatch function.

Environment:
Example:
Expected output:
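To illustrate the premise above, here is a standard-library-only sketch; it is not the `multischema` prototype referred to in the post, and it does not use `multimethod`. Plain dicts stand in for DataFrames, and `SchemaError`, `schema_type`, `SchemaDispatch` and `require_columns` are hypothetical names. In `pandera` the validator would be something like `DataFrameModel.validate` and the failure exception `pandera.errors.SchemaError`. As in the post, each schema becomes a type whose `isinstance` check passes only if validation does not raise, and the dispatcher reads those types from the type hints of the registered functions, checking every positional argument and the return value.

```python
import inspect
from typing import Any, Callable, List, Tuple, get_type_hints


class SchemaError(Exception):
    """Hypothetical stand-in for a schema-validation failure (cf. pandera.errors.SchemaError)."""


class _ValidatingMeta(type):
    """Metaclass whose isinstance() check runs a validator instead of a type check."""

    def __instancecheck__(cls, obj: Any) -> bool:
        # True only if the validator does not raise SchemaError.
        try:
            cls.validate(obj)
        except SchemaError:
            return False
        return True


def schema_type(name: str, validate: Callable[[Any], Any]) -> type:
    """Build a type T such that isinstance(x, T) is True iff validate(x) does not raise."""
    return _ValidatingMeta(name, (), {"validate": staticmethod(validate)})


class SchemaDispatch:
    """Hypothetical dispatcher that selects an implementation by validating its arguments."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._impls: List[Tuple[Tuple[type, ...], type, Callable[..., Any]]] = []

    def register(self, func: Callable[..., Any]) -> Callable[..., Any]:
        # Read the argument and return schemas from the function's type hints.
        hints = get_type_hints(func)
        params = inspect.signature(func).parameters
        in_types = tuple(hints[p] for p in params)
        self._impls.append((in_types, hints["return"], func))
        return func

    def __call__(self, *args: Any) -> Any:
        # Every registered signature whose schemas all validate the arguments.
        matches = [
            (out_type, func)
            for in_types, out_type, func in self._impls
            if len(in_types) == len(args)
            and all(isinstance(a, t) for a, t in zip(args, in_types))
        ]
        if not matches:
            raise TypeError(f"{self.name}: no registered signature validates these arguments")
        if len(matches) > 1:
            # The open question from the post: data valid under several schemas.
            # This sketch refuses to guess rather than picking one.
            raise TypeError(f"{self.name}: arguments validate against {len(matches)} signatures")
        out_type, func = matches[0]
        result = func(*args)
        # Validate the output against the declared return schema.
        if not isinstance(result, out_type):
            raise SchemaError(f"{self.name}: output failed {out_type.__name__} validation")
        return result


def require_columns(*cols: str) -> Callable[[Any], Any]:
    """Toy validator: data must be a dict whose keys are exactly `cols`."""

    def validate(data: Any) -> Any:
        if not isinstance(data, dict) or set(data) != set(cols):
            raise SchemaError(f"expected columns {cols}")
        return data

    return validate


A = schema_type("A", require_columns("x"))
B = schema_type("B", require_columns("y"))
C = schema_type("C", require_columns("z"))

to_c = SchemaDispatch("to_c")


@to_c.register
def from_a(data: A) -> C:
    return {"z": [v * 2 for v in data["x"]]}


@to_c.register
def from_b(data: B) -> C:
    return {"z": [v + 1 for v in data["y"]]}
```

With these definitions, `to_c({"x": [1, 2]})` dispatches to `from_a` and returns `{"z": [2, 4]}`, while `to_c({"y": [1, 2]})` dispatches to `from_b` and returns `{"z": [2, 3]}`; input that matches no schema raises `TypeError`. Raising on ambiguous matches is only one possible answer to the question raised in the post; alternatives such as "first registered wins" or ranking schemas by specificity would need to be decided on.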
Let me know what you think!