Description
I'm trying to use uncertainties with Pandas, Pint, and Pint-Pandas. Pint-Pandas makes it easy to have quantified values on a per-column basis, and those columns interact little (and at least not badly) with other columns.
uncertainties relies on wrappers to do its thing, whereas Pint and Pint-Pandas now make thorough use of ExtensionArrays to interact with Pandas. ExtensionArrays define a value for their na_value, which for most numeric types means np.nan.
In my past dealings with uncertainties, the NaN for that has been nan+/-0, which has been fine, except that it now makes for difficult promotion rules. If I have an extension array of quantities (tons of CO2, millions of USD, whatever) with plain float64 magnitudes, the correct na_value for that is np.nan. But if I fill the array with uncertainties as magnitudes, the logical na_value would be nan+/-0. There is, however, no concept of the na_value varying depending on whether uncertainties are in the mix.
One solution is to just bite the bullet and say "if you use uncertainties anywhere, then every dataframe needs to honor them, meaning that the na_value for ANYTHING is nan+/-0 (and all magnitudes must promote to UFloat)." What I'd like to do is to manage that column-by-column.
Is there a world in which np.nan is a fully adequate value for uncertainties, with whatever promotions, substitutions, etc., happening within the wrappers? Or do I need to majorly rethink my approach of layering these various abstractions (uncertainties, quantities, DataFrames) together?