DataTable init does not replace NaNs with pd.NA for float data type #128

@gsheni

  • We want DataTable to use a single representation of NaN (pd.NA). This is a forward-looking feature of pandas.

The goal of pd.NA is to provide a “missing” indicator that can be used consistently across data types (instead of np.nan, None or pd.NaT depending on the data type).

  • In our init of DataTable, we have a replace_none parameter, which defaults to True. However, it does not work for some of the data types passed into the DataTable, such as float columns:
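import numpy as np
import pandas as pd
import woodwork as ww

# Each column holds a missing value in a different representation or dtype
d = {'col1': [1, 2, np.nan], 'col2': [3, 4, None],
     'col3': pd.Series([1, 2, np.nan], dtype='Int64'),
     'col4': pd.Series([1, 2, None], dtype='string')}
df = pd.DataFrame(data=d)

df.dtypes  # col1 and col2 are inferred as float64

# replace_none=True should convert every NaN-like value to pd.NA
dt = ww.DataTable(df, name="retail", replace_none=True, copy_dataframe=True)
dt.dataframe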

[Screenshot of the dt.dataframe output, taken 2020-09-22]

  • The expected behavior is that all NaN-like values in the DataFrame would be pd.NA.
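
For reference, here is a minimal sketch of the expected result using plain pandas only. It is not how DataTable handles replace_none internally; `to_nullable` is a hypothetical helper name, and it assumes that `DataFrame.convert_dtypes()` (pandas 1.0+) is an acceptable way to move float columns onto nullable dtypes. The string column uses string values here to keep the example independent of pandas version.

```python
import numpy as np
import pandas as pd


def to_nullable(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy whose columns use pandas'
    nullable extension dtypes, so missing values are stored as pd.NA."""
    return df.convert_dtypes()


df = pd.DataFrame({
    "col1": [1, 2, np.nan],                             # float64 with NaN
    "col2": [3, 4, None],                               # None becomes NaN, float64
    "col3": pd.Series([1, 2, np.nan], dtype="Int64"),   # already nullable
    "col4": pd.Series(["a", "b", None], dtype="string"),
})

converted = to_nullable(df)

# The integer-valued float columns become Int64, and the missing entry
# in the last row of each column is now pd.NA instead of np.nan.
print(converted.dtypes)
print(converted)
```

Note that `convert_dtypes()` only moves float columns to Int64 when every non-missing value is integer-like, so it changes dtypes as a side effect; whether that is acceptable for DataTable is an open design question.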

Labels: bug (Something isn't working)