Recreating my original post from here: http://stackoverflow.com/questions/26659941/what-are-the-default-na-values-when-pandas-loads-data/31705571#31705571
The pandas IO documentation (http://pandas.pydata.org/pandas-docs/stable/io.html#na-values) states:
The default NaN recognized values are ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A','N/A', 'NA', '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan'].
However, this list is not complete.
If it were, the following two snippets would produce the same result.
Using the built-in default values:
```python
import pandas as pd
from StringIO import StringIO  # Python 2; on Python 3 use io.StringIO

sio = StringIO()
sio.write('"foo","bar"\n"1",""\n"NA","4"')
sio.seek(0)
pd.read_csv(sio, sep=",", quotechar='"')
```

```
   foo  bar
0    1  NaN
1  NaN    4
```
Passing the documented default values explicitly:
```python
sio = StringIO()
sio.write('"foo","bar"\n"1",""\n"NA","4"')
sio.seek(0)
pd.read_csv(sio, sep=",", quotechar='"',
            keep_default_na=False,
            na_values=['-1.#IND', '1.#QNAN', '1.#IND',
                       '-1.#QNAN', '#N/A', 'N/A', '#NA', 'NA',
                       'NULL', 'NaN', '-NaN', 'nan', '-nan'])
```

```
   foo bar
0    1
1  NaN   4
```
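The only cell that differs is the empty `bar` value in row 0, which suggests the undocumented default is the empty string `''`. Below is a minimal Python 3 sketch of the same comparison (using `io.StringIO`, since the `StringIO` module no longer exists there) under that assumption: it checks that adding `''` to the explicit list makes the result match the defaults. The helper `read_sample` is hypothetical and only exists to avoid repeating the `read_csv` call.

```python
from io import StringIO

import pandas as pd

# Same CSV as above: one empty quoted field and one "NA" field.
CSV_TEXT = '"foo","bar"\n"1",""\n"NA","4"'

# The list quoted from the pandas IO documentation.
DOCUMENTED_NA = ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A', 'N/A',
                 'NA', '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan']


def read_sample(**kwargs):
    """Hypothetical helper: parse CSV_TEXT with extra read_csv options."""
    return pd.read_csv(StringIO(CSV_TEXT), sep=",", quotechar='"', **kwargs)


# 1. Built-in defaults.
df_default = read_sample()

# 2. Only the documented list: the empty "bar" cell stays a string.
df_documented = read_sample(keep_default_na=False, na_values=DOCUMENTED_NA)

# 3. Documented list plus the empty string (assumed missing from the docs).
df_with_empty = read_sample(keep_default_na=False,
                            na_values=DOCUMENTED_NA + [''])

print(df_default.equals(df_documented))   # differs on the empty "bar" cell
print(df_default.equals(df_with_empty))   # same values and dtypes
```

`DataFrame.equals` is used because it compares dtypes as well as values and treats NaNs in the same positions as equal.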
Pandas version:

```python
>>> pd.__version__
'0.15.2'
```
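Because the default set can differ between pandas versions, one way to check an installed version empirically is to feed each candidate token through `read_csv` with its defaults and see whether it comes back as NaN. This is a sketch only: the candidate list (the documented tokens plus `''`) and the helper `is_default_na` are illustrative assumptions, not a pandas API, and the output will depend on the version being tested.

```python
from io import StringIO

import pandas as pd

# The list quoted from the pandas IO documentation.
DOCUMENTED_NA = ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A', 'N/A',
                 'NA', '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan']


def is_default_na(token):
    """Hypothetical helper: True if read_csv's defaults parse `token` as NaN."""
    # A second column keeps the row from being skipped as a blank line
    # when `token` is the empty string.
    df = pd.read_csv(StringIO("x,y\n" + token + ",1\n"))
    return bool(df["x"].isna().iloc[0])


candidates = DOCUMENTED_NA + [""]
undocumented = [t for t in candidates
                if is_default_na(t) and t not in DOCUMENTED_NA]

print("all documented tokens parsed as NaN:",
      all(is_default_na(t) for t in DOCUMENTED_NA))
print("parsed as NaN but not documented:", undocumented)
```

Extending `candidates` with other strings is a cheap way to probe for further undocumented defaults.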