
BUG: Collection of inconsistencies in .astype conversions #37626

Open
@mlondschien


I have a use case where (automatic) casting between the following pandas dtypes is necessary:

bool, boolean, int64, Int64, float64, object and string.

Note that boolean, Int64 and string are the new pandas 1.0 nullable dtypes.

The default approach for this would be series.astype(target_dtype), where target_dtype is one of the above dtypes given as a string. This works (as long as missing values cause no problems), except for the inconsistencies below:

In [1]: import pandas as pd
   ...: import numpy as np

pd.NA to float:

Summary: Calling float(pd.NA) raises a TypeError, as does float(None). However, np.array([None, "1"]).astype("float") works (and is what gets called here), while the same call with pd.NA fails.

In [2]: pd.Series(["1", "2", pd.NA], dtype="object").astype("float")
TypeError: float() argument must be a string or a number, not 'NAType'

In [3]: pd.Series(["1", "2", pd.NA], dtype="string").astype("float")
TypeError: float() argument must be a string or a number, not 'NAType'

In [4]: pd.Series(["1", "2", pd.NA], dtype="string").astype("category").astype("float")  # magic workaround
Out[4]: 
0    1.0
1    2.0
2    NaN
dtype: float64

Edit: Fixed with #37974.
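
On versions without that fix, one possible workaround is to replace every missing marker with np.nan before casting. This is a minimal sketch, assuming the non-missing values are numeric strings; the helper name na_safe_to_float is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def na_safe_to_float(series: pd.Series) -> pd.Series:
    """Cast a string/object Series to float64, mapping pd.NA and None to NaN.

    Hypothetical helper for illustration only.
    """
    # Work on an object copy so the replacement value np.nan is always allowed.
    as_object = series.astype("object")
    # Keep non-missing entries, replace pd.NA / None / NaN with np.nan.
    cleaned = as_object.where(as_object.notna(), np.nan)
    return cleaned.astype("float64")


result = na_safe_to_float(pd.Series(["1", "2", pd.NA], dtype="object"))
print(result)
```

Unlike the category detour in In [4], this makes the handling of missing values explicit.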

object / string to Int64 (nullable)

Summary: object columns cannot be cast to Int64 directly. Casting via an intermediate dtype behaves as follows:

  • object -> string -> Int64 works.
  • object -> float -> Int64 works if the data does not contain pd.NA (see above).
  • object -> int64 -> Int64 works if the data does not contain missing values.

In [5]: pd.Series(["1", "2", "3"]).astype("Int64")
TypeError: object cannot be converted to an IntegerDtype

In [5]: pd.Series(["1", "2", "3"], dtype="string").astype("Int64")
Out[5]: 
0    1
1    2
2    3
dtype: Int64

In [6]: pd.Series(["1", "2", "3"]).astype("int").astype("Int64")
Out[6]:
0    1
1    2
2    3
dtype: Int64

In [7]: pd.Series(["1", "2", None]).astype("int").astype("Int64")  # int columns cannot contain missing values
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

In [8]: pd.Series(["1", "2", None]).astype("float").astype("Int64")  # magic workaround
Out[8]:
0       1
1       2
2    <NA>
dtype: Int64

In [9]: pd.Series(["1", "2", pd.NA], dtype="object").astype("float").astype("Int64")  # see pd.NA -> float
TypeError: float() argument must be a string or a number, not 'NAType'

In [10]: pd.Series(["1", "2", pd.NA], dtype="object").astype("string").astype("Int64")
Out[10]: 
0       1
1       2
2    <NA>
dtype: Int64

Related: #25472 (comment)
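
A way to avoid choosing between these detours might be to parse the strings with pd.to_numeric first and only then cast to the nullable integer dtype. The sketch below assumes all non-missing values are integer-valued strings; to_nullable_int is a hypothetical name used only for illustration.

```python
import pandas as pd


def to_nullable_int(series: pd.Series) -> pd.Series:
    """Parse an object Series of numeric strings and return an Int64 Series.

    Hypothetical helper for illustration only.
    """
    # pd.to_numeric parses the strings; None becomes NaN in a float64 result.
    numeric = pd.to_numeric(series, errors="raise")
    # float64 -> Int64 turns NaN into <NA> and keeps integral values.
    return numeric.astype("Int64")


result = to_nullable_int(pd.Series(["1", "2", None], dtype="object"))
print(result)
```

Non-integral inputs such as "1.5" are not handled here; the final cast would be expected to reject them rather than truncate.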

string / object to bool or boolean

Summary: Casting string or object columns to bool or boolean behaves inconsistently. I am not sure what the expected behaviour for string / object to bool / boolean should be, but it would be nice if it were consistent.

  • string / object -> bool works if there are no missing values, but yields only True.
  • string / object -> boolean raises.

In [11]: pd.Series(["True", "False", "bogus"], dtype="string").astype("bool")  # everything is True
Out[11]:
0    True
1    True
2    True
dtype: bool

In [12]: pd.Series(["True", "False", "bogus"], dtype="object").astype("bool")
Out[12]:
0    True
1    True
2    True
dtype: bool

In [13]: pd.Series(["True", "False", "bogus"], dtype="string").astype("boolean")
TypeError: data type not understood

In [14]: pd.Series(["True", "False", "bogus"], dtype="object").astype("boolean")
TypeError: Need to pass bool-like values
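
Until the expected behaviour is settled, an explicit parser avoids relying on astype here. The sketch below is one possible convention, not pandas behaviour: it assumes only the tokens "true" and "false" (case-insensitive) are valid, keeps missing values as pd.NA, and raises on anything else. The name parse_bool_strings is hypothetical.

```python
import pandas as pd


def parse_bool_strings(series: pd.Series) -> pd.Series:
    """Convert a Series of "True"/"False" strings to the nullable boolean dtype.

    Hypothetical helper for illustration only; the accepted tokens are an assumption.
    """

    def parse_one(value):
        # Preserve missing values (None, NaN, pd.NA) as pd.NA.
        if pd.isna(value):
            return pd.NA
        token = str(value).strip().lower()
        if token == "true":
            return True
        if token == "false":
            return False
        # Refuse to guess for anything else, e.g. "bogus".
        raise ValueError(f"cannot interpret {value!r} as a boolean")

    parsed = pd.array([parse_one(v) for v in series], dtype="boolean")
    return pd.Series(parsed, index=series.index)


result = parse_bool_strings(pd.Series(["True", "False", None], dtype="object"))
print(result)
```

Raising on unknown tokens is a deliberate choice in this sketch: it prevents "bogus" from silently becoming True, as in In [11] and In [12].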

int (non-nullable) to boolean

Summary: Casting from (non-nullable) int64 to (nullable) boolean raises.

  • int64 -> Int64 -> boolean works.
  • int64 -> bool -> boolean works as long as there are no missing values.

In [15]: pd.Series([-1, 0, 1], dtype="int").astype("boolean")
TypeError: Need to pass bool-like values

In [16]: pd.Series([-1, 0, 1], dtype="Int64").astype("boolean")
Out[16]:
0     True
1    False
2     True
dtype: boolean

In [17]: pd.Series([-1, 0, 1], dtype="int").astype("bool").astype("boolean")
Out[17]:
0     True
1    False
2     True
dtype: boolean

Related: #37614
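
If the intended meaning is "non-zero is True", the comparison can be written out explicitly, which sidesteps the cast for both int64 and Int64 input. This is a sketch under that assumption; int_to_nullable_boolean is a hypothetical name.

```python
import pandas as pd


def int_to_nullable_boolean(series: pd.Series) -> pd.Series:
    """Map integers to the nullable boolean dtype using "non-zero is True".

    Hypothetical helper for illustration only.
    """
    # ne(0) yields bool for int64 input and boolean (with <NA>) for Int64 input;
    # the final astype makes the result dtype uniform.
    return series.ne(0).astype("boolean")


plain = int_to_nullable_boolean(pd.Series([-1, 0, 1], dtype="int64"))
nullable = int_to_nullable_boolean(pd.Series([1, pd.NA, 0], dtype="Int64"))
print(plain)
print(nullable)
```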

While separate issues exist for the first and last report, I thought it would be useful to collect these inconsistencies in one place, and I could not find an existing collection.

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 15f843ab102d7a0cd7f1c7870dfec72d0e28d252
python           : 3.8.6.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-52-generic
Version          : #57~18.04.1-Ubuntu SMP Thu Oct 15 14:04:49 UTC 2020
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.2.0.dev0+1059.g15f843ab1.dirty
numpy            : 1.18.5
pytz             : 2020.4
dateutil         : 2.8.1
pip              : 20.2.4
setuptools       : 49.6.0.post20201009
Cython           : 0.29.21
pytest           : 5.4.3
hypothesis       : 5.41.1
sphinx           : 3.1.1
blosc            : None
feather          : None
xlsxwriter       : 1.3.7
lxml.etree       : 4.6.1
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.19.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : 1.3.2
fsspec           : 0.8.4
fastparquet      : 0.4.1
gcsfs            : 0.7.1
matplotlib       : 3.2.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.5
pandas_gbq       : None
pyarrow          : 2.0.0
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.3
sqlalchemy       : 1.3.20
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : 0.16.1
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : 0.51.2
