[Python] Interchange pa.Table's Column.null_count doesn't count NaNs #34774

@honno

Describe the bug, including details regarding any error messages, version, and platform.

The interchange protocol's Column.null_count (relevant spec) appears to return the wrong value for a pa.Table column that contains NaNs:

>>> import pyarrow as pa
>>> pa.__version__
'12.0.0.dev304'  # from https://pypi.fury.io/arrow-nightlies/
>>> df = pa.table([pa.array([float("nan")], type=pa.float64())], ["foo"])
>>> dfi = df.__dataframe__()
>>> col = dfi.get_column(0)
>>> col.null_count
0  # should be 1

I assume this is because Arrow does not treat NaNs as nulls, which makes sense semantically, but in the interchange protocol they should count as nulls. See vaexio/vaex#2120 for a related discussion.

See pandas for the expected behaviour:

>>> import pandas as pd
>>> df = pd.DataFrame({"foo": [float("nan")]})
>>> dfi = df.__dataframe__()
>>> col = dfi.get_column(0)
>>> col.null_count
1
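
To make the expected semantics concrete, here is a minimal pure-Python sketch of the counting rule this report argues for: an entry is missing if it is None or a float NaN. The helper name `count_missing` is hypothetical and is not part of pyarrow, pandas, or the interchange protocol. It only illustrates the rule and is not a proposed fix.

```python
import math
from typing import Any, Iterable


def count_missing(values: Iterable[Any]) -> int:
    """Count entries that are None or a float NaN (hypothetical helper)."""
    return sum(
        1
        for v in values
        # NaN is checked only for floats, so strings and ints are never counted.
        if v is None or (isinstance(v, float) and math.isnan(v))
    )


# The single-NaN column from the report, then a mixed column.
print(count_missing([float("nan")]))
print(count_missing([1.0, None, float("nan")]))
```

On the Arrow side, `pyarrow.compute.is_null` accepts a `nan_is_null=True` option, which might be one way to get the same count from an Arrow array. Whether the interchange implementation should use it is an open question for this issue.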

cc @AlenkaF (let me know if not to tag you on things! coincidentally I was working on data-apis/dataframe-interchange-tests#20 today when Ralf commented heh.)

Component(s)

Python
