Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
It seems the interchange protocol's Column.null_count property (relevant spec) behaves incorrectly:
>>> import pyarrow as pa
>>> pa.__version__
'12.0.0.dev304' # from https://pypi.fury.io/arrow-nightlies/
>>> df = pa.table([pa.array([float("nan")], type=pa.float64())], ["foo"])
>>> dfi = df.__dataframe__()
>>> col = dfi.get_column(0)
>>> col.null_count
0 # should be 1
I assume this is because Arrow does not treat NaNs as nulls, which semantically makes sense, but in the interchange protocol it should; see vaexio/vaex#2120 for a related discussion.
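To make the two semantics concrete, here is a minimal pure-Python sketch of the difference. It is only a model: `count_missing` and its `nan_is_null` flag are hypothetical names chosen for illustration, and nothing here reflects how pyarrow or the interchange protocol is actually implemented.

```python
import math


def count_missing(values, nan_is_null):
    """Count missing entries in a list of floats and None values.

    nan_is_null=False models an Arrow-style count (only None is missing).
    nan_is_null=True models the count this issue expects (NaN is missing too).
    Both the function and the flag are hypothetical, for illustration only.
    """
    missing = 0
    for value in values:
        if value is None:
            missing += 1
        elif nan_is_null and isinstance(value, float) and math.isnan(value):
            missing += 1
    return missing


column = [float("nan")]
arrow_style = count_missing(column, nan_is_null=False)
expected = count_missing(column, nan_is_null=True)
print(arrow_style, expected)  # prints: 0 1
```

The same one-element column yields 0 under the first rule and 1 under the second, which matches the mismatch reported above.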
See pandas for expected behaviour
>>> import pandas as pd
>>> df = pd.DataFrame({"foo": [float("nan")]})
>>> dfi = df.__dataframe__()
>>> col = dfi.get_column(0)
>>> col.null_count
1
cc @AlenkaF (let me know if not to tag you on things! coincidentally I was working on data-apis/dataframe-interchange-tests#20 today when Ralf commented heh.)
Component(s)
Python