BUG: IntervalIndex.astype("category") doesn't preserve exact interval dtype in categories #38316

Closed
@jorisvandenbossche

Somewhere in the conversion, before factorizing, we convert the interval array/index to an object-dtype numpy array of Interval objects, so the IntervalDtype has to be inferred again when creating the categories.

One consequence is that uint64 intervals get inferred as int64 afterwards:

In [29]: index = pd.IntervalIndex.from_breaks(np.arange(5, dtype="uint64"))

In [30]: index
Out[30]: 
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4]],
              closed='right',
              dtype='interval[uint64]')  # <---- unsigned ints

In [31]: pd.CategoricalIndex(index)
Out[31]: CategoricalIndex([(0, 1], (1, 2], (2, 3], (3, 4]], categories=[(0, 1], (1, 2], (2, 3], (3, 4]], ordered=False, dtype='category')

In [33]: pd.CategoricalIndex(index).dtype.categories
Out[33]: 
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4]],
              closed='right',
              dtype='interval[int64]')  # <---- no longer uint64
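
The report above can be turned into a small standalone script. This is only a sketch: the object-dtype round trip imitates the suspected mechanism rather than the actual pandas code path, and the variable names are illustrative. The last step assumes that an `Index` passed as `categories` to `pd.CategoricalDtype` is kept unchanged, which makes it a possible workaround rather than a confirmed fix. The printed dtypes depend on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

index = pd.IntervalIndex.from_breaks(np.arange(5, dtype="uint64"))

# Suspected mechanism: going through an object-dtype array of Interval
# objects drops the subtype, so it has to be inferred again afterwards.
as_objects = np.asarray(index, dtype=object)
reinferred = pd.IntervalIndex(as_objects)
print(index.dtype, "->", reinferred.dtype)

# Check whether the installed pandas keeps the exact dtype in the categories.
inferred = pd.CategoricalIndex(index)
print("dtype preserved:", inferred.categories.dtype == index.dtype)

# Possible workaround (assumption): supply the categories explicitly as an
# IntervalIndex so that no re-inference from Interval objects is needed.
explicit_dtype = pd.CategoricalDtype(categories=index)
explicit = pd.CategoricalIndex(index, dtype=explicit_dtype)
print(explicit.categories.dtype)
```

Passing the categories explicitly avoids the factorize step, so the categories are the original `IntervalIndex` rather than one rebuilt from `Interval` objects.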
