Regression in CategoricalIndex in 0.20rc1 #16115

@bashtage

Code Sample, a copy-pastable example if possible

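import pandas as pd
cats = pd.Categorical([pd.Timestamp('12-31-1999'), pd.Timestamp('12-31-2000')])
dummies = pd.get_dummies(cats)
# Selecting every column by label should return the frame unchanged, but raises KeyError on 0.20.0rc1
dummies[[c for c in dummies.columns]]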

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-4bc701ecdf50> in <module>()
----> 1 dummies[dummies.columns]

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2052         if isinstance(key, (Series, np.ndarray, Index, list)):
   2053             # either boolean or fancy integer index
-> 2054             return self._getitem_array(key)
   2055         elif isinstance(key, DataFrame):
   2056             return self._getitem_frame(key)

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\frame.py in _getitem_array(self, key)
   2096             return self.take(indexer, axis=0, convert=False)
   2097         else:
-> 2098             indexer = self.loc._convert_to_indexer(key, axis=1)
   2099             return self.take(indexer, axis=1, convert=True)
   2100

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
   1211                 # if it cannot handle
   1212                 indexer, objarr = labels._convert_listlike_indexer(
-> 1213                     obj, kind=self.name)
   1214                 if indexer is not None:
   1215                     return indexer

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\indexes\base.py in _convert_listlike_indexer(self, keyarr, kind)
   1384             keyarr = self._convert_arr_indexer(keyarr)
   1385
-> 1386         indexer = self._convert_list_indexer(keyarr, kind=kind)
   1387         return indexer, keyarr
   1388

C:\anaconda\envs\py35-pandas-20\lib\site-packages\pandas\core\indexes\category.py in _convert_list_indexer(self, keyarr, kind)
    508         if (indexer == -1).any():
    509             raise KeyError(
--> 510                 "a list-indexer must only "
    511                 "include values that are "
    512                 "in the categories")

KeyError: 'a list-indexer must only include values that are in the categories'

Problem description

There is no obvious change in get_dummies, so the problem must be deeper, in the indexing of a CategoricalIndex.
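
One way to probe that hypothesis is to take get_dummies out of the picture. The sketch below is an illustration, not part of the original report: it builds a small frame whose columns are a CategoricalIndex of Timestamps directly and attempts the same label-based selection. The names `stamps`, `frame` and `outcome` are made up for the example. Whether the selection succeeds depends on the installed pandas version, so no expected output is shown.

```python
import pandas as pd

# Hypothetical minimal frame: columns are a CategoricalIndex of Timestamps,
# constructed directly rather than through get_dummies.
stamps = [pd.Timestamp('1999-12-31'), pd.Timestamp('2000-12-31')]
frame = pd.DataFrame([[1, 0], [0, 1]], columns=pd.CategoricalIndex(stamps))

# Attempt the same "select every column by label" operation and record
# whether it raised the KeyError seen in the traceback.
try:
    frame[list(frame.columns)]
    outcome = 'ok'
except KeyError as exc:
    outcome = 'KeyError: {}'.format(exc)

print(pd.__version__, type(frame.columns).__name__, outcome)
```

If this also fails on an affected version, that would point at CategoricalIndex indexing rather than at get_dummies.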

Expected Output

The original dummy frame -- this is a trivial selection.
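
The expectation can be written down as a small check, which might serve as a starting point for a regression test. This is only a sketch: `selection_round_trips` is a hypothetical helper name, not a pandas function, and the plain-column frame is included purely as a baseline for comparison.

```python
import pandas as pd


def selection_round_trips(frame):
    """Return True if selecting every column by label reproduces the frame.

    Hypothetical helper for illustration; a KeyError counts as a failure.
    """
    try:
        selected = frame[list(frame.columns)]
    except KeyError:
        return False
    return selected.equals(frame)


# Frame from the report: columns derived from Timestamp categories.
cats = pd.Categorical([pd.Timestamp('1999-12-31'), pd.Timestamp('2000-12-31')])
dummies = pd.get_dummies(cats)

# Baseline with ordinary string column labels.
plain = pd.DataFrame({'a': [1, 0], 'b': [0, 1]})

print('plain columns:', selection_round_trips(plain))
print('categorical columns:', selection_round_trips(dummies))
```

On a version without the regression, both lines would be expected to report True.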

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.0rc1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.3
IPython: 6.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Labels: Bug; Categorical (Categorical Data Type); Regression (Functionality that used to work in a prior pandas version)
