Trying to access non-existent df column throws KeyError for wrong key #28799

Closed
@jwhendy

Code Sample, a copy-pastable example if possible

I ran into this in a more complicated groupby().apply(lambda ...) call, and have reproduced a simpler version below of what looks like a bug (or at least undesirable behavior).

import pandas as pd

# simple dataframe
test = pd.DataFrame({'var': ['a', 'a', 'b', 'b'], 'val': range(4)})

# simulated mistake of asking for a 'vau' column, not 'val'
test.groupby('var').apply(lambda rows: pd.DataFrame({'var': [rows.iloc[-1]['var']],
                                                     'val': [rows.iloc[-1]['vau']]}))

Doing this yields:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4735             try:
-> 4736                 return libindex.get_value_box(s, key)
   4737             except IndexError:

pandas/_libs/index.pyx in pandas._libs.index.get_value_box()

pandas/_libs/index.pyx in pandas._libs.index.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.validate_indexer()

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

----------  8<  [snip]  >8 ----------

KeyError: 'var'

Problem description

When troubleshooting this, the reported KeyError: 'var' is misleading: it names a key that was looked up successfully, not the one that failed ('vau'). In my actual code I was doing this for more variables, so while this case is trivial to troubleshoot, real-world cases could be much more confusing for the user to figure out.
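
One way to narrow this down while debugging is to run the same function on a single group, outside of groupby().apply(), so the lookup error is not re-wrapped. This is only a sketch: the helper name last_row and the hand-picked group are assumptions for illustration, and the exact message may differ between pandas versions.

```python
import pandas as pd

# same dataframe as in the report
test = pd.DataFrame({'var': ['a', 'a', 'b', 'b'], 'val': range(4)})

# hypothetical named version of the lambda from the report, typo included
def last_row(rows):
    return pd.DataFrame({'var': [rows.iloc[-1]['var']],
                         'val': [rows.iloc[-1]['vau']]})

# select one group by hand instead of going through groupby().apply()
first_group = test[test['var'] == 'a']

try:
    last_row(first_group)
except KeyError as err:
    # the plain Series lookup reports the key that was actually missing
    missing_key = err.args[0]
    print(f"lookup failed for key: {missing_key!r}")
```

Called this way, the KeyError comes straight from the failed lookup on the row, which is the behavior I would have expected from the grouped call as well.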

Expected Output

A KeyError naming the incorrect key ('vau' in the example above), not a key that exists.
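
Until the message is fixed, a possible workaround is to check the column names up front and raise an error that names the missing ones. A minimal sketch follows; require_columns is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def require_columns(rows, columns):
    """Raise a KeyError naming every requested column that `rows` lacks."""
    missing = [col for col in columns if col not in rows.columns]
    if missing:
        raise KeyError(f"missing column(s): {missing}")
    return rows

test = pd.DataFrame({'var': ['a', 'a', 'b', 'b'], 'val': range(4)})

try:
    # 'vau' is the simulated typo from the report
    require_columns(test, ['var', 'vau'])
except KeyError as err:
    guard_message = err.args[0]
    print(guard_message)
```

Calling the helper on the frame before grouping (or at the top of the applied function) reports the mistyped name directly, whatever the grouped call does with the exception afterwards.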

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.1-arch1-1-ARCH
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.1
pytz : 2019.2
dateutil : 2.8.0
pip : 19.0.3
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 2.6.3
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
