
Pandas Segfault when reading Parquet data #2224

Closed
@mrocklin

I'm playing with the Criteo data and getting an odd crash from within Pandas (reported as SIGBUS, a bus error, rather than a plain segfault), even when running on a single thread:

>>> import dask.dataframe as dd
>>> import dask
>>> dask.set_options(get=dask.async.get_sync)
<dask.context.set_options object at 0x7ffff6474be0>
>>> df = dd.read_parquet('day-0.parquet')
>>> df.head()

Program received signal SIGBUS, Bus error.
0x00007fffec39f443 in __pyx_f_6pandas_5algos_take_1d_object_object_memview (
    __pyx_optional_args=<synthetic pointer>, __pyx_v_values=..., __pyx_v_indexer=..., __pyx_v_out=...)
   from /home/mrocklin/Software/anaconda/lib/python3.6/site-packages/pandas/algos.cpython-36m-x86_64-linux-gnu.so
(gdb) up
#1  __pyx_pf_6pandas_5algos_380take_1d_object_object (__pyx_self=<optimized out>, 
    __pyx_v_fill_value=0x7ffff7f61fa8, __pyx_v_out=..., __pyx_v_indexer=..., __pyx_v_values=<optimized out>)
    at pandas/algos.c:2818
2818	pandas/algos.c: No such file or directory.
(gdb) up
#2  __pyx_pw_6pandas_5algos_381take_1d_object_object (__pyx_self=<optimized out>, __pyx_args=<optimized out>, 
    __pyx_kwds=<optimized out>) at pandas/algos.c:2741
2741	in pandas/algos.c
(gdb) up
#3  0x00007ffff7994902 in _PyCFunction_FastCallDict (func_obj=0x7fffec2d53a8, args=0x1550960, 
    nargs=<optimized out>, kwargs=0x0) at Objects/methodobject.c:231
231	Objects/methodobject.c: No such file or directory.
(gdb) up
#4  0x00007ffff7a19f4c in call_function (pp_stack=0x7fffffffac58, oparg=<optimized out>, kwnames=0x0)
    at Python/ceval.c:4788
4788	Python/ceval.c: No such file or directory.
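The innermost frame is pandas' compiled `take_1d_object_object`, which gathers elements of an object array by position. As a rough illustration only (a pure-Python model, not the pandas source; the name `take_1d_object` is hypothetical), its semantics are: each output slot is `values[indexer[i]]`, or `fill_value` where the indexer holds `-1` as a missing marker. The compiled version reads the underlying buffers directly, so a problem with those buffers can kill the process with a signal instead of raising a Python exception.

```python
def take_1d_object(values, indexer, fill_value=None):
    """Hypothetical pure-Python model of a 1-D positional take.

    out[i] = values[indexer[i]], except where indexer[i] == -1,
    in which case out[i] = fill_value.
    """
    out = [None] * len(indexer)
    for i, idx in enumerate(indexer):
        if idx == -1:
            # -1 marks a missing position; it must not wrap to the last element
            out[i] = fill_value
        else:
            out[i] = values[idx]
    return out


if __name__ == "__main__":
    print(take_1d_object(["x", "y", "z"], [2, -1, 0], fill_value="NA"))
```

The `-1` check comes before indexing on purpose: in Python, `values[-1]` would silently return the last element rather than signal a missing value.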

cc @jreback @martindurant
