Pandas take blocking the GIL #1404

Open
@fjetter

I noticed that TPC-H query 1 spends only about half its time in Parquet IO when using the dataset produced by pyarrow.

image

However, the run is heavily GIL-congested, and another GIL+native profile reveals that very few things (in Python) are actually holding on to the GIL. Native-only threads, e.g. those of pyarrow, are not tracked by py-spy, so we don't see how Arrow holds the GIL to produce the dataframe.

image

which points to the pandas take function. There has already been a recent fix in this code area for axis 0 (see pandas-dev/pandas#54483).
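To cross-check a suspected GIL holder without py-spy, something like the following rough sketch might help. It is not taken from the profiles above, and the helper name `max_heartbeat_gap` is made up for illustration. The idea: a background thread wakes up every millisecond and records how late it is; on a GIL-enabled CPython build, a long gap suggests the workload was sitting in a C call that did not release the GIL.

```python
import threading
import time


def max_heartbeat_gap(workload, interval=0.001):
    """Run workload() while a background thread ticks every `interval` seconds.

    Returns (workload result, largest observed gap between two ticks in seconds).
    Hypothetical helper for illustration, not part of pandas or py-spy.
    """
    stop = threading.Event()
    gaps = []

    def heartbeat():
        last = time.perf_counter()
        while not stop.is_set():
            # sleep() releases the GIL; waking up requires re-acquiring it,
            # so a workload that holds the GIL delays the next tick.
            time.sleep(interval)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        result = workload()
    finally:
        stop.set()
        thread.join()
    # If the workload finished before the first tick, no gap was recorded.
    return result, max(gaps, default=0.0)
```

Usage would be along the lines of `result, gap = max_heartbeat_gap(lambda: df.take(indices))`, where `df` and `indices` stand in for whatever frame and indexer the query produces. A gap close to the total runtime of the call would be consistent with take holding the GIL throughout, while a gap near `interval` would suggest it is released. Timing noise from the OS scheduler applies, so this is only a coarse signal.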
