Pandas `take` blocking the GIL #1404

I noticed that TPC-H query 1 spends only about half its time in Parquet IO when using the dataset produced by pyarrow.

![image](https://github.com/coiled/benchmarks/assets/8629629/07c65f9e-b7fd-4622-9bd7-5139792be57c)


However, the run is heavily GIL-congested, and a second profile with GIL and native tracking reveals that very little Python code is actually holding the GIL. (Native-only threads, e.g. those of pyarrow, are not tracked by py-spy, so we can't see how Arrow holds the GIL while producing the DataFrame.)

![image](https://github.com/coiled/benchmarks/assets/8629629/7c33086c-369b-4561-9a89-82630b92f06c)

This points to the pandas `take` function. There has already been a recent fix in this code area for axis 0 (see https://github.com/pandas-dev/pandas/pull/54483).
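A minimal sketch for checking whether `take` serializes on the GIL outside of the benchmark. It assumes that the public `DataFrame.take` exercises the same code path as the one in the profile, which has not been confirmed here. The helpers `time_take` and `gil_scaling` are hypothetical names. The sketch times the same batch of calls on one thread and then on several. If `take` holds the GIL for the whole call, the multi-threaded run should take roughly as long as the single-threaded one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def time_take(df, indices, axis, n_threads, n_calls):
    """Return wall-clock seconds for n_calls DataFrame.take calls spread over n_threads threads."""

    def work(_):
        # Each call gathers the requested positions along `axis`.
        return df.take(indices, axis=axis)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Consume the iterator so every call finishes before timing stops.
        list(pool.map(work, range(n_calls)))
    return time.perf_counter() - start


def gil_scaling(df, indices, axis, n_threads=4, n_calls=16):
    """Ratio of multi-threaded to single-threaded wall time for identical work.

    A ratio near 1.0 is consistent with the calls serializing (e.g. on the GIL).
    A ratio well below 1.0 suggests they run in parallel.
    """
    single = time_take(df, indices, axis, 1, n_calls)
    multi = time_take(df, indices, axis, n_threads, n_calls)
    return multi / single
```

For example, build a large frame with `df = pd.DataFrame(np.random.default_rng(0).random((1_000_000, 8)))` and a shuffled row order with `rows = np.random.default_rng(1).permutation(len(df))`. Then compare `gil_scaling(df, rows, axis=0)` across pandas versions. Treat the ratio as a rough signal only: memory bandwidth limits can also keep it close to 1.0 even when the GIL is released.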
