BUG: to_parquet (pyarrow) opens local path twice #65811

Open
mkviatkovskii wants to merge 3 commits into pandas-dev:main from mkviatkovskii:fix/parquet-local-path-double-open

@mkviatkovskii

For a local filesystem path, the pyarrow engine resolved the path through get_handle in _get_path_or_handle and then handed the string to pyarrow, which opened the same path a second time through its own C++ I/O. The redundant open wasted a syscall on POSIX and, on filesystems that finalize a file's contents on close, let the empty pandas-side descriptor close last and truncate pyarrow's output to 0 bytes (silent data loss).
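
The double-open pattern described above can be sketched with plain file objects. This is only an illustration, not the pandas or pyarrow code: `write_with_double_open` is a hypothetical helper, the outer handle stands in for the pandas-side descriptor, and the inner handle stands in for pyarrow's own open. On an ordinary local filesystem the payload is expected to survive, which is why the redundant open usually goes unnoticed. The truncation described in this PR applies to filesystems that finalize contents on close and is not reproduced here.

```python
import os


def write_with_double_open(path: str, payload: bytes) -> int:
    """Hypothetical illustration: open the same path twice, write through the second handle."""
    # First open: stands in for the pandas-side handle. Nothing is written to it.
    outer = open(path, "wb")
    try:
        # Second open: stands in for the writer opening the same path on its own.
        with open(path, "wb") as inner:
            inner.write(payload)
    finally:
        # The empty outer handle is closed last.
        outer.close()
    # Report the size of the file left on disk.
    return os.path.getsize(path)
```

Comparing the returned size with `len(payload)` is a quick way to check whether a given filesystem keeps the data when the empty handle closes last.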

This skips get_handle for local paths and passes the string straight to pyarrow, reproducing the ~ expansion and parent-directory check that get_handle previously performed. Non-fsspec URLs (e.g. http/https) still route through get_handle, since pyarrow cannot fetch those.
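
A minimal sketch of the local-path branch, assuming the two behaviours named above (`~` expansion and a parent-directory check) are all that needs to be reproduced. The helper names `is_url_like` and `prepare_local_path`, the `"://"` heuristic, and the error message are illustrative assumptions, not the code in this PR.

```python
import os
from pathlib import Path


def is_url_like(path: str) -> bool:
    """Hypothetical check: treat any 'scheme://...' string as non-local."""
    return "://" in path


def prepare_local_path(path: str) -> str:
    """Expand '~' and verify the parent directory exists.

    Returns the expanded string so it can be handed to the writer,
    which then performs the only open of the file.
    """
    expanded = os.path.expanduser(path)
    parent = Path(expanded).parent
    if not parent.is_dir():
        # Fail early instead of letting the writer hit a missing directory.
        raise OSError(f"Cannot save file into a non-existent directory: '{parent}'")
    return expanded
```

With this split, a caller would send `is_url_like` paths (for example http/https) through the existing handle-based route and pass the result of `prepare_local_path` to the writer for everything else.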

@mkviatkovskii mkviatkovskii force-pushed the fix/parquet-local-path-double-open branch 2 times, most recently from aae43b2 to 6119f8a Compare June 4, 2026 18:03
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mkviatkovskii mkviatkovskii force-pushed the fix/parquet-local-path-double-open branch from 6119f8a to 02eeb67 Compare June 4, 2026 18:07
Development

Successfully merging this pull request may close these issues.

BUG: to_parquet (pyarrow engine) opens the local file path twice