BUG: to_parquet (pyarrow) opens local path twice #65811
Open
mkviatkovskii wants to merge 3 commits into
aae43b2 to 6119f8a
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
6119f8a to 02eeb67
doc/source/whatsnew/v3.1.0.rst file if fixing a bug or adding a new feature.
AGENTS.md.

For a local filesystem path, the pyarrow engine resolved the path through `get_handle` in `_get_path_or_handle` and then handed the string to pyarrow, which opened the same path a second time through its own C++ I/O. The redundant open wasted a syscall on POSIX and, on filesystems that finalize a file's contents on close, let the empty pandas-side descriptor close last and truncate pyarrow's output to 0 bytes (silent data loss).

This skips `get_handle` for local paths and passes the string straight to pyarrow, reproducing the `~` expansion and parent-directory check that `get_handle` previously performed. Non-fsspec URLs (e.g. http/https) still route through `get_handle`, since pyarrow cannot fetch those.
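
For readers skimming the description, the dispatch described above could be sketched roughly as follows. This is not the code in this PR: `resolve_local_parquet_path` is a hypothetical helper name, the single-letter-scheme rule for Windows drive letters is an assumption, and the error text is illustrative. It only shows the two steps named above (`~` expansion and the parent-directory check) being applied before a plain string would be handed to pyarrow.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def resolve_local_parquet_path(path: str) -> Optional[str]:
    """Hypothetical helper, not the PR's implementation.

    Returns the expanded local path to pass straight to pyarrow, or None
    when the input looks like a URL that should keep going through
    get_handle (or fsspec).
    """
    # Assumption: a scheme longer than one character (http, https, s3, ...)
    # means "not a local path"; a one-letter scheme is a Windows drive letter.
    scheme = urlparse(path).scheme
    if len(scheme) > 1:
        return None

    # Step 1: expand a leading "~", as get_handle previously did.
    expanded = os.path.expanduser(path)

    # Step 2: fail early if the parent directory does not exist,
    # instead of letting the writer fail later with a less clear error.
    parent = os.path.dirname(os.path.abspath(expanded))
    if not os.path.isdir(parent):
        raise OSError(
            f"Cannot save file into a non-existent directory: '{parent}'"
        )

    # The caller passes this string to pyarrow; pandas opens nothing itself.
    return expanded
```

With a helper of this shape, the local-path branch never creates a pandas-side file descriptor, so only pyarrow's own handle exists for the output file.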