Pandas version checks
Reproducible Example
import builtins
import os

import pandas as pd

# Spy on Python-level opens of the destination path.
target = "test.parquet"
opens = []
real_open = builtins.open


def spy_open(file, *args, **kwargs):
    # Integer file descriptors are not paths; os.fspath() would raise on them.
    if isinstance(file, (str, os.PathLike)) and (
        os.path.abspath(os.fspath(file)) == os.path.abspath(target)
    ):
        opens.append(args)
    return real_open(file, *args, **kwargs)


builtins.open = spy_open
try:
    pd.DataFrame({"a": [1, 2, 3]}).to_parquet(target, engine="pyarrow")
finally:
    builtins.open = real_open

print(f"pandas opened the destination {len(opens)} time(s) at the Python level")
# Observed: 1 -> pandas opens the file itself via get_handle...
# ...but pyarrow ALSO opens the same path through its C++ layer and does the
# actual writing, so the path is opened twice per to_parquet call.

# On Linux this is also visible at the syscall level:
#   strace -f -e trace=openat python -c \
#     "import pandas as pd; pd.DataFrame({'a':[1]}).to_parquet('x.parquet')" 2>&1 \
#     | grep -c 'x.parquet'
# Observed: two openat() calls for the same path
Issue Description
For a local filesystem path, to_parquet with the pyarrow engine resolves the path through pandas.io.common.get_handle inside _get_path_or_handle, then unwraps the handle's .name back to a string and hands that string to pyarrow. pyarrow then opens the same path a second time through its own (memory-mapped, multithreaded) C++ I/O layer, which is where the read/write actually happens. So every local-path call opens the destination/source twice.
Consequences:
- On POSIX: a wasted open()/close() syscall pair on every call (minor, but pointless).
- On filesystems that finalize a file's contents when the descriptor is closed (e.g. certain write-once / object-store backed filesystems): the empty pandas-side descriptor is closed after pyarrow has written and closed its own, so the empty one wins and the file is silently truncated to 0 bytes - data loss with no error raised.
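The second consequence can be illustrated with a toy model. This is only a sketch under stated assumptions: `CloseFinalizedStore`, `_Handle`, and both `write_with_*` helpers are hypothetical names made up for illustration. They stand in for a filesystem that commits a file's contents when a descriptor is closed; they are not pandas, pyarrow, or any real filesystem API.

```python
class CloseFinalizedStore:
    """Toy store: a path's contents become whatever the last-closed handle buffered."""

    def __init__(self):
        self.files = {}

    def open(self, path):
        return _Handle(self, path)


class _Handle:
    def __init__(self, store, path):
        self.store = store
        self.path = path
        self.buffer = bytearray()

    def write(self, data):
        self.buffer += data

    def close(self):
        # Finalize on close: this handle's buffer replaces the stored contents.
        self.store.files[self.path] = bytes(self.buffer)


def write_with_double_open(store, path, payload):
    """Mimic the reported pattern: an unused outer handle plus a writing inner one."""
    outer = store.open(path)  # first open; never written to
    inner = store.open(path)  # second open; does the real writing
    inner.write(payload)
    inner.close()
    outer.close()  # closes last, so its empty buffer wins
    return store.files[path]


def write_with_single_open(store, path, payload):
    """Mimic the expected pattern: one handle that both writes and closes."""
    handle = store.open(path)
    handle.write(payload)
    handle.close()
    return store.files[path]


payload = b"PAR1 example bytes"
double_result = write_with_double_open(CloseFinalizedStore(), "test.parquet", payload)
single_result = write_with_single_open(CloseFinalizedStore(), "test.parquet", payload)
print(len(double_result), len(single_result))
```

In this model the double-open write leaves an empty file while the single-open write keeps the payload, and neither path raises an error.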
Expected Behavior
A local path should be opened exactly once. pandas should hand the string path directly to pyarrow (which opens it itself), without first opening it via get_handle. The repro above should print 0 Python-level opens of the destination, and strace should show a single openat() for the path.
(Non-fsspec URLs such as http(s):// still need get_handle, since pyarrow can't fetch those - only genuine local paths should skip it.)
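A minimal sketch of the distinction described above, assuming a hypothetical helper named `should_use_get_handle` (not an existing pandas function) and an assumed scheme list. It only classifies its input: already-open buffers and http(s)-style URLs keep going through `get_handle`, while plain local paths would be passed to pyarrow as strings. fsspec-style URLs are assumed to be routed elsewhere before this check and are not modelled here.

```python
import io
import os
import pathlib
from urllib.parse import urlparse

# Assumption: schemes that pyarrow cannot fetch on its own.
HANDLE_SCHEMES = {"http", "https", "ftp"}


def should_use_get_handle(path_or_buf) -> bool:
    """Return True if pandas should open the target itself (hypothetical helper)."""
    if not isinstance(path_or_buf, (str, os.PathLike)):
        # File-like objects are already open; keep the existing handling.
        return True
    text = os.fspath(path_or_buf)
    if isinstance(text, bytes):
        text = os.fsdecode(text)
    # A Windows drive letter parses as a one-letter scheme, which is not in
    # HANDLE_SCHEMES, so such paths are still treated as local.
    return urlparse(text).scheme.lower() in HANDLE_SCHEMES


local_str = should_use_get_handle("test.parquet")
local_path = should_use_get_handle(pathlib.Path("data") / "test.parquet")
remote_url = should_use_get_handle("https://example.com/test.parquet")
open_buffer = should_use_get_handle(io.BytesIO())
print(local_str, local_path, remote_url, open_buffer)
```

With a check of this shape in front of `get_handle`, the repro above would be expected to record zero Python-level opens for a local destination.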
Installed Versions
INSTALLED VERSIONS
------------------
commit : 72f2fea
python : 3.11.2
python-bits : 64
OS : Linux
OS-release : 6.1.0-48-amd64
Version : #1 SMP PREEMPT_DYNAMIC Debian 6.1.172-1 (2026-05-15)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 3.0.3
numpy : 2.4.6
dateutil : 2.9.0.post0
pip : None
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : None
pyiceberg : None
pyreadstat : None
pytest : None
python-calamine : None
pytz : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
qtpy : None
pyqt5 : None
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.