Skip to content

gh-131492: gh-131461: handle exceptions in GzipFile constructor while owning resources #131462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Mar 20, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
106 changes: 58 additions & 48 deletions Lib/gzip.py
Original file line number Diff line number Diff line change
Expand Up @@ -202,51 +202,58 @@ def __init__(self, filename=None, mode=None,
raise ValueError("Invalid mode: {!r}".format(mode))
if mode and 'b' not in mode:
mode += 'b'
if fileobj is None:
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
if filename is None:
filename = getattr(fileobj, 'name', '')
if not isinstance(filename, (str, bytes)):
filename = ''
else:
filename = os.fspath(filename)
origmode = mode
if mode is None:
mode = getattr(fileobj, 'mode', 'rb')


if mode.startswith('r'):
self.mode = READ
raw = _GzipReader(fileobj)
self._buffer = io.BufferedReader(raw)
self.name = filename

elif mode.startswith(('w', 'a', 'x')):
if origmode is None:
import warnings
warnings.warn(
"GzipFile was opened for writing, but this will "
"change in future Python releases. "
"Specify the mode argument for opening it for writing.",
FutureWarning, 2)
self.mode = WRITE
self._init_write(filename)
self.compress = zlib.compressobj(compresslevel,
zlib.DEFLATED,
-zlib.MAX_WBITS,
zlib.DEF_MEM_LEVEL,
0)
self._write_mtime = mtime
self._buffer_size = _WRITE_BUFFER_SIZE
self._buffer = io.BufferedWriter(_WriteBufferStream(self),
buffer_size=self._buffer_size)
else:
raise ValueError("Invalid mode: {!r}".format(mode))

self.fileobj = fileobj
try:
if fileobj is None:
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
Copy link
Contributor Author

@graingert graingert Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a more thorough change inspired by

cpython/Lib/ssl.py

Lines 1014 to 1015 in a4832f6

# Now SSLSocket is responsible for closing the file descriptor.
try:

previously it was possible for this self.myfileobj to be left open if any of the constructor failed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than the try/except wrapping, why not add a flag which is set to False at the beginning of the constructor and True at the end, in __del__ (which is what emits the resource warning), check that flag. If it's not fully constructed then don't emit the resource warning?

The I/O stack (GzipFile inherits from IOBase) in IOBase.__del__ always does the close call if self.closed is False, so shouldn't need to manually add a special case here. GzipFile.closed is:

    @property
    def closed(self):
        return self.fileobj is None

(so if there is a fileobj set, close should always be called).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wouldn't help with myfileobj closing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using self.fileobj is not None as the "fully initialized" flag (already set to None always, only set to non-None at end of __init__), and checking around if self.filobj is None and self.myfileobj is not None in __del__?

It feels a lot like this is adding more cases / code around closing. Trying to find a way to get this change to be smaller / more minimal to reduce risks. IO stack construction, dealloc warnings, and close + destruction order is already really intricate and hard to understand in full.

Copy link
Contributor Author

@graingert graingert Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't rely on self.myfileobj being closed in __del__ and we don't return an instance of GzipFile for the caller to close so self.myfileobj needs to be closed at the end of __init__ if there's an exception: so we need the try/catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but this is closing objects that are inaccessible to the caller

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GzipFile / self is equally inaccessible, and cleaned up by its __del__ / finalizer tp_finalize (or tp_dealloc). It's guaranteed __del__ is called and able to release any held resources (such as self.myfileobj) see PEP-442.

I'm okay adding a cleanup for the case where self.fileobj is None and self.myfileobj is not None inside __del__. Matches the behavior of other io objects and is a lot smaller diff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct, you cannot rely on __del__ to be called when an exception is raised because the exception traceback holds a reference to self

Copy link
Contributor

@cmaloney cmaloney Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That statement seems to contradict PEP 442 Predictability section to me...

Following this scheme, an object’s finalizer is always called exactly once, even if it was resurrected afterwards.
For CI objects, the order in which finalizers are called (step 2 above) is undefined.

Even if there is an exception traceback that keeps the reference to self, eventually that traceback will go out of scope and file will be closed / not leaked...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to @vstinner and @serhiy-storchaka for approval. From my perspective, this is a lot more complex of a bugfix than required, didn't match stated description and arguments around self.myfileobj. Happy to do quick fixes, can produce a much smaller patch here if desired with less behavior change.

if filename is None:
filename = getattr(fileobj, 'name', '')
if not isinstance(filename, (str, bytes)):
filename = ''
else:
filename = os.fspath(filename)
origmode = mode
if mode is None:
mode = getattr(fileobj, 'mode', 'rb')


if mode.startswith('r'):
self.mode = READ
raw = _GzipReader(fileobj)
self._buffer = io.BufferedReader(raw)
self.name = filename

elif mode.startswith(('w', 'a', 'x')):
if origmode is None:
import warnings
warnings.warn(
"GzipFile was opened for writing, but this will "
"change in future Python releases. "
"Specify the mode argument for opening it for writing.",
FutureWarning, 2)
self.mode = WRITE
self._init_write(filename)
self.compress = zlib.compressobj(compresslevel,
zlib.DEFLATED,
-zlib.MAX_WBITS,
zlib.DEF_MEM_LEVEL,
0)
self._write_mtime = mtime
self._buffer_size = _WRITE_BUFFER_SIZE
self._buffer = io.BufferedWriter(_WriteBufferStream(self),
buffer_size=self._buffer_size)
else:
raise ValueError("Invalid mode: {!r}".format(mode))

self.fileobj = fileobj

if self.mode == WRITE:
self._write_gzip_header(compresslevel)
if self.mode == WRITE:
self._write_gzip_header(compresslevel)
except:
# Avoid a ResourceWarning if the write fails,
# eg read-only file or KeyboardInterrupt
self._close()
raise

@property
def mtime(self):
Expand Down Expand Up @@ -387,11 +394,14 @@ def close(self):
elif self.mode == READ:
self._buffer.close()
finally:
self.fileobj = None
myfileobj = self.myfileobj
if myfileobj:
self.myfileobj = None
myfileobj.close()
self._close()

def _close(self):
self.fileobj = None
myfileobj = self.myfileobj
if myfileobj is not None:
self.myfileobj = None
myfileobj.close()

def flush(self,zlib_mode=zlib.Z_SYNC_FLUSH):
self._check_not_closed()
Expand Down
12 changes: 8 additions & 4 deletions Lib/test/test_tarfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
from test import support
from test.support import os_helper
from test.support import script_helper
from test.support import warnings_helper

# Check for our compression modules.
try:
Expand Down Expand Up @@ -1638,10 +1639,13 @@ def write(self, data):
raise exctype

f = BadFile()
with self.assertRaises(exctype):
tar = tarfile.open(tmpname, self.mode, fileobj=f,
format=tarfile.PAX_FORMAT,
pax_headers={'non': 'empty'})
with (
warnings_helper.check_no_resource_warning(self),
self.assertRaises(exctype),
):
tarfile.open(tmpname, self.mode, fileobj=f,
format=tarfile.PAX_FORMAT,
pax_headers={'non': 'empty'})
self.assertFalse(f.closed)

def test_missing_fileobj(self):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix :exc:`ResourceWarning` when constructing a :class:`gzip.GzipFile` in write mode with a broken file object.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix a resource leak when constructing a :class:`gzip.GzipFile` with a filename fails, for example when passing an invalid ``compresslevel``.
Loading