Skip to content

bpo-44186: prevent TimedRotatingFileHandler overwriting log files #26553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion Doc/library/logging.handlers.rst
Original file line number Diff line number Diff line change
Expand Up @@ -382,7 +382,8 @@ timed intervals.
The system will save old log files by appending extensions to the filename.
The extensions are date-and-time based, using the strftime format
``%Y-%m-%d_%H-%M-%S`` or a leading portion thereof, depending on the
rollover interval.
rollover interval. In the case that an existing file has the same name,
a numbered suffix will be added to the new file to prevent overwriting logs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at other issues recently - bpo-43344, bpo-44753 - so this solution might not work in those scenarios. Specifcally, in scenarios where the user wants to keep the suffix as ".log" or similar. After changes to address those issues, rollover logic expects rolled-over files to start with the base filename (minus extension) + ".", and contain self.extMatch. I've also added comments on the issue itself.


When computing the next rollover time for the first time (when the handler
is created), the last modification time of an existing log file, or else
Expand Down
9 changes: 7 additions & 2 deletions Lib/logging/handlers.py
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,7 @@ def __init__(self, filename, when='h', interval=1, backupCount=0,
raise ValueError("Invalid rollover interval specified: %s" % self.when)

self.extMatch = re.compile(self.extMatch, re.ASCII)
self.file_number = re.compile(r'\(([0-9]+)\)$')
self.interval = self.interval * interval # multiply by units requested
# The following line added because the filename passed in could be a
# path object (see Issue #27493), but self.baseFilename will be a string
Expand Down Expand Up @@ -401,8 +402,12 @@ def doRollover(self):
timeTuple = time.localtime(t + addend)
dfn = self.rotation_filename(self.baseFilename + "." +
time.strftime(self.suffix, timeTuple))
if os.path.exists(dfn):
os.remove(dfn)
while os.path.exists(dfn):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be more future-proof if it were structured as:

if os.path.exists(dfn):
    self.backup_existing(dfn)

and the new backup_existing method would have the behaviour similar to this docstring:

def backup_existing(self, dfn):
    """
    Back up an existing file with the same name `dfn` as a destination file for rollover.

    This method must find a unique filename in the same directory, related to `dfn`,
    and rename the existing file to that name.

    This implementation appends ' (N)' to `dfn`, where N is an integer going up from 1
    until an integer value N is found such that no existing file with that ' (N)' suffix is found.
    """

Having a separate method allows someone with a custom scheme for naming logfiles (using the namer attribute) to adapt their backup to that scheme by overriding the method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being slow, I just wanted to clear make sure I have the correct understanding. I agree that the having a separate backupExisting method is a good idea (probably backupExisting instead of backup_existing for consistency with the rest of the module?)

I wasn't 100% sure on the behaviour of the method though, from my understanding on the rollover, the new file should be created with a suffix, from the docstring above, it seems like you want to rename the existing file to have a suffix?

as an example, I have the following log files

test.log
========
[INFO]: foo

test.log.2021-08-02
========
[INFO]: bar

for all the examples below, I'll denote that structure as

test.log # foo
test.log.2021-08-02 # bar

I was imagining the behaviour below:

test.log # baz
test.log.2021-08-02 # bar
test.log.2021-08-02 (1) # foo

This would mean that for n files, the latest file would always have the highest suffix. The current log would obviously have no suffix, the oldest rollover log would only have the date, and the following files would have (1), ... (N) in the order that they were generated.

From reading your comment, implemeting that method would yield the behaviour below:

test.log # baz
test.log.2021-08-02 # foo
test.log.2021-08-02 (1) # bar

where the most recent file would rename the already existing file in the directory and "bump" it up a number.

Personally I prefer the first method as implementing the method this way could cause some ambiguity i.e. in the case of a directory with suffixes (1), (2), (3), what happens to the newest file? The newest file gets the suffix (1), and then does the already existing (1) get bumped to (4), or does it get bumped to (2), and then (2) -> (3) and so on.

I'm not sure how this different logic affects the other issues you linked, I had a look at them but I need to do some testing to see the results myself as I couldn't 100% follow just the PR and bpo. I'll try and implement the logic and update this PR tonight.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't rush to update the PR, because maybe we should think about things some more. I'm not able to say much at the minute as I'm pressed for time, but I was thinking along the lines of: a target file that exists when rolling over should be an unusual case, not a common case (This issue hasn't come up in the many years since TimedRotatingFileHandler was added, IIRC). So, my proposed approach treats the existing file as just a backup (so stuff doesn't get lost), but once it's renamed, it's cleared the way for the expected dfn to be used, and the "usual" scheme applies from then on for log files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(probably backupExisting instead of backup_existing for consistency with the rest of the module?

Yes, good point. The original logging module was written back before PEP8, and kept the old-style naming for backward compatibility. So backupExisting is right.

if match := re.search(self.file_number, dfn):
name = re.sub(self.file_number, f'({int(match.groups()[0]) + 1})', dfn)
dfn = self.rotation_filename(name)
else:
dfn = self.rotation_filename(self.baseFilename + '.' + time.strftime(self.suffix, timeTuple) + ' (2)')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the + ' (2)' ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was attempting to mimic the default behaviour I saw from Windows 10, in Windows if you create a new document with a duplicate name it will create the documents like this:

New Document
New Document (2)
New Document (3)

On my Mac it omits the parenthesis and creating new documents yields a naming scheme like

New Document
New Document 2
New Document 3

I personally preferred the bracketed naming scheme but that is really just my personal preference. I don't know of anywhere else this pops up in the standard library but if there's a convention that I'm not following I'd be happy to rework that.

self.rotate(self.baseFilename, dfn)
if self.backupCount > 0:
for s in self.getFilesToDelete():
Expand Down
47 changes: 47 additions & 0 deletions Lib/test/test_logging.py
Original file line number Diff line number Diff line change
Expand Up @@ -5331,6 +5331,53 @@ def test_rollover(self):
print(tf.read())
self.assertTrue(found, msg=msg)

def test_overwrite_rollover(self):
# bpo-44186 test that pre-existing files aren't overwritten by rollover
now = datetime.datetime.now()
handler = logging.handlers.TimedRotatingFileHandler(
self.fn, 'midnight', atTime=now, encoding='utf-8', backupCount=1)

# compute the filename
t = handler.rolloverAt - handler.interval
timeTuple = time.gmtime(t)
fn = handler.rotation_filename(
handler.baseFilename + '.' + time.strftime(handler.suffix, timeTuple))

# touch a file with the intended filename
pathlib.Path(fn).touch()
self.rmfiles.append(fn)

r = logging.makeLogRecord({'msg': 'test msg'})

handler.emit(r)
handler.close()

self.assertLogFile(fn + ' (2)')

def test_lowest_filename_generated(self):
# test that when the filename is generated, the lowest possible suffix is used
# e.g. if the files "log.2021-06-19", "log.2021-06-19 (2)", and "log.2021-06-19 (4)" exist
# the filename generated should be "log.2021-06-19 (3)"
now = datetime.datetime.now()
fh = logging.handlers.TimedRotatingFileHandler(
self.fn, 'midnight', atTime=now, encoding='utf-8', backupCount=1)

# compute the filename
t = fh.rolloverAt - fh.interval
timeTuple = time.gmtime(t)
fn = fh.rotation_filename(
fh.baseFilename + '.' + time.strftime(fh.suffix, timeTuple))

for filename in [fn, fn + ' (2)', fn + ' (4)']:
pathlib.Path(filename).touch()
self.rmfiles.append(filename)

r = logging.makeLogRecord({'msg': 'test msg'})
fh.emit(r)
fh.close()

self.assertLogFile(fn + ' (3)')

def test_invalid(self):
assertRaises = self.assertRaises
assertRaises(ValueError, logging.handlers.TimedRotatingFileHandler,
Expand Down
1 change: 1 addition & 0 deletions Misc/ACKS
Original file line number Diff line number Diff line change
Expand Up @@ -1017,6 +1017,7 @@ John J. Lee
Thomas Lee
Robert Leenders
Cooper Ry Lees
Harry Lees
Yaron de Leeuw
Tennessee Leeuwenburg
Luc Lefebvre
Expand Down