fix: restrict completion logging to rank 0 by fatih-uzlmz · Pull Request #2383 · pytorch/torchtitan

fatih-uzlmz · 2026-02-15T04:06:45Z

Summary

This PR fixes a logging inconsistency where "Training completed" was printed by all ranks, causing redundant console spam at the end of distributed training runs.

Changes

Moved logger.info("Training completed") inside the existing if torch.distributed.get_rank() == 0: block in train.py (Line 711).

Impact

Ensures a clean exit message only from the master process, matching the behavior of the preceding "Sleeping..." log.
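As a rough illustration of the change described above, the sketch below puts the completion message inside the rank-0 guard. It is not the actual `train.py` code: `get_rank` here is a hypothetical stand-in that reads the `RANK` environment variable (which torchrun sets) so the example runs without an initialized process group, whereas the PR itself uses `torch.distributed.get_rank()`. The function name `finish_training` and the exact "Sleeping..." message text are also assumptions.

```python
import logging
import os
import time

logger = logging.getLogger("train_sketch")


def get_rank() -> int:
    """Hypothetical stand-in for torch.distributed.get_rank().

    Reads the RANK environment variable and defaults to 0 when it is unset.
    """
    return int(os.environ.get("RANK", "0"))


def finish_training(sleep_seconds: float = 0.0) -> bool:
    """Emit the end-of-run messages on rank 0 only.

    Returns True if this process logged, False otherwise.
    """
    if get_rank() == 0:
        logger.info("Sleeping...")
        time.sleep(sleep_seconds)
        # After the change, this call sits inside the rank-0 block
        # instead of running on every rank.
        logger.info("Training completed")
        return True
    return False
```

With this layout, a process started with `RANK=3` skips both messages, while the process with `RANK=0` emits them once.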


tianyu-l · 2026-02-15T04:14:19Z

            time.sleep(2)
-
-        logger.info("Training completed")
+            logger.info("Training completed")


Other log calls also run on every rank, as you can see if you search for logger.info in train.py. Curious why you care about this one most?

btw if you are using torchrun, this command helps filter printing on certain ranks https://github.com/pytorch/torchtitan/blob/main/run_train.sh#L29

Thanks for the review :)

You make a great point about torchrun handling the filtering, but my main motivation here was code consistency and intent, similar to recent cleanup work I've been doing in torchtune.

Basically in lines 707-709, we explicitly check if rank == 0 to handle the final sleep/coordination. It appeared that the logger.info("Training completed") immediately following it was intended to be part of that same "clean exit" block but was essentially "orphaned" outside the indentation.

I recently addressed similar logging inconsistencies in torchtune (PR #2950) to ensure rank-zero logging is enforced at the source level rather than relying solely on the runner. I thought applying that same standard here would keep the two codebases aligned.
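One way to enforce rank-zero logging "at the source level" is a filter attached to the logger itself, so call sites need no explicit rank check. The sketch below uses only the standard `logging` module and is not the torchtune or torchtitan implementation; `RankZeroFilter` and `build_logger` are hypothetical names, and the rank is passed in explicitly rather than queried from `torch.distributed`.

```python
import logging


class RankZeroFilter(logging.Filter):
    """Drop every record unless the owning process is rank 0."""

    def __init__(self, rank: int) -> None:
        super().__init__()
        self.rank = rank

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging module to discard the record.
        return self.rank == 0


def build_logger(name: str, rank: int) -> logging.Logger:
    """Return a logger that only emits on rank 0 (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addFilter(RankZeroFilter(rank))
    return logger
```

The trade-off is that a logger-wide filter silences every message on non-zero ranks, including ones that may be useful for debugging, whereas an explicit `if rank == 0:` guard affects only the statements it wraps.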

tianyu-l · 2026-02-15T22:48:38Z

@@ -707,8 +707,7 @@ def train(self):
        if torch.distributed.get_rank() == 0:


@fegin should we only sleep on rank 0?


fix: restrict completion logging to rank 0

2e00f03

fatih-uzlmz requested review from fegin, tianyu-l, wconstab and wwwjn as code owners February 15, 2026 04:06

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Feb 15, 2026

tianyu-l reviewed Feb 15, 2026


tianyu-l approved these changes Feb 15, 2026


tianyu-l merged commit 10d8a30 into pytorch:main Feb 16, 2026
11 checks passed

fatih-uzlmz deleted the fix/training-completion-logging branch February 16, 2026 04:48

