
Warn when running an infinite epoch and overriding "epoch end" accumulating hooks #11554

@carmocca

Description


🚀 Feature

When the user configures Trainer(max_steps=-1, max_epochs=-1), an endless epoch runs. Overriding training_epoch_end (or validation_epoch_end with a float val_check_interval) can then be a problem because the outputs passed to these hooks are kept in memory indefinitely.
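
A minimal sketch of the configuration this targets. The module body is illustrative only; what matters is that an epoch-end accumulating hook is overridden while no step or epoch limit is set:

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    # layers/optimizers omitted for brevity; only the hook matters here
    def training_step(self, batch, batch_idx):
        ...

    def training_epoch_end(self, outputs):
        # `outputs` collects the return value of every `training_step` in the epoch.
        # With an endless epoch this list grows without bound.
        ...


# An endless epoch: neither a step limit nor an epoch limit is set.
trainer = pl.Trainer(max_steps=-1, max_epochs=-1)
```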

Motivation

Many users are not aware of the impact of overriding these hooks, so infinite epochs open the door to "memory leaks".

Pitch

Raise a warning in this case, informing the user of this behaviour. A rough sketch of the check follows.
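
A minimal sketch of the proposed check, assuming it would run during Trainer setup. `is_overridden` and `rank_zero_warn` are existing Lightning utilities; the function name, placement, and exact condition here are assumptions, not the final implementation:

```python
from pytorch_lightning.utilities import rank_zero_warn
from pytorch_lightning.utilities.model_helpers import is_overridden


def _warn_infinite_epoch_hooks(trainer, model) -> None:
    # An "infinite" epoch: neither a step nor an epoch limit is configured.
    if not (trainer.max_epochs == -1 and trainer.max_steps == -1):
        return
    for hook in ("training_epoch_end", "validation_epoch_end"):
        if is_overridden(hook, model):
            rank_zero_warn(
                f"You overrode `{hook}`, but the epoch will never end"
                " (`max_steps=-1`, `max_epochs=-1`), so the outputs passed to it"
                " will accumulate in memory indefinitely."
            )
```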

Additional context

Proposed in #11480 (comment)


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.

cc @Borda @carmocca @awaelchli @ninginthecloud @daniellepintz @rohitgr7 @justusschock @kaushikb11
