🚀 Feature
When the user configures `Trainer(max_steps=-1, max_epochs=-1)`, an endless epoch runs. Overriding `training_epoch_end` or `validation_epoch_end` with a float `val_check_interval` can then be a problem, because these hooks keep the step outputs in memory indefinitely.
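A minimal sketch of the problematic configuration, assuming the 1.x `LightningModule` hooks; `BoringModel` and `InfiniteDataset` below are hypothetical stand-ins, not part of this proposal:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset
from pytorch_lightning import LightningModule, Trainer


class InfiniteDataset(IterableDataset):
    """Hypothetical dataset that yields samples forever, so a single epoch never ends."""

    def __iter__(self):
        while True:
            yield torch.randn(32)


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        # Returning a value here makes Lightning collect it for `training_epoch_end`.
        return {"loss": self.layer(batch).sum()}

    def training_epoch_end(self, outputs):
        # `outputs` holds every `training_step` result of the current epoch.
        # If the epoch never ends, this list grows without bound.
        ...

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# Endless training: the epoch boundary is never reached, so the outputs
# retained for `training_epoch_end` accumulate in memory indefinitely.
trainer = Trainer(max_steps=-1, max_epochs=-1)
trainer.fit(BoringModel(), DataLoader(InfiniteDataset(), batch_size=8))
```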
Motivation
Many users are not aware of the memory impact of overriding these hooks, so infinite epochs open the door to "memory leaks".
Pitch
Raise a warning in this case, informing the user of this behaviour.
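A minimal sketch of what such a check could look like. `rank_zero_warn` and `is_overridden` are existing Lightning utilities; the helper name, where it would be called from, and the message wording are assumptions, not a settled design:

```python
from pytorch_lightning.utilities import rank_zero_warn
from pytorch_lightning.utilities.model_helpers import is_overridden


def warn_if_output_collection_is_unbounded(trainer, model):
    """Hypothetical helper: warn when endless training meets epoch-end hooks."""
    endless = trainer.max_steps == -1 and trainer.max_epochs == -1
    keeps_outputs = is_overridden("training_epoch_end", model) or is_overridden(
        "validation_epoch_end", model
    )
    if endless and keeps_outputs:
        rank_zero_warn(
            "You have set `max_steps=-1` and `max_epochs=-1` and overridden"
            " `training_epoch_end`/`validation_epoch_end`. The outputs collected"
            " for these hooks are kept in memory until the epoch ends, which may"
            " never happen and can look like a memory leak."
        )
```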
Additional context
Proposed in #11480 (comment)
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @Borda @carmocca @awaelchli @ninginthecloud @daniellepintz @rohitgr7 @justusschock @kaushikb11