
gradient_accumulation_steps influences scale of the loss #2707

@RuABraun


🐛 Bug description

Because of the division here, and in similar training-step functions, the reported loss has the wrong scale when gradient_accumulation_steps > 1. This makes it confusingly low in comparison to the validation loss.

One option is to divide only inside the backward call, i.e. doing this:

scaler.scale(loss / gradient_accumulation_steps).backward()

or one could multiply the loss by gradient_accumulation_steps again before returning it.
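
A minimal sketch of the first option, assuming an AMP training step with a hypothetical `make_training_step` factory (not Ignite's actual implementation): the division happens only inside the backward call, so the value returned for logging keeps its natural scale and stays comparable to the validation loss.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

def make_training_step(model, optimizer, criterion, scaler: GradScaler,
                       gradient_accumulation_steps: int = 1):
    def training_step(iteration: int, batch):
        model.train()
        x, y = batch
        with autocast():
            y_pred = model(x)
            loss = criterion(y_pred, y)
        # Scale down only for the gradient computation ...
        scaler.scale(loss / gradient_accumulation_steps).backward()
        if iteration % gradient_accumulation_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
        # ... but report the unscaled loss.
        return loss.item()
    return training_step
```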
