@pstjohn pstjohn commented Dec 12, 2025

We need to pass the dataloader even when we're not restoring it from a checkpoint, since otherwise it gets passed through as `None`.
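A minimal sketch of the failure mode described above, assuming the checkpoint loader hands back whatever dataloader it was given. The function `load_checkpoint_sketch` and the stand-in values are hypothetical and are not the repository's actual code; only the `dataloader=` argument expressions mirror the change in this PR.

```python
from typing import Any, Optional, Tuple


def load_checkpoint_sketch(
    ckpt_path: Optional[str], dataloader: Optional[Any]
) -> Tuple[Optional[Any], int]:
    """Hypothetical loader: returns the dataloader it received plus a start step."""
    start_step = 0
    # A real loader would restore model/optimizer (and possibly dataloader) state here.
    # Assumption: the dataloader argument is returned unchanged when nothing is restored.
    return dataloader, start_step


train_dataloader = [1, 2, 3]     # stand-in for a real dataloader
use_stateful_dataloader = False  # stand-in for args.dataset.use_stateful_dataloader

# Old call: None is passed in, so None comes back and replaces the dataloader.
dl_old, _ = load_checkpoint_sketch(
    None, train_dataloader if use_stateful_dataloader else None
)

# New call: the dataloader is always passed, so it always comes back.
dl_new, _ = load_checkpoint_sketch(None, train_dataloader)
```

Under that assumption, the old call silently drops the dataloader whenever the stateful option is off, which is the behavior this PR removes.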

Signed-off-by: Peter St. John <[email protected]>
@pstjohn pstjohn added this pull request to the merge queue Dec 12, 2025
  ckpt_path=ckpt_path,
  dist_config=dist_config,
- dataloader=train_dataloader if args.dataset.use_stateful_dataloader else None,
+ dataloader=train_dataloader,
Collaborator

I think esm2's `train_fsdp2` also has this code; not sure whether it needs to be fixed there too.

Collaborator

Great catch, if true.

Merged via the queue into NVIDIA:main with commit 8f89ee0 Dec 12, 2025
14 checks passed
@pstjohn pstjohn deleted the pstjohn/dataloader-fsdp2-fix branch December 12, 2025 22:19
3 participants