Skip to content

[fully_async, trainer] fix: sync optimizer total steps before trainer initialization#6684

Open
mikequan0425 wants to merge 2 commits into
verl-project:mainfrom
mikequan0425:async_optim_lr_fix
Open

[fully_async, trainer] fix: sync optimizer total steps before trainer initialization#6684
mikequan0425 wants to merge 2 commits into
verl-project:mainfrom
mikequan0425:async_optim_lr_fix

Conversation

@mikequan0425

@mikequan0425 mikequan0425 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

It was observed in experiments that the learning rate (lr) was always 0. This issue does not occur when starting the script via main_ppo, and only emerges under fully async mode. Refer to the following for detailed problem description:#6683

In short, optim.total_training_steps is not assigned correctly during the initialization of trainer optim.

image

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

The simplest approach is to pass values through by configuring actor_rollout_ref.actor.optim.total_training_steps, yet the actual trainer step should be calculated as total_rollout_steps / (required_samples * trigger_parameter_sync_step)

Therefore, this PR provides a method to assign the corresponding value to optim.total_training_steps when creating the trainer.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: https://github.com/verl-project/verl/pulls?q=is%3Apr+is%3Aopen+fully+async+lr
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors FullyAsyncTrainer to resolve and set the total training steps in the configuration prior to worker initialization. It extracts the configuration-setting logic into _set_total_training_steps_in_config, adds a helper method _resolve_total_training_steps_before_init to compute the steps dynamically, and invokes this resolution during init_workers. There are no review comments, and no additional feedback is provided.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@Luosuu Luosuu left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found two issues in the pre-init total step calculation that should be addressed before this is safe.


required_samples = (
self.config.actor_rollout_ref.actor.ppo_mini_batch_size * self.config.async_training.require_batches
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optim.total_training_steps is consumed by the LR scheduler, and the scheduler is stepped on every actor update (_fit_update_actor -> update_actor), not only when parameters are synced. Dividing by trigger_parameter_sync_step makes the schedule finish trigger_parameter_sync_step times too early; if the progress bar/checkpoint version wants sync steps, please keep that separate from the optimizer scheduler steps.

@@ -266,7 +271,16 @@ def set_total_train_steps(self, total_training_steps):
except Exception as e:
print(f"Warning: Could not set total_training_steps in config. Structure missing? Error: {e}")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses the raw configured rollout.total_rollout_steps, but the rollouter later computes the effective count as min(config.rollout.total_rollout_steps, len(train_dataloader) * total_epochs) and also supports None. Since the optimizer is already constructed during init_workers, the later set_total_train_steps() call cannot rebuild the scheduler, so dataset-limited or unset configs still get the wrong LR schedule here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants