[megatron] fix: add backward compatibility with older Megatron-Bridge versions #6682
nuerxiati wants to merge 1 commit into
Code Review
This pull request updates verl/models/mcore/bridge.py to provide fallback implementations for LinearForLastLayer, make_value_model, and freeze_moe_router in case they cannot be imported from megatron.bridge.training.utils.train_utils. This ensures compatibility and robustness when those specific utilities are not available. There are no review comments, and I have no additional feedback to provide.
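The fallback approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `import_with_fallback` and `_local_freeze_moe_router` are hypothetical names introduced here, and the stand-in body does not reproduce the PR's real fallback logic. Only the module path and the symbol name `freeze_moe_router` come from the diff.

```python
import importlib


def import_with_fallback(module_name, symbol, fallback):
    """Return `symbol` from `module_name`, or `fallback` if it is unavailable.

    Hypothetical helper for illustration only; not part of verl or Megatron-Bridge.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, symbol)
    except (ImportError, AttributeError):
        # ImportError: the module (or a parent package) is not installed.
        # AttributeError: the module exists but predates the symbol.
        return fallback


def _local_freeze_moe_router(model):
    # Hypothetical stand-in: a real fallback would disable gradients on the
    # router parameters. Here the model is returned unchanged.
    return model


# Prefer the upstream symbol when it exists, otherwise use the local stand-in.
freeze_moe_router = import_with_fallback(
    "megatron.bridge.training.utils.train_utils",
    "freeze_moe_router",
    _local_freeze_moe_router,
)
```

Catching only `ImportError` and `AttributeError` keeps unrelated failures inside the upstream module visible instead of silently masking them with the fallback.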
@@ -15,12 +15,91 @@

try:
Could this compatibility PR also be added to releases/v0.8.0?
    raise

try:
    from megatron.bridge.training.utils.train_utils import LinearForLastLayer, freeze_moe_router, make_value_model
@@ -15,12 +15,85 @@

try:
    from megatron.bridge import AutoBridge
The original PR included from megatron.bridge.models.conversion.param_mapping import AutoMapping and AutoMapping.register_module_type("LinearForLastLayer", "replicated"). Please confirm whether this code is no longer needed.
):
    layer.mlp.shared_experts.gate_bias.requires_grad = False
return model

# 3. The distributed optimizer must only track trainable (adapter) parameters
# See Megatron-Bridge docs: training/peft.md

# Register PEFT transformation as pre-wrap hook if peft_cls is specified
Why was this comment deleted? Please try not to change GPU-related logic and comments.
ddp_config = None
if wrap_config.wrap_with_ddp:
    try:
        from megatron.bridge.training.utils.config_utils import create_ddp_config
5bd697c to 9554d40
    raise

# Megatron-Bridge >= v0.5.0 exposes these symbols in train_utils.
# Megatron-Bridge < v0.5.0 does not, so we fall back to local implementations.
Why is compatibility with the legacy version of megatron-bridge still required?
What does this PR do?
After #6335 (bc3f3bf0), verl adopted several new Megatron-Bridge APIs that do not exist in older versions. Using the latest verl with an older Megatron-Bridge causes ImportError and AttributeError at runtime.
Checklist Before Starting
[{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off, like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
[BREAKING] to the beginning of the title.
[BREAKING][fsdp, megatron] feat: dynamic batching
Test
API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.