
[roadmap] verl development Q2 #710

@eric-haibin-lin

Past roadmap for reference: #22

Agentic RL: Environment interaction & tool support [P0]

Scaling up RL & system performance [P0]

  • Ring Attention
  • Ulysses sequence parallelism for VLM models, e.g., Qwen2VL
  • reference system tuning script for best RL throughput on different types of accelerators
  • multi-node rollout (potential inference engine dependency)
  • fused kernels for alignment losses (Feat/memory optimized loss, #1212)

Usability improvement

Latest Model & Algorithm Support

See https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html for adding models with the FSDP backend.
See https://verl.readthedocs.io/en/latest/advance/megatron_extension.html for adding models with the Megatron backend.

Component Continuous Updates

dataset & benchmark

  • GPQA Diamond (English)
  • LiveCodeBench (code)
  • SWE-bench Verified (code)
  • CNMO 2024 (math)
  • codecontests (Code Generation)
  • TACO (Code Generation)
  • competition_math (Math)

Please also help provide scripts to reproduce the evaluation performance of publicly released models.

Efficient RL / codesign [P1]

Wide Hardware Coverage

Make the experience on non-NVIDIA accelerators smoother

  • stable Ascend NPU support, with reproducible examples and logs
  • stable AMD GPU support, with sglang
  • AMD GPU with mcore support

Make verl easier to extend with custom train/infer engine and roles

other community requests
