[Roadmap] Primus-Turbo Roadmap H2 2025 #101

@xiaobochen-amd

This roadmap is the H2 2025 development plan of Primus-Turbo.

Note: The roadmap is flexible and will be updated over time based on project needs and community input.

Release Overview

| Version | Framework | Status | Date |
| --- | --- | --- | --- |
| v0.1.0 | PyTorch + ROCm6.4 | ✅ Released | 2025-09-11 |
| v0.1.1 | PyTorch + ROCm7.0 | ✅ Released | 2025-10-15 |
| v0.2.0 | PyTorch + ROCm7.1 | ✅ Released | 2025-12-05 |

Detailed Plans

v0.1.0 (Released)

Focus

  • Build the foundational framework of Primus-Turbo.
  • Provide core operators.

Features

  • GEMM: Support FP16/BF16.
  • FlashAttention: Support FP16/BF16.
  • GroupedGEMM: Support FP16/BF16.
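
For readers unfamiliar with the term, the sketch below illustrates what a grouped GEMM computes: rows of one input are split into consecutive groups, and each group is multiplied by its own weight matrix. This is a plain-Python reference of the general technique only. The names `grouped_gemm_reference`, `group_sizes`, and `weights` are hypothetical and are not Primus-Turbo's API.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (M x K) @ (K x N) matrix product."""
    k = len(b)
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def grouped_gemm_reference(a: Matrix, group_sizes: List[int], weights: List[Matrix]) -> Matrix:
    """Multiply consecutive row groups of `a` by per-group weight matrices.

    Group g covers `group_sizes[g]` rows of `a` and uses `weights[g]`.
    Hypothetical helper for illustration; not a library signature.
    """
    if len(group_sizes) != len(weights):
        raise ValueError("need exactly one weight matrix per group")
    if sum(group_sizes) != len(a):
        raise ValueError("group sizes must cover every row of the input")
    out: Matrix = []
    start = 0
    for size, w in zip(group_sizes, weights):
        out.extend(matmul(a[start:start + size], w))
        start += size
    return out


# Example: 3 rows split into groups of 2 and 1, each with its own 2x2 weight.
example_out = grouped_gemm_reference(
    [[1, 2], [3, 4], [5, 6]],
    [2, 1],
    [[[1, 0], [0, 1]], [[2, 0], [0, 2]]],
)
```

In MoE layers the groups typically correspond to the tokens routed to each expert, which is why the group sizes vary from step to step.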

Framework

  • Provide PyTorch APIs.
  • Support ROCm 6.4.

v0.2.0 (Released)

Focus

  • Introduce FP8 foundational support.
  • Enable communication primitives with FP8, focusing on DeepEP.

Features

  • GEMM: Support FP8 (E4M3/E5M2).
    • Support Tensorwise.
    • Support Rowwise.
    • Support Blockwise.
    • Support MX.
  • FlashAttention: Support FP8 (E4M3/E5M2).
    • Support Blockwise.
  • GroupedGEMM: Support FP8 (E4M3/E5M2).
    • Support Tensorwise.
    • Support Rowwise.
    • Support Blockwise.
    • Support MX.
  • All2All: FP8 support.
    • Support Tensorwise.
  • DeepEP:
    • Intra-Node Normal Kernel.
    • Inter-Node Normal Kernel.
    • Support NICs.
      • ConnectX-7
      • Thor2
      • Pensando
    • Support internode_dispatch without GPU-CPU synchronization.
    • Support torch.compile.
  • TokenDispatcher:
    • Integrate Permute/Unpermute.
    • Support Sync-Free DeepEPTokenDispatcher.
    • Support MoE Fused Activations.
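
The Tensorwise, Rowwise, and Blockwise entries above refer to how many elements share one FP8 scale factor. The sketch below shows the difference in plain Python. It is an illustration under stated assumptions, not Primus-Turbo's implementation: the function names are hypothetical, the maximum of 448 is the largest finite value of the OCP E4M3 format (other E4M3 variants use a different maximum), and the scale is defined as `fp8_max / amax`, whereas some libraries store the inverse. MX formats are not sketched here.

```python
from typing import List

Matrix = List[List[float]]

# Assumption: largest finite value of OCP E4M3. Other FP8 variants differ.
FP8_E4M3_MAX = 448.0


def _scale(amax: float, fp8_max: float) -> float:
    """Scale that maps the largest magnitude onto fp8_max (1.0 for all-zero data)."""
    return fp8_max / amax if amax > 0 else 1.0


def tensorwise_scale(x: Matrix, fp8_max: float = FP8_E4M3_MAX) -> float:
    """One scale for the whole tensor."""
    amax = max(abs(v) for row in x for v in row)
    return _scale(amax, fp8_max)


def rowwise_scales(x: Matrix, fp8_max: float = FP8_E4M3_MAX) -> List[float]:
    """One scale per row."""
    return [_scale(max(abs(v) for v in row), fp8_max) for row in x]


def blockwise_scales(x: Matrix, block_rows: int, block_cols: int,
                     fp8_max: float = FP8_E4M3_MAX) -> List[List[float]]:
    """One scale per (block_rows x block_cols) tile; edge tiles may be smaller."""
    rows, cols = len(x), len(x[0])
    scales: List[List[float]] = []
    for r0 in range(0, rows, block_rows):
        row_of_scales: List[float] = []
        for c0 in range(0, cols, block_cols):
            amax = max(
                abs(x[r][c])
                for r in range(r0, min(r0 + block_rows, rows))
                for c in range(c0, min(c0 + block_cols, cols))
            )
            row_of_scales.append(_scale(amax, fp8_max))
        scales.append(row_of_scales)
    return scales


x = [[1.0, -2.0, 4.0, 0.5],
     [8.0, 0.25, -16.0, 1.0]]
per_tensor = tensorwise_scale(x)
per_row = rowwise_scales(x)
per_block = blockwise_scales(x, block_rows=1, block_cols=2)
```

Finer granularity keeps a single outlier from shrinking the usable range of unrelated elements, at the cost of storing and applying more scale factors.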
