[TRTLLM-5331] large-scale EP: perf - Replace allgather with AllToAllPrepare #5570
Merged
Conversation
Signed-off-by: Fred Wei <[email protected]>
/bot run
PR_Github #10235 [ run ] triggered by Bot
PR_Github #10235 [ run ] completed with state
juney-nvidia approved these changes on Jun 30, 2025.
ameynaik-hub pushed a commit to ameynaik-hub/TensorRT-LLM that referenced this pull request on Jun 30, 2025. Signed-off-by: Fred Wei <[email protected]>
Shunkangz pushed a commit to Shunkangz/TensorRT-LLM that referenced this pull request on Jul 2, 2025. Signed-off-by: Fred Wei <[email protected]>
dominicshanshan pushed commits to dominicshanshan/TensorRT-LLM that referenced this pull request on Jul 9, Jul 10 (four commits), and Jul 11 (three commits), 2025. Signed-off-by: Fred Wei <[email protected]>
nvzhihanj pushed a commit to nvzhihanj/TensorRT-LLM that referenced this pull request on Jul 17, 2025. Signed-off-by: Fred Wei <[email protected]>
nvzhihanj added a commit that referenced this pull request on Jul 22, 2025. Signed-off-by: Fred Wei <[email protected]> Co-authored-by: WeiHaocheng <[email protected]>
wenscarl pushed a commit to flashinfer-ai/flashinfer that referenced this pull request on Aug 27, 2025:
## 📌 Description

This PR adds `mnnvl_moe_alltoallv_prepare_without_allgather` from [TensorRT-LLM](NVIDIA/TensorRT-LLM#5570). This is a more efficient way to prepare alltoallv info.

## ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

## 🧪 Tests

```
tests/test_trtllm_alltoall.py::test_moe_alltoall_prepare[0-2-16-20-8-512] PASSED [ 69%]
tests/test_trtllm_alltoall.py::test_moe_alltoall_prepare[0-2-16-16-3-300] PASSED [ 73%]
tests/test_trtllm_alltoall.py::test_moe_alltoall_prepare[0-4-20-24-8-4000] PASSED [ 78%]
tests/test_trtllm_alltoall.py::test_moe_alltoall_prepare[0-8-96-96-8-1000] PASSED [ 82%]
tests/test_trtllm_alltoall.py::test_moe_alltoall_prepare[3-8-128-128-8-1000] PASSED [ 86%]
tests/test_trtllm_alltoall.py::test_moe_alltoall_prepare[3-8-128-144-8-1] PASSED [ 91%]
tests/test_trtllm_alltoall.py::test_moe_alltoall_prepare[0-4-72-80-4-2256] PASSED [ 95%]
tests/test_trtllm_alltoall.py::test_moe_alltoall_prepare[0-4-72-80-6-3333] PASSED [100%]
```
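The actual `mnnvl_moe_alltoallv_prepare_without_allgather` kernel is a CUDA implementation over NVLink; as a rough illustration of the underlying idea, the step it optimizes can be sketched in plain Python. Everything below is an illustrative assumption, not the TensorRT-LLM API: the function name, the simulated rank layout, and the contiguous expert-to-rank mapping are invented for this sketch.

```python
def simulated_alltoall_prepare(token_experts_per_rank, num_ranks, experts_per_rank):
    """Toy sketch (NOT the TensorRT-LLM implementation): instead of
    allgathering every rank's full routing tables, each rank computes
    per-destination send counts locally, and only those small count
    vectors are exchanged. A single-process simulation of the alltoall
    of counts is just a transpose of the send-count matrix."""
    # Assumed contiguous mapping: expert e lives on rank e // experts_per_rank.
    send_counts = [[0] * num_ranks for _ in range(num_ranks)]
    for src, experts in enumerate(token_experts_per_rank):
        for e in experts:
            send_counts[src][e // experts_per_rank] += 1
    # Simulated alltoall: rank d receives column d of the send-count matrix.
    recv_counts = [[send_counts[src][dst] for src in range(num_ranks)]
                   for dst in range(num_ranks)]
    return send_counts, recv_counts
```

Once every rank knows its receive counts, it can size its alltoallv buffers directly, which is what makes gathering the full routing information from all ranks unnecessary.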
PR title
Please write the PR title following this template:
[JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...] <summary of this PR>
For example, a PR that adds a new feature to the cache manager tracked by Jira ticket TRTLLM-1000 would be titled:
[TRTLLM-1000][feat] Support a new feature about cache manager
Description
Please briefly explain the issue and the solution.
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message. See details below for each supported subcommand.
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
Launch build/test pipelines. All previously running jobs will be killed.
--disable-fail-fast
(OPTIONAL) : Disable fail fast on build/test/infra failures.
--skip-test
(OPTIONAL) : Skip all test stages, but still run build, package, and sanity-check stages. Note: does NOT update GitHub check status.
--stage-list "A10-1, xxx"
(OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe"
(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: does NOT update GitHub check status.
--only-multi-gpu-test
(OPTIONAL) : Only run the multi-GPU tests. Note: does NOT update GitHub check status.
--disable-multi-gpu-test
(OPTIONAL) : Disable the multi-GPU tests. Note: does NOT update GitHub check status.
--add-multi-gpu-test
(OPTIONAL) : Force-run the multi-GPU tests. Will also run the L0 pre-merge pipeline.
--post-merge
(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"
(OPTIONAL) : Run the ordinary L0 pre-merge pipeline plus the specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx". For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.
kill
kill
Kill all running builds associated with the pull request.
skip
skip --comment COMMENT
Skip testing for the latest commit on the pull request.
--comment "Reason for skipping build/test" is required. IMPORTANT NOTE: this is dangerous, since lack of user care and validation can cause the top of tree to break.
reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: this is dangerous, since lack of user care and validation can cause the top of tree to break.
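Putting the subcommands above together, typical comments posted on a pull request might look like the following (the stage and GPU names are the examples given in the help text, and the skip reason is made up for illustration):

```
/bot run
/bot run --stage-list "A10-1" --disable-fail-fast
/bot run --gpu-type "A30, H100_PCIe" --post-merge
/bot kill
/bot skip --comment "Docs-only change, no code touched"
/bot reuse-pipeline
```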