[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses #22537
Conversation
Code Review
This pull request introduces a significant refactoring of the Mixture of Experts (MoE) quantization configuration via a new `FusedMoEQuantConfig` structure. This is a positive change towards a more structured and extensible configuration. However, the refactoring appears to be incomplete: there are several critical issues, including `assert False` statements, `NotImplementedError`s, and uses of undefined variables in the new code paths. These issues will cause runtime failures and need to be addressed before this PR can be considered for merging. My review focuses on these critical issues.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 27a4513 to 688374b
Force-pushed from a6b4b30 to d5b12e8
Force-pushed from 328fc4c to ad0e7ff
Force-pushed from d1f132f to 417e037
Force-pushed from 5f60537 to 8d94b93
[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (vllm-project#22537) Signed-off-by: Bill Nell <[email protected]>
Is this tested with mxfp4?
@bnellnm The test result section is still TBD.
…llm-project#2907)
### What this PR does / why we need it?
1. Bump the vLLM commit to vllm-project/vllm@6d8246a
2. Fix the upstream change vllm-project/vllm#24548 (abort multi-modal kwargs), making both vLLM main and `v0.10.2` adaptable
3. Fix the `metadata_builder` changes introduced by vllm-project/vllm#23693
4. Fix the `structured_outputs_config` changes introduced by vllm-project/vllm#22772
5. Fix the `moe_config` changes introduced by vllm-project/vllm#22537

Co-authored-by: MengqingCao <[email protected]>
Co-authored-by: Yikun Jiang <[email protected]>
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@c60e613
Signed-off-by: wangli <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Purpose
- Delegate construction of `FusedMoEQuantConfig` objects to the subclass of `FusedMoEMethodBase` that will use that info.
- Clean up `FusedMoEQuantConfig` and make it more uniform.
- Replace the individual quantization arguments to `fused_experts` with a single `FusedMoEQuantConfig`. This eliminates the various `use_` bool flags and the quantization parameters `_scales`, `_zp`, `_bias`, `_gscale`, etc. (see the sketch below).
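To make the last point concrete, here is a minimal, hypothetical sketch of the shape of the API after such a change. It is not the code from this PR: `FusedMoEQuantConfig`, `FusedMoEMethodBase`, and `fused_experts` are the names used above, but the fields, the `get_fused_moe_quant_config` hook, the `Fp8MoEMethodSketch` subclass, and the layer attribute names are assumptions made purely for illustration.

```python
# Hypothetical sketch only -- not the actual vLLM implementation.
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class FusedMoEQuantConfig:
    # Bundles what used to be passed to fused_experts as separate use_* bool
    # flags plus *_scale/*_zp/*_bias/*_gscale arguments. Field names are assumed.
    quant_dtype: Optional[torch.dtype] = None
    per_act_token_quant: bool = False
    w1_scale: Optional[torch.Tensor] = None
    w2_scale: Optional[torch.Tensor] = None
    a1_scale: Optional[torch.Tensor] = None
    a2_scale: Optional[torch.Tensor] = None


class FusedMoEMethodBase:
    def get_fused_moe_quant_config(self, layer) -> Optional[FusedMoEQuantConfig]:
        # Hypothetical hook: each quantization method subclass constructs the
        # config from the state it owns, instead of the MoE layer guessing it.
        raise NotImplementedError


class Fp8MoEMethodSketch(FusedMoEMethodBase):
    def get_fused_moe_quant_config(self, layer) -> FusedMoEQuantConfig:
        # An fp8 method knows which dtype and scales apply to its weights.
        return FusedMoEQuantConfig(
            quant_dtype=torch.float8_e4m3fn,
            w1_scale=layer.w13_weight_scale,  # attribute name assumed
            w2_scale=layer.w2_weight_scale,   # attribute name assumed
        )


def fused_experts(
    hidden_states: torch.Tensor,
    w1: torch.Tensor,
    w2: torch.Tensor,
    topk_weights: torch.Tensor,
    topk_ids: torch.Tensor,
    quant_config: Optional[FusedMoEQuantConfig] = None,
) -> torch.Tensor:
    # After the refactor the kernel entry point takes one config object
    # instead of a long list of use_* flags and loose scale/zero-point args.
    ...
```

Under these assumptions, a call site goes from passing individual `use_*` flags and scale tensors to passing a single `quant_config` object built by the quantization method that owns that information.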
Test Plan
Test Result
(Optional) Documentation Update
cc @varun-sundar-rabindranath, @LucasWilkinson, @jeejeelee, @wenscarl, @nvpohanh, @mgoin