Enable using different MoE implementations in GraniteMoE by mbknust · Pull Request #46526 · huggingface/transformers

mbknust · 2026-06-09T15:50:35Z

Changes

I've changed models/granitemoe/modular_granitemoe.py so that it uses MixtralExperts and MixtralTopKRouter instead of JetMoeParallelExperts and JetMoeTopKGating. From what I've seen, almost all MoE models have implementations based on Mixtral. Because MixtralExperts uses the @use_experts_implementation decorator, this PR makes it possible to switch the MoE module implementation in GraniteMoE as well. The change also affects the two derived architectures, granitemoeshared and granitemoehybrid.
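To illustrate what "switching the implementation" means in general, here is a minimal, dependency-free sketch of a name-based registry with two interchangeable expert kernels. It is not the transformers API: `EXPERTS_IMPLEMENTATIONS`, `register`, `run_experts`, and the scalar "experts" are all hypothetical and exist only to show that the routing result stays the same while the execution strategy changes.

```python
from typing import Callable, Dict, List

# Hypothetical registry for this sketch only; not the transformers API.
EXPERTS_IMPLEMENTATIONS: Dict[str, Callable] = {}


def register(name: str) -> Callable:
    """Register an experts kernel under a string name."""
    def wrap(fn: Callable) -> Callable:
        EXPERTS_IMPLEMENTATIONS[name] = fn
        return fn
    return wrap


@register("eager")
def experts_eager(tokens, expert_scales, top_k_index, top_k_weights):
    """Loop over tokens, then over each token's selected experts."""
    out = []
    for tok, idxs, ws in zip(tokens, top_k_index, top_k_weights):
        acc = [0.0] * len(tok)
        for e, w in zip(idxs, ws):
            for d, x in enumerate(tok):
                # Each toy "expert" just scales its input by a constant.
                acc[d] += w * expert_scales[e] * x
        out.append(acc)
    return out


@register("grouped")
def experts_grouped(tokens, expert_scales, top_k_index, top_k_weights):
    """Group (token, weight) pairs per expert first, then process group by group."""
    groups: Dict[int, List] = {e: [] for e in range(len(expert_scales))}
    for t, (idxs, ws) in enumerate(zip(top_k_index, top_k_weights)):
        for e, w in zip(idxs, ws):
            groups[e].append((t, w))
    out = [[0.0] * len(tok) for tok in tokens]
    for e, members in groups.items():
        for t, w in members:
            for d, x in enumerate(tokens[t]):
                out[t][d] += w * expert_scales[e] * x
    return out


def run_experts(implementation: str, *args):
    """Dispatch to whichever kernel was selected by name."""
    return EXPERTS_IMPLEMENTATIONS[implementation](*args)
```

Both kernels take the same arguments, so the caller only changes the name string. The real decorator presumably handles much more (tensor layouts, fallbacks, configuration), none of which is modelled here.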

Since Mixtral's parameters have different names, I've added the necessary renamings to src/transformers/conversion_mapping.py.
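As a rough picture of what such a renaming does, the sketch below rewrites checkpoint keys with regular expressions. The key names in `RENAMES` are illustrative placeholders, not the names used in this PR, and `rename_key` / `convert_state_dict` are hypothetical helpers rather than functions from conversion_mapping.py.

```python
import re

# Illustrative (pattern, replacement) pairs. These names are assumptions made
# for the example; the real mappings live in conversion_mapping.py.
RENAMES = [
    (r"\.moe\.input_linear\.weight$", ".moe.experts.gate_up_proj"),
    (r"\.moe\.output_linear\.weight$", ".moe.experts.down_proj"),
    (r"\.moe\.router\.layer\.weight$", ".moe.gate.weight"),
]


def rename_key(key: str) -> str:
    """Return the converted key, or the key unchanged if no pattern matches."""
    for pattern, replacement in RENAMES:
        new_key, count = re.subn(pattern, replacement, key)
        if count:
            return new_key
    return key


def convert_state_dict(state_dict: dict) -> dict:
    """Apply the renamings to every key; values are passed through untouched."""
    return {rename_key(key): value for key, value in state_dict.items()}
```

A real conversion may also need to reshape or split tensors when the layouts of the two expert modules differ. This sketch covers only the renaming step.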

With the new gating module, the original reason for _can_compile_fullgraph = False no longer applies, so I've set it to True. The exception is granitemoehybrid, where I've kept it False because of an unrelated problem with that model. I've also set _can_record_outputs.

One of the selectable MoE implementations is based on the grouped_mm operator, which requires strides to be 16-byte aligned, so I've changed the model sizes in the test file to be multiples of 8 (see also #42697).
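The arithmetic behind "multiples of 8" can be sketched as follows, assuming contiguous row-major matrices, so that the row stride in bytes is the number of columns times the element size. The dtype table and both helper functions are hypothetical and written for this example. With a 2-byte dtype such as bfloat16, 8 elements span exactly 16 bytes.

```python
ALIGNMENT_BYTES = 16  # alignment requirement stated above for grouped_mm

# Element sizes in bytes for a few common dtypes (assumed for this sketch).
ITEMSIZE = {"float32": 4, "bfloat16": 2, "float16": 2}


def row_stride_is_aligned(num_columns: int, dtype: str) -> bool:
    """Check whether a contiguous row of `num_columns` elements spans a multiple of 16 bytes."""
    return (num_columns * ITEMSIZE[dtype]) % ALIGNMENT_BYTES == 0


def round_up_to_aligned(num_columns: int, dtype: str) -> int:
    """Smallest column count >= num_columns whose row stride is 16-byte aligned."""
    elements_per_block = ALIGNMENT_BYTES // ITEMSIZE[dtype]
    # Ceiling division, then scale back up to a whole number of blocks.
    return -(-num_columns // elements_per_block) * elements_per_block
```

A size that is a multiple of 8 satisfies the check for both 2-byte and 4-byte dtypes, which fits the choice of 8 as the common multiple for the test configurations.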

I confirm that this is not a pure code agent PR.
Did you read the contributor guideline, Pull Request section?

Who can review?

@IlyasMoutawwakil


github-actions · 2026-06-12T08:00:09Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: granitemoe, granitemoehybrid, granitemoeshared

github-actions · 2026-06-12T08:13:40Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46526&sha=6f0df7

mbknust and others added 2 commits June 9, 2026 17:49

Base GraniteMoE on Mixtral

883d646

This means we can switch out the expert module implementation.

Merge branch 'huggingface:main' into GraniteMoE-via-Mixtral

6f0df7b

mbknust wants to merge 2 commits into huggingface:main from mbknust:GraniteMoE-via-Mixtral
