
Enable using different MoE implementations in GraniteMoE#46526

Open
mbknust wants to merge 2 commits into huggingface:main from mbknust:GraniteMoE-via-Mixtral

Conversation

mbknust commented Jun 9, 2026

Changes

I've changed models/granitemoe/modular_granitemoe.py so that it uses MixtralExperts and MixtralTopKRouter instead of JetMoeParallelExperts and JetMoeTopKGating. From what I've seen, almost all MoE models in the library have implementations based on Mixtral. Because MixtralExperts is decorated with @use_experts_implementation, this PR makes it possible to swap the MoE expert implementation inside GraniteMoE as well. The change also affects the two derived architectures, granitemoeshared and granitemoehybrid.
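
A quick sanity check on the router swap, as a pure-Python sketch rather than the real modules. It assumes (unverified here; check the actual classes) that a JetMoe-style gate takes the top-k router logits and applies softmax to them, while a Mixtral-style router applies softmax over all experts, keeps the top-k probabilities and renormalizes them. Under that assumption both recipes give the same experts and weights, because renormalizing a subset of softmax probabilities equals a softmax over that subset's logits. The function names are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_indices(xs, k):
    # Indices of the k largest entries, largest first.
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]


def jetmoe_style_gate(logits, k):
    # Assumed JetMoe-style gating: pick the top-k logits, then softmax over them.
    idx = topk_indices(logits, k)
    return idx, softmax([logits[i] for i in idx])


def mixtral_style_router(logits, k):
    # Assumed Mixtral-style routing: softmax over all experts, keep the top-k
    # probabilities, then renormalize them so they sum to 1.
    probs = softmax(logits)
    idx = topk_indices(probs, k)
    top = [probs[i] for i in idx]
    s = sum(top)
    return idx, [p / s for p in top]


# Router logits for one token over 4 hypothetical experts.
logits = [0.3, -1.2, 2.0, 0.7]
print(jetmoe_style_gate(logits, 2))
print(mixtral_style_router(logits, 2))
```

Since softmax is monotonic, both functions select the same experts (ties aside); only the order of operations differs.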

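Because the Mixtral modules use different parameter names, I've added the corresponding renamings to src/transformers/conversion_mapping.py, so that existing GraniteMoE checkpoints are mapped onto the new names when loaded.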
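
Conceptually, such renamings rewrite checkpoint keys at load time. The sketch below shows that idea with regular expressions; every key name in it is a hypothetical placeholder, and the actual mapping lives in src/transformers/conversion_mapping.py rather than in a helper like this one.

```python
import re

# Hypothetical old -> new key patterns, for illustration only; the real
# GraniteMoE renamings in conversion_mapping.py may use different names.
RENAMES = [
    (r"\.block_sparse_moe\.input_linear\.weight$", ".mlp.experts.gate_up_proj"),
    (r"\.block_sparse_moe\.output_linear\.weight$", ".mlp.experts.down_proj"),
    (r"\.block_sparse_moe\.router\.layer\.weight$", ".mlp.gate.weight"),
]


def rename_key(key):
    # Apply the first matching pattern; keys that match nothing are kept as-is.
    for pattern, replacement in RENAMES:
        new_key, n = re.subn(pattern, replacement, key)
        if n:
            return new_key
    return key


def rename_state_dict(state_dict):
    # Only keys change; the values (tensors in a real checkpoint) are untouched.
    return {rename_key(k): v for k, v in state_dict.items()}


# Strings stand in for tensors in this toy example.
old = {
    "model.layers.0.block_sparse_moe.input_linear.weight": "W_in",
    "model.layers.0.block_sparse_moe.router.layer.weight": "W_router",
    "model.layers.0.self_attn.q_proj.weight": "W_q",
}
print(rename_state_dict(old))
```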

With the new gating module, the original reason for setting _can_compile_fullgraph = False no longer applies, so I've set it to True. The exception is granitemoehybrid, where it has to stay False because of an unrelated problem with that model. I've also set _can_record_outputs.

One of the selectable MoE implementations is based on the grouped_mm operator, which requires 16-byte-aligned strides. I've therefore changed the model sizes in the test file to multiples of 8, which keeps the strides 16-byte aligned for 2-byte dtypes such as bfloat16 and float16 as well as for float32 (see also #42697).
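
The arithmetic behind "multiples of 8" can be checked with a small sketch. It uses a simplified model (an assumption, not the exact grouped_mm requirement): a contiguous row of n elements spans n × itemsize bytes, and that stride must be divisible by 16. The helper name is hypothetical.

```python
# Bytes per element for common dtypes.
ITEMSIZE = {"float32": 4, "bfloat16": 2, "float16": 2}


def stride_is_16_byte_aligned(num_elements, dtype):
    # A contiguous row of `num_elements` values spans num_elements * itemsize
    # bytes; the next row starts at that offset, so it must be a multiple of 16.
    return (num_elements * ITEMSIZE[dtype]) % 16 == 0


for size in (4, 32, 37, 64):
    print(size, {dt: stride_is_16_byte_aligned(size, dt) for dt in ITEMSIZE})
```

Under this model, 4 elements are enough for float32 but not for 2-byte dtypes, while any multiple of 8 satisfies all three dtypes.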

  • I confirm that this is not a pure code agent PR.
  • Did you read the contributor guideline, Pull Request section?

Who can review?

@IlyasMoutawwakil

mbknust and others added 2 commits June 9, 2026 17:49
This means we can switch out the expert module implementation.
github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: granitemoe, granitemoehybrid, granitemoeshared

github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46526&sha=6f0df7
