
Fix dtype mismatch in NemotronH/Zamba2 Mamba2 CUDA-kernel path (out_proj) #46487

Merged
Rocketknight1 merged 1 commit into huggingface:main from yuekaizhang:fix_dtype on Jun 8, 2026



@yuekaizhang

Contributor

Summary

The Mamba2 SSM CUDA-kernel forward path (cuda_kernels_forward) projects the scan output
through out_proj without casting it to the projection weight dtype. Because the SSM kernels
(mamba_chunk_scan_combined / selective_state_update) return fp32 regardless of the module
dtype, the projection fails whenever out_proj.weight is a narrower dtype than the activations:

RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
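The failure mode can be reproduced in isolation with plain PyTorch on CPU. This is only a sketch: the `nn.Linear` below is a hypothetical stand-in for out_proj, the random fp32 tensor stands in for the kernel output, and the shapes are arbitrary. The exact wording of the error message may differ from the one above depending on device and PyTorch version.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for out_proj: a projection whose weights are bf16.
out_proj = nn.Linear(8, 4, bias=False).to(torch.bfloat16)

# Stand-in for the scan output, which the PR describes as fp32.
scan_output = torch.randn(2, 3, 8, dtype=torch.float32)

try:
    out_proj(scan_output)  # fp32 input against bf16 weight
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"RuntimeError: {err}")
```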

Fix

Cast the projection input to the out_proj weight dtype before both explicit out_proj calls in
the CUDA-kernel path, so the matmul dtypes always agree (matching the slow path's intent, and
robust to mixed-dtype models):

out = self.out_proj(scan_output.to(self.out_proj.weight.dtype))
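A minimal sketch of the same cast outside the model, assuming a standalone `nn.Linear` in place of out_proj and a random fp32 tensor in place of the kernel output. The helper name `project_with_cast` is illustrative and does not appear in the PR.

```python
import torch
import torch.nn as nn


def project_with_cast(out_proj: nn.Linear, scan_output: torch.Tensor) -> torch.Tensor:
    """Cast the input to the projection weight dtype, then project."""
    return out_proj(scan_output.to(out_proj.weight.dtype))


# Hypothetical bf16 projection and fp32 scan output (shapes are arbitrary).
out_proj = nn.Linear(8, 4, bias=False).to(torch.bfloat16)
scan_output = torch.randn(2, 3, 8, dtype=torch.float32)

out = project_with_cast(out_proj, scan_output)
print(out.dtype, tuple(out.shape))
```

With the cast in place, the output takes the dtype of the projection weight, so downstream layers see the module dtype rather than fp32.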

NemotronHMamba2Mixer inherits this method from Zamba2MambaMixer, so the fix is applied in the
modular source models/zamba2/modular_zamba2.py and propagated to the generated
modeling_zamba2.py and modeling_nemotron_h.py.

Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: nemotron_h, zamba2

@Rocketknight1 Rocketknight1 left a comment

Member


Yes, LGTM! Seems like a clean fix, and should be low risk because it's at worst a no-op.
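The "at worst a no-op" point can be checked directly: `Tensor.to` returns the same tensor object when the dtype (and device) already match, so the added cast should cost nothing in the uniform-dtype case. A small sketch with arbitrary shapes:

```python
import torch

target_dtype = torch.bfloat16

# Input already in the target dtype: .to() hands back the same object.
activations = torch.randn(2, 8, dtype=torch.bfloat16)
casted = activations.to(target_dtype)
is_noop = casted is activations

# Input in a different dtype: .to() produces a new, converted tensor.
fp32_activations = torch.randn(2, 8, dtype=torch.float32)
converted = fp32_activations.to(target_dtype)
made_copy = converted is not fp32_activations

print(is_noop, made_copy)
```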

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

CI Dashboard: View test results in Grafana

@Rocketknight1 Rocketknight1 enabled auto-merge June 8, 2026 14:38
@Rocketknight1 Rocketknight1 added this pull request to the merge queue Jun 8, 2026
@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Merged via the queue into huggingface:main with commit 7ac9912 Jun 8, 2026
23 checks passed
khushali9 pushed a commit to khushali9/transformers that referenced this pull request Jun 8, 2026
…proj`) (huggingface#46487)

fix dtype mismatch

Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>
louzongzhi pushed a commit to louzongzhi/transformers that referenced this pull request Jun 10, 2026
…proj`) (huggingface#46487)

fix dtype mismatch

Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com>