[ort_fusuion] Support fp16 in rms_norm fusion #2491
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff (main vs. #2491):
            main     #2491    +/-
Coverage   70.08%   70.09%
Files         212      212
Lines       25646    25647     +1
Branches     2573     2573
Hits        17973    17976     +3
Misses       6783     6781     -2
Partials      890      890

View full report in Codecov by Sentry.
normalized = op.Mul(x, reciprocal_rms)
normalized = pattern.OrValue([op.Cast(normalized, to=target_dtype), normalized])
# To support float16, handle both the case where the scale is cast and the case where it is not.
scale = pattern.OrValue([op.Cast(scale, to=compute_dtype), scale])
I don't think this should be to=compute_dtype ... it may be better to make it a different variable, say scale_cast_type. And check for correctness in the check condition below. It should basically be the target-dtype, since it is the type of the final output ... the compute-dtype could be something different.
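To make the suggestion concrete, here is a minimal, self-contained sketch of that check. The name `scale_cast_dtype` and the helper function are hypothetical, not the onnxscript rewriter API, and dtypes are shown as strings instead of TensorProto enums; the point is just that if `scale` is matched with a Cast, the cast target should equal the output (target) dtype rather than the compute dtype.

```python
def scale_cast_is_valid(scale_cast_dtype: str | None, target_dtype: str) -> bool:
    """Hypothetical check: validate the dtype that `scale` is cast to."""
    if scale_cast_dtype is None:
        # scale was matched without a Cast; nothing to validate.
        return True
    # When scale is cast, it should be cast to the output (target) dtype,
    # which may differ from the float32 compute dtype.
    return scale_cast_dtype == target_dtype


assert scale_cast_is_valid(None, "float16")
assert scale_cast_is_valid("float16", "float16")
assert not scale_cast_is_valid("float32", "float16")
```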
But we need to check the spec of SimplifiedLayerNorm in ORT to ensure that we are providing a scale value that has a consistent type.
But I am confused about the issue you ran into ... I think the fusion should have happened anyway ... are you trying to eliminate any redundant type cast or something?
Specifically: what are the different types in the failing case? What is the input type, computation type, and output type? Is the scale type float32? Is it being cast to fp16?
Link to ORT's op definition for reference: https://github.com/microsoft/onnxruntime/blob/cb0c5e9001cd3510ceb25173453373e4f1c7ab09/onnxruntime/core/graph/contrib_ops/contrib_defs.cc#L3079 ...
While reviewing the RMSNormalization pattern match, I realized I forgot to answer this. The issue I ran into is that the model explicitly converts the dtype to float32 for the calculations and then casts the result back to whatever the original input dtype was.
In RMSNorm there is a compute_type and a target_type: the computation runs in compute_type and the result is converted back to target_type after RMSNorm.
A typical example is the RMSNorm class used in LLMs, e.g. GPT-OSS: https://github.com/huggingface/transformers/blob/52c6c1bb6e27ca87c4faede34a4c2a7404c17c4d/src/transformers/models/gpt_oss/modeling_gpt_oss.py#L54
Therefore, we need to take op.Cast into account in the pattern.
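For reference, a minimal sketch of that pattern, modeled on the typical Hugging Face RMSNorm implementation (simplified and not copied verbatim from the linked GPT-OSS file):

```python
import torch
from torch import nn


class RMSNorm(nn.Module):
    """Minimal RMSNorm that upcasts to float32 for the math, then casts back."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype                 # e.g. float16 (target_type)
        hidden_states = hidden_states.to(torch.float32)   # compute_type
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # Cast back to the original dtype before applying the scale.
        return self.weight * hidden_states.to(input_dtype)
```

When a model like this is exported with float16 inputs, the graph contains a Cast(fp16→fp32) before the normalization math and a Cast(fp32→fp16) after it, which is why the pattern wraps those Cast nodes in pattern.OrValue so both the float32-only and the float16 round-trip shapes are matched.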