
Conversation

@ruro (Contributor) commented Nov 21, 2025

I added support for exporting QuantizeLinear/DequantizeLinear nodes (from fake_quantize_per_*_affine torch operators) in a previous PR.

Unfortunately, the current default onnxscript optimizer settings tend to remove weight quantization automatically. This is because the Weight -> QDQ -> ... pattern looks like it can simply be constant folded into QDQ(Weight) -> ....

I believe that this behavior is not desirable, since the presence of QDQ nodes in the graph is what allows inference engines to run the supported computations using quantized data types. So the purpose of QDQ nodes is to hold the relevant quantization "metadata". As such, they normally shouldn't be constant folded.

I have extended the existing logic in FoldConstantsPass that was used to exclude ConstantOfShape from constant folding.

I haven't found any tests verifying this behavior for ConstantOfShape, and I'm not sure how to set up such a unit test, so I have left this code untested for now. If adding tests is mandatory, please give me a hint on where I should add such a test and what the best way would be to check/assert that the optimized graph matches expectations (hopefully without reinventing the wheel or manually introspecting the ir.Model object).

codecov bot commented Nov 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.11%. Comparing base (3e7d9fb) to head (0d6a91b).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2713   +/-   ##
=======================================
  Coverage   70.11%   70.11%           
=======================================
  Files         226      226           
  Lines       27228    27230    +2     
  Branches     2747     2748    +1     
=======================================
+ Hits        19090    19092    +2     
  Misses       7193     7193           
  Partials      945      945           


@titaiwangms titaiwangms added the module: torchlib Related to the torch/aten function lib in development label Dec 1, 2025
@titaiwangms titaiwangms added module: optimizer and removed module: torchlib Related to the torch/aten function lib in development labels Dec 1, 2025
Copilot AI (Contributor) left a comment

Pull request overview

This PR prevents constant folding of QuantizeLinear and DequantizeLinear nodes by default to preserve quantization metadata that inference engines need for optimized execution. The change extends the existing blacklist mechanism that was previously used only for ConstantOfShape.

Key Changes:

  • Introduced DEFAULT_CONSTANT_FOLD_BLACKLIST constant containing ops that should not be constant folded
  • Refactored the constant folding logic to iterate over the blacklist instead of checking a single op type
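A minimal sketch of what such a blacklist-driven check might look like (the constant name is taken from the summary above, but the `Node` type and `should_fold` helper are illustrative stand-ins, not the actual FoldConstantsPass internals):

```python
from dataclasses import dataclass

# Ops whose nodes carry semantic metadata and therefore should not be
# constant folded by default (mirrors the PR description; illustrative).
DEFAULT_CONSTANT_FOLD_BLACKLIST = (
    "ConstantOfShape",
    "QuantizeLinear",
    "DequantizeLinear",
)

@dataclass
class Node:
    # Hypothetical stand-in for an IR node; only op_type matters here.
    op_type: str

def should_fold(node: Node) -> bool:
    # Check against the blacklist instead of a single hard-coded op type.
    return node.op_type not in DEFAULT_CONSTANT_FOLD_BLACKLIST
```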

@titaiwangms (Contributor) left a comment
@justinchuby @gramalingam @xadupre Merging this, as it seems reasonable to me and the fix is simple. We can revisit this PR if it breaks anything that gets spotted later (perhaps observed via benchmarks?).

@titaiwangms titaiwangms enabled auto-merge (squash) December 1, 2025 21:27
@titaiwangms titaiwangms merged commit bbe9c2b into microsoft:main Dec 3, 2025
32 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in ONNX Script Review Board Dec 3, 2025
@justinchuby (Collaborator) commented:

Prefer a more efficient way of checking membership: turn the list into a frozen set, first check that the domain is an ONNX domain ("" or "ai.onnx"), then check whether the op_type is in the set.
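A sketch of the suggested follow-up (hypothetical names, not the merged onnxscript code):

```python
# Frozen sets give O(1) membership checks; the domain is tested first so
# that a custom op which happens to share a name (e.g. a "com.example"
# QuantizeLinear) remains foldable.
_NON_FOLDABLE_OPS = frozenset(
    {"ConstantOfShape", "QuantizeLinear", "DequantizeLinear"}
)
_ONNX_DOMAINS = frozenset({"", "ai.onnx"})

def is_non_foldable(domain: str, op_type: str) -> bool:
    return domain in _ONNX_DOMAINS and op_type in _NON_FOLDABLE_OPS
```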
