Don't constant fold Quantize/DequantizeLinear nodes by default #2713
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main    #2713   +/-   ##
=======================================
  Coverage   70.11%   70.11%
=======================================
  Files         226      226
  Lines       27228    27230    +2
  Branches     2747     2748    +1
=======================================
+ Hits        19090    19092    +2
  Misses       7193     7193
  Partials      945      945

☔ View full report in Codecov by Sentry.
Pull request overview

This PR prevents constant folding of `QuantizeLinear` and `DequantizeLinear` nodes by default to preserve quantization metadata that inference engines need for optimized execution. The change extends the existing blacklist mechanism that was previously used only for `ConstantOfShape`.

Key Changes:
- Introduced a `DEFAULT_CONSTANT_FOLD_BLACKLIST` constant containing ops that should not be constant folded (see the sketch below)
- Refactored the constant folding logic to check op types against the blacklist instead of checking a single op type
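A minimal sketch of how that mechanism could look (`DEFAULT_CONSTANT_FOLD_BLACKLIST` is the name from the PR; the exact contents and the helper function shown here are assumptions, not the PR's actual code):

```python
# Sketch only: ops that should never be constant folded by default.
DEFAULT_CONSTANT_FOLD_BLACKLIST = (
    "ConstantOfShape",
    "QuantizeLinear",
    "DequantizeLinear",
)

def _should_skip_folding(node) -> bool:
    """Assumed helper: return True if the node's op type is blacklisted."""
    return node.op_type in DEFAULT_CONSTANT_FOLD_BLACKLIST
```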
titaiwangms left a comment:
@justinchuby @gramalingam @xadupre Merging this, as it seems pretty reasonable to me and the fix is simple. We can revisit this PR if it obviously breaks anything and that's spotted in the future (observed from benchmarks, perhaps?).
Prefer a more efficient way of checking membership: turn the list into a frozenset, first check that the domain is an ONNX domain ("" or "ai.onnx"), then check whether the op_type is in the set.
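A sketch of what that suggestion could look like (the names here are illustrative, not the PR's final code):

```python
# Illustrative only: frozenset membership preceded by a domain check,
# per the reviewer's suggestion above.
NON_FOLDABLE_OPS = frozenset({"ConstantOfShape", "QuantizeLinear", "DequantizeLinear"})

def is_non_foldable(node) -> bool:
    # The standard ONNX domain is spelled "" or "ai.onnx".
    return node.domain in ("", "ai.onnx") and node.op_type in NON_FOLDABLE_OPS
```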
I added support for exporting `QuantizeLinear`/`DequantizeLinear` nodes (from `fake_quantize_per_*_affine` torch operators) in a previous PR.

Unfortunately, the current default onnxscript optimizer settings tend to automatically remove any weight quantization. This is because the `Weight -> QDQ -> ...` pattern looks like it can simply be constant folded into `QDQ(Weight) -> ...`.

I believe this behavior is not desirable, since the presence of `QDQ` nodes in the graph is what allows inference engines to run the supported computations using quantized data types. The purpose of `QDQ` nodes is to hold the relevant quantization "metadata", so they normally shouldn't be constant folded.
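To make this concrete, here is a plain-NumPy illustration (made-up values, not onnxscript code) of what folding a QDQ pair over a weight discards:

```python
import numpy as np

w = np.array([0.49, -1.2, 3.3], dtype=np.float32)   # original float weight
scale, zero_point = np.float32(0.1), np.int8(0)

# QuantizeLinear: saturate(round(w / scale) + zero_point) as int8
q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
# DequantizeLinear: (q - zero_point) * scale back to float32
dq = ((q.astype(np.float32) - zero_point) * scale).astype(np.float32)

# Constant folding replaces the Q/DQ nodes with the precomputed tensor `dq`,
# discarding `q`, `scale`, and `zero_point` -- exactly the metadata an
# inference engine needs to execute with int8 weights.
```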
I have extended the existing logic in `FoldConstantsPass` that was used to exclude `ConstantOfShape` from constant folding.

I haven't found any tests verifying this behavior for `ConstantOfShape`, and I'm not sure how to set up such a unit test, so I have left this code untested for now. If adding tests is mandatory, please give me a hint on where I should add such a test and what would be the best way to check/assert that the optimized graph matches the expectations (hopefully without reinventing the wheel or manually introspecting the `ir.Model` object).
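For reference, one possible shape for such a test, assuming onnxscript's public `ir` and `optimizer` entry points (the fixture path and the exact calls are assumptions on my part):

```python
import onnx
from onnxscript import ir, optimizer

def test_qdq_nodes_survive_constant_folding():
    # Hypothetical fixture: a model whose weights are wrapped in Q/DQ nodes.
    model = ir.serde.deserialize_model(onnx.load("qdq_model.onnx"))
    optimizer.optimize_ir(model)
    # The Q/DQ pair must still be present after optimization.
    op_types = {node.op_type for node in model.graph}
    assert {"QuantizeLinear", "DequantizeLinear"} <= op_types
```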