Description
Environment: onnxruntime version 1.17.1, model PP-ResNet50.
Running the quantization script quantize-ort.py does not reproduce the quantized model in the repo. The current script produces an int8-quantized ppresnet50 of over 120 MB, far larger than the existing quantized models in the repo, which are ~26 MB. After some investigation, I think the likely cause is a missing pre-processing step before quantization. The ONNX Runtime documentation recommends pre-processing a model before quantizing it.
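As a rough illustration of the missing step, the sketch below runs ONNX Runtime's quant_pre_process (shape inference plus graph optimization) before quantize_static. It is not the repo's actual script: the helper names, the "-infer" file suffix, and the use of default quantization options are assumptions made for this example, and the calibration reader is whatever CalibrationDataReader the script already builds.

```python
from pathlib import Path


def preprocessed_path(model_path: str) -> str:
    """Return a sibling path for the pre-processed model.

    The "-infer" suffix is a naming choice for this sketch only.
    """
    p = Path(model_path)
    return str(p.with_name(p.stem + "-infer" + p.suffix))


def preprocess_then_quantize(model_path: str, output_path: str, calibration_reader) -> str:
    """Pre-process the float model, then statically quantize the result.

    calibration_reader is assumed to be an existing CalibrationDataReader.
    Quantization options are left at their defaults here; the real script
    may pass a different format, types, or per-channel setting.
    """
    # Imported lazily so the path helper above works without onnxruntime.
    from onnxruntime.quantization import quantize_static
    from onnxruntime.quantization.shape_inference import quant_pre_process

    infer_path = preprocessed_path(model_path)
    # Shape inference and graph optimization on the float model.
    quant_pre_process(model_path, infer_path)
    # Quantize the pre-processed model instead of the original one.
    quantize_static(infer_path, output_path, calibration_reader)
    return infer_path
```

The same pre-processing can also be run from the command line with `python -m onnxruntime.quantization.preprocess --input <model> --output <preprocessed model>`.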
Left: Computation graph of already quantized models in the repo or models quantized by the updated script.
Right: Computation graph of the model quantized by the original script.
The comparison shows that the current script produces a model with an unoptimized computation graph and redundant computation nodes.
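The graph difference can also be checked without a viewer by counting nodes per operator type in each model. The following is a minimal sketch under that assumption; the function names are made up for this example, and it reads the models with the onnx package.

```python
from collections import Counter


def load_op_types(model_path: str) -> list:
    """List the op_type of every node in the model's main graph."""
    import onnx  # imported lazily; only needed when reading real models

    model = onnx.load(model_path)
    return [node.op_type for node in model.graph.node]


def histogram_diff(left: Counter, right: Counter) -> dict:
    """Per-operator count difference (right minus left), omitting ties."""
    keys = sorted(set(left) | set(right))
    return {k: right[k] - left[k] for k in keys if right[k] != left[k]}


def compare_models(left_path: str, right_path: str) -> dict:
    """Compare operator counts of two ONNX files."""
    left = Counter(load_op_types(left_path))
    right = Counter(load_op_types(right_path))
    return histogram_diff(left, right)
```

A positive count for an operator means the second model contains more nodes of that type than the first, which is one way to spot the redundant nodes left by quantizing without pre-processing.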