quantize-ort.py doesn't reproduce the quantized models in the repos #239

@Tim-Siu

Description

Environment: onnxruntime version 1.17.1, model PP-ResNet50.

Running the quantization script quantize-ort.py does not reproduce the quantized models in the repo. The current script produces an int8-quantized PP-ResNet50 of over 120 MB, significantly larger than the existing quantized models in the repo (~26 MB). After some investigation, I think the cause is a missing pre-processing step: the ONNX Runtime documentation strongly encourages pre-processing models before quantization.

Left: computation graph of the already-quantized models in the repo (and of models quantized by the updated script).
Right: computation graph of the model quantized by the original script.

[Screenshot 2024-02-26 at 23:15:36]

We can see that the current script results in a model with an unoptimized computation graph and redundant computation nodes.
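A minimal sketch of the likely missing step, using ONNX Runtime's documented pre-processing tool followed by quantization. The file names are placeholders, not the repo's actual paths, and the repo's script may use static rather than dynamic quantization:

```shell
# Pre-process before quantization: shape inference + graph optimization.
# This fuses nodes (e.g. Conv+BatchNorm), removing the redundant
# computation nodes visible in the right-hand graph above.
python -m onnxruntime.quantization.preprocess \
    --input ppresnet50.onnx \
    --output ppresnet50-preproc.onnx

# Then quantize the pre-processed model instead of the original one.
# Dynamic quantization shown here for brevity; quantize-ort.py may
# use quantize_static with a calibration data reader instead.
python - <<'EOF'
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    "ppresnet50-preproc.onnx",   # placeholder input path
    "ppresnet50-int8.onnx",      # placeholder output path
    weight_type=QuantType.QInt8,
)
EOF
```

Quantizing the pre-processed graph should yield a model close to the ~26 MB size of the existing quantized models, since the optimizer folds constants and fuses nodes before the weights are converted to int8.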

Labels

quantization (Anything related to model quantization)