Introduce Preprocessing for Optimized Quantization in quantize-ort.py
#238
Issues
Resolves #239
Running the quantization script quantize-ort.py does not reproduce the quantized model in the repo. The current script produces an int8-quantized resnet50 of over 120 MB, which differs significantly from the existing quantized models in the repo (~26 MB). After some investigation, I believe the cause is a missing preprocessing step: the ONNX Runtime documentation strongly encourages preprocessing a model before quantization.

Left: Computation graph of the already-quantized models in the repo, or of models quantized by the updated script.

Right: Computation graph of the model quantized by the original script.
We can see that the current script produces a model with an unoptimized computation graph and redundant computation nodes.
Key Changes
Added a preprocessing step to quantize-ort.py. Optimization is carried out automatically by the quant_pre_process method.
Expected Benefits