Registry interface for custom quantization functional backend (#683)
## What does this PR do?
**Type of change:** New feature
**Overview:**
Add a registry interface for custom quantization functional backends.
## Usage
<!-- You can potentially add a usage example below. -->
See `tests/unit/torch/quantization/test_custom_backend.py` for a usage example.
## Testing
<!-- Mention how have you tested your change if applicable. -->
Covered by the new unit test `tests/unit/torch/quantization/test_custom_backend.py`.
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open a
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: Yes
- **Did you add or update any necessary documentation?**: Yes
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes
## Additional Information
<!-- E.g. related issue. -->
Signed-off-by: realAsma <[email protected]>
CHANGELOG.rst: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ NVIDIA Model Optimizer Changelog (Linux)
 - Add support for KV Cache Quantization for vLLM FakeQuant PTQ script. See `examples/vllm_serve/README.md <https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/vllm_serve#Calibrate-and-serve-fake-quant-model-in-vLLM>`__ for more details.
 - Add support for subgraphs in ONNX autocast.
 - Add support for parallel draft heads in Eagle speculative decoding.
+- Add support to enable custom emulated quantization backend. See :meth:`register_quant_backend <modelopt.torch.quantization.nn.modules.tensor_quantizer.register_quant_backend>` for more details. See an example in ``tests/unit/torch/quantization/test_custom_backend.py``.