Inference tutorial - Part 3 of e2e series #2343
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2343
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Cancelled Jobs
As of commit ccc2932 with merge base 2898903:
NEW FAILURE - The following job has failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @jainapurva, by the way I'm adding a [image not rendered]
Force-pushed from b93b892 to ce675b8
docs/source/inference.rst
Outdated
.. note::
   For more information on supported quantization and sparsity configurations, see `HF-Torchao Docs <https://huggingface.co/docs/transformers/main/en/quantization/torchao>`_.

Inference with vLLM
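For readers following this thread, a minimal sketch of the HF-Torchao integration that the note points to. This is not the tutorial's exact code: the model id, the `Int4WeightOnlyConfig(group_size=128)` choice, and the recent-transformers `TorchAoConfig(quant_type=...)` usage are assumptions here.

```python
# Sketch: quantizing a Hugging Face model with a TorchAO config via transformers.
# Assumes recent transformers + torchao; the config used in the tutorial may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig
from torchao.quantization import Int4WeightOnlyConfig

model_id = "Qwen/Qwen3-8B"  # placeholder model id, not necessarily the tutorial's
quant_config = TorchAoConfig(quant_type=Int4WeightOnlyConfig(group_size=128))

# Quantization happens while loading the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```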
For this section, can you replace it with https://huggingface.co/pytorch/Qwen3-8B-int4wo-hqq#inference-with-vllm?
It might be easier to use the command line compared to code.
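For reference, a rough sketch of what that section looks like if it stays in Python, using vLLM's Python API with the quantized checkpoint from the linked model card. The sampling parameters and prompt below are assumptions; the model card also documents a `vllm serve` command-line flow, which is what the comment above suggests switching to.

```python
# Sketch: running the quantized checkpoint with vLLM's Python API.
# The linked model card also shows a `vllm serve` CLI flow; this is the code equivalent.
from vllm import LLM, SamplingParams

llm = LLM(model="pytorch/Qwen3-8B-int4wo-hqq")
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=128)

prompts = ["Give me a short introduction to large language models."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```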
Looks great. Overall I feel we should add some more text between code blocks so it reads more like a tutorial, and remove some duplicate code, which is distracting to readers.
docs/source/serving.rst
Outdated
Step 1: Untie Embedding Weights
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We want to quantize the embedding and lm_head differently. Since those layers are tied, we first need to untie the model:
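For context, a minimal sketch of what the untying step might look like, following the pattern from the Phi-4 model card referenced later in this thread. The model id and the exact attribute names (`config.tie_word_embeddings`, `lm_head`) are assumptions and should be adapted to the model used in the tutorial.

```python
# Sketch: untying tied input/output embeddings so they can be quantized differently.
# Model id and attribute names are assumptions; adapt to the tutorial's model.
import torch
from transformers import AutoModelForCausalLM

model_id = "microsoft/Phi-4-mini-instruct"  # hypothetical example model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Mark the config so save/load no longer re-ties the weights.
model.config.tie_word_embeddings = False

# Give lm_head its own copy of the weight instead of sharing embed_tokens' tensor.
model.lm_head.weight = torch.nn.Parameter(model.lm_head.weight.detach().clone())
```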
Is this step actually necessary? I don't think I had to do any of this for Llama models, for example. Can you share the source for this?
I'm using the same steps as here: https://huggingface.co/pytorch/Phi-4-mini-instruct-8da4w. In case of any updates, we should update both the model card and the tutorial with the same instructions.
Last tutorial of the 3-part series on using TorchAO in the model lifecycle.