
Conversation


@h-guo18 h-guo18 commented Jul 19, 2025

GH Issue #4403 [refactor] Move pattern matching transforms to new InferenceOptimizer

Description

  • Moved the following transformations into the new configurable inference optimizer:
    • quantize
    • moe
    • KVCache
    • ROPE
  • Updated the unit tests of the corresponding transforms to use the new inference optimizer.

Test Coverage

Unit tests. See changed files.

@h-guo18 h-guo18 self-assigned this Jul 19, 2025
@h-guo18 h-guo18 requested a review from lucaslie July 19, 2025 00:02
@h-guo18 h-guo18 changed the title Haoguo/move transforms [GH Issue #4403](https://github.com/NVIDIA/TensorRT-LLM/issues/4403) [refactor] Move KVCache, Quantization to new InferenceOptimizer Jul 19, 2025
@h-guo18 h-guo18 changed the title [GH Issue #4403](https://github.com/NVIDIA/TensorRT-LLM/issues/4403) [refactor] Move KVCache, Quantization to new InferenceOptimizer [GH Issue #4403][refactor] Move KVCache, Quantization to new InferenceOptimizer Jul 19, 2025
@h-guo18 h-guo18 changed the title [GH Issue #4403][refactor] Move KVCache, Quantization to new InferenceOptimizer [Issue #4403][refactor] Move KVCache, Quantization to new InferenceOptimizer Jul 19, 2025
Signed-off-by: haoguo <[email protected]>
@nv-auto-deploy nv-auto-deploy deleted a comment from github-actions bot Jul 19, 2025
@nv-auto-deploy nv-auto-deploy deleted a comment from github-actions bot Jul 19, 2025
@h-guo18 h-guo18 marked this pull request as ready for review July 19, 2025 00:22
@h-guo18 h-guo18 marked this pull request as draft July 20, 2025 21:29
@h-guo18 h-guo18 changed the title [Issue #4403][refactor] Move KVCache, Quantization to new InferenceOptimizer [Issue #4403][refactor] Move pattern matching transforms to new InferenceOptimizer Jul 21, 2025

@lucaslie lucaslie left a comment

For any transform that we move we should:

  1. remove the corresponding transform from the old InferenceOptimizer
  2. configure the default settings in auto_deploy/config/default.yaml

Once we have finalized the reviews, would you be comfortable submitting one PR per transform (or at least one PR per couple of transforms)? This will help with tracking potential regressions in case we face any.


# TODO:(hg) confirm this
info = TransformInfo(
    skipped=False, num_matches=num_moe_patterns, is_clean=False, has_valid_shapes=True
)

Suggested change:
-    skipped=False, num_matches=num_moe_patterns, is_clean=False, has_valid_shapes=True
+    skipped=False, num_matches=num_moe_patterns, is_clean=False, has_valid_shapes=False

This is safer unless we know that the transform correctly assigns and updates shapes.

@Fridah-nv can also comment on that


I agree on this. I think all the transformations should be able to preserve valid shapes except for those using the torch._inductor pattern matcher.
Should we require the other transformations to preserve valid shapes so that we avoid running shape propagation multiple times? cc: @lucaslie
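For illustration, here is a minimal sketch (in the style of the snippets above) of how a pattern-matcher-based transform could report its result; apply_inductor_patterns is a hypothetical stand-in for whatever helper wraps the torch._inductor pattern matcher, not the actual auto_deploy API:

# Hypothetical sketch: a transform that rewrites the graph via the
# torch._inductor pattern matcher cannot guarantee that node shape metadata
# is still correct afterwards, so it reports has_valid_shapes=False and lets
# the optimizer re-run shape propagation.
num_matches = apply_inductor_patterns(gm)  # assumed helper wrapping the pattern matcher

info = TransformInfo(
    skipped=(num_matches == 0),  # nothing matched means nothing was changed
    num_matches=num_matches,
    is_clean=False,              # graph was mutated and still needs cleanup
    has_valid_shapes=False,      # pattern matcher does not preserve shape metadata
)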


# TODO:(hg) confirm this
info = TransformInfo(
    skipped=False, num_matches=fused_key_counter, is_clean=False, has_valid_shapes=True
)

Suggested change:
-    skipped=False, num_matches=fused_key_counter, is_clean=False, has_valid_shapes=True
+    skipped=False, num_matches=fused_key_counter, is_clean=False, has_valid_shapes=False

@Fridah-nv can also comment on this


# TODO:(hg) confirm this
info = TransformInfo(
    skipped=False, num_matches=num_matches, is_clean=False, has_valid_shapes=True
)

@Fridah-nv please help confirm; @h-guo18 as well.


has_valid_shapes should be set to False, since the pattern matcher utility won't preserve shape information correctly.
