
Conversation


@h-guo18 h-guo18 commented Jul 19, 2025

GH Issue #4403 [refactor] Move pattern matching transforms to new InferenceOptimizer

Description

  • Moved the following transformations into the new configurable inference optimizer:
    • quantize
    • moe
    • KVCache
    • ROPE
  • Updated the unit tests of the corresponding transforms to use the new inference optimizer.

Test Coverage

Unit tests. See changed files.

@h-guo18 h-guo18 self-assigned this Jul 19, 2025
@h-guo18 h-guo18 requested a review from lucaslie July 19, 2025 00:02
@h-guo18 h-guo18 changed the title Haoguo/move transforms [GH Issue #4403](https://github.com/NVIDIA/TensorRT-LLM/issues/4403) [refactor] Move KVCache, Quantization to new InferenceOptimizer Jul 19, 2025
@h-guo18 h-guo18 changed the title [GH Issue #4403](https://github.com/NVIDIA/TensorRT-LLM/issues/4403) [refactor] Move KVCache, Quantization to new InferenceOptimizer [GH Issue #4403][refactor] Move KVCache, Quantization to new InferenceOptimizer Jul 19, 2025
@h-guo18 h-guo18 changed the title [GH Issue #4403][refactor] Move KVCache, Quantization to new InferenceOptimizer [Issue #4403][refactor] Move KVCache, Quantization to new InferenceOptimizer Jul 19, 2025
Signed-off-by: haoguo <[email protected]>
@nv-auto-deploy nv-auto-deploy deleted a comment from github-actions bot Jul 19, 2025
@nv-auto-deploy nv-auto-deploy deleted a comment from github-actions bot Jul 19, 2025
@h-guo18 h-guo18 marked this pull request as ready for review July 19, 2025 00:22
@h-guo18 h-guo18 marked this pull request as draft July 20, 2025 21:29
@h-guo18 h-guo18 changed the title [Issue #4403][refactor] Move KVCache, Quantization to new InferenceOptimizer [Issue #4403][refactor] Move pattern matching transforms to new InferenceOptimizer Jul 21, 2025

@lucaslie lucaslie left a comment

For any transform that we move we should:

  1. remove the corresponding transform from the old InferenceOptimizer
  2. configure the default settings in auto_deploy/config/default.yaml

Once we have finalized the reviews, would you be comfortable submitting one PR per transform (or at least one PR per couple of transforms)? This will help with tracking potential regressions in case we face any.


# TODO:(hg) confirm this
info = TransformInfo(
    skipped=False, num_matches=num_moe_patterns, is_clean=False, has_valid_shapes=True
)

Suggested change:
-    skipped=False, num_matches=num_moe_patterns, is_clean=False, has_valid_shapes=True
+    skipped=False, num_matches=num_moe_patterns, is_clean=False, has_valid_shapes=False

This is safer unless we know that the transform correctly assigns and updates shapes.

@Fridah-nv can also comment on that


I agree on this. I think all the transformations should be able to preserve valid shapes except for those using the torch._inductor pattern matcher.
Should we require the other transformations to preserve valid shapes so that we avoid running shape propagation multiple times? cc: @lucaslie
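For illustration, here is a minimal sketch (in the style of the snippets above) of how a pattern-matcher-based transform could report its result; apply_inductor_patterns is a hypothetical stand-in for whatever helper wraps the torch._inductor pattern matcher, not the actual auto_deploy API:

# Hypothetical sketch: a transform that rewrites the graph via the
# torch._inductor pattern matcher cannot guarantee that node shape metadata
# is still correct afterwards, so it reports has_valid_shapes=False and lets
# the optimizer re-run shape propagation.
num_matches = apply_inductor_patterns(gm)  # assumed helper wrapping the pattern matcher

info = TransformInfo(
    skipped=(num_matches == 0),  # nothing matched means nothing was changed
    num_matches=num_matches,
    is_clean=False,              # graph was mutated and still needs cleanup
    has_valid_shapes=False,      # pattern matcher does not preserve shape metadata
)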


# TODO:(hg) confirm this
info = TransformInfo(
    skipped=False, num_matches=fused_key_counter, is_clean=False, has_valid_shapes=True
)

Suggested change:
-    skipped=False, num_matches=fused_key_counter, is_clean=False, has_valid_shapes=True
+    skipped=False, num_matches=fused_key_counter, is_clean=False, has_valid_shapes=False

@Fridah-nv can also comment on this


# TODO:(hg) confirm this
info = TransformInfo(
    skipped=False, num_matches=num_matches, is_clean=False, has_valid_shapes=True
)

@Fridah-nv please help confirm; @h-guo18 as well.


has_valid_shapes should be set to False, since the pattern matcher utility won't preserve shape information correctly.
