Feat/support lora cuda graph #7335
base: main
Conversation
Signed-off-by: Shahar Mor <[email protected]>
Signed-off-by: Shahar Mor <[email protected]>
Signed-off-by: Shahar Mor <[email protected]>
Signed-off-by: Shahar Mor <[email protected]>
Walkthrough

Adds optional LoRA integration across the PyTorch executor: wires a LoraManager with a PEFT cache, supports prefetching LoRA adapters, propagates lora_params into CUDA graph capture/replay, updates resource/dummy request handling to carry LoRA fields, exposes get_lora_manager(), and adds a LoRA+CUDA graph unit test.
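For orientation, a minimal usage sketch of what the PR wires together, modeled on the unit test added in this PR. Paths are placeholders, and the import locations are taken from the Code graph analysis section of this review; the package may also re-export these symbols at the top level.

```python
# Sketch modeled on tests/unittest/llmapi/test_llm_pytorch.py::test_lora_dir_with_graph;
# model/adapter paths are placeholders, import locations are assumptions from this review.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.executor.request import LoRARequest
from tensorrt_llm.llmapi.llm_args import CudaGraphConfig
from tensorrt_llm.lora_manager import LoraConfig

lora_req = LoRARequest("task-0", 0, "/path/to/lora-adapter")

# Listing the adapter in LoraConfig.lora_request lets the PyTorch executor prefetch it
# before warmup, so lora_params can be baked into CUDA graph capture.
lora_config = LoraConfig(lora_dir=["/path/to/lora-adapter"],
                         max_lora_rank=8,
                         lora_request=[lora_req])

llm = LLM(model="/path/to/base-model",
          lora_config=lora_config,
          cuda_graph_config=CudaGraphConfig(max_batch_size=1))

outputs = llm.generate(["Where is the capital of the USA?"],
                       SamplingParams(max_tokens=20),
                       lora_request=[lora_req])
print(outputs[0].outputs[0].text)
```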
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant PyExecutor
    participant ModelEngine
    participant ResourceManager
    participant LoraManager
    participant PeftCacheMgr as PEFT Cache Manager (CPP)
    User->>PyExecutor: init(...)
    PyExecutor->>ModelEngine: construct(...)
    PyExecutor->>ModelEngine: set_lora_manager_cpp_peft_cache_manager(ResourceManager)
    ModelEngine->>ResourceManager: get(ResourceManagerType.PEFT_CACHE_MANAGER)
    ResourceManager-->>ModelEngine: PEFT cache mgr
    ModelEngine->>LoraManager: set_cpp_peft_cache_manager(PeftCacheMgr)
    PyExecutor->>ModelEngine: prefetch_lora_dirs()
    ModelEngine->>LoraManager: load adapters / prefetch
    LoraManager-->>ModelEngine: adapters ready
    ModelEngine-->>PyExecutor: has_lora_prefetched = True
```

```mermaid
sequenceDiagram
    autonumber
    participant Scheduler as Request Scheduler
    participant ModelEngine
    participant ResourceManager
    participant CudaGraph as DecodingCUDAGraphRunner
    participant LoraManager
    Scheduler->>ModelEngine: forward(batch, resource_manager)
    ModelEngine->>ModelEngine: _maybe_get_cuda_graph(..., resource_manager)
    alt LoRA prefetched
        ModelEngine->>LoraManager: build lora_config / params
        LoraManager-->>ModelEngine: lora_params
        ModelEngine->>CudaGraph: construct(..., lora_params)
        CudaGraph->>CudaGraph: capture(forward_fn, inputs + lora_params)
    else No LoRA
        ModelEngine->>CudaGraph: construct(..., lora_params=None)
        CudaGraph->>CudaGraph: capture(forward_fn, inputs)
    end
    ModelEngine->>CudaGraph: replay(...)
    CudaGraph-->>ModelEngine: outputs
    ModelEngine-->>Scheduler: outputs
```
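A condensed, hypothetical sketch of the capture/replay path in the second diagram. The class and parameter names mirror the PR (DecodingCUDAGraphRunner, lora_params), but the body is illustrative only, not the actual implementation.

```python
# Illustrative only: how lora_params might be threaded through capture and replay.
# (Stream warm-up and memory-pool handling are omitted for brevity.)
from typing import Callable, Dict, Optional
import torch


class GraphRunnerSketch:
    """Minimal stand-in for DecodingCUDAGraphRunner with optional lora_params."""

    def __init__(self, lora_params: Optional[Dict[str, torch.Tensor]] = None):
        self.lora_params = lora_params
        self._graph = torch.cuda.CUDAGraph()
        self._static_inputs: Dict[str, torch.Tensor] = {}
        self._output: Optional[torch.Tensor] = None

    def capture(self, forward_fn: Callable, inputs: Dict[str, torch.Tensor]) -> None:
        # The tensors passed here become the graph's static buffers.
        self._static_inputs = inputs
        if self.lora_params is not None:
            # LoRA tensors join the captured input set; their storage addresses
            # must stay fixed for every later replay.
            inputs = {**inputs, "lora_params": self.lora_params}
        with torch.cuda.graph(self._graph):
            self._output = forward_fn(**inputs)

    def replay(self, new_inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
        # Copy fresh data into the captured buffers in place, then replay.
        for name, value in new_inputs.items():
            self._static_inputs[name].copy_(value)
        self._graph.replay()
        return self._output
```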
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tensorrt_llm/lora_manager.py (1)
442-444: Constructor ignores the cpp_peft_cache_manager argument — set the field.

Currently the passed manager is dropped and a new None field is created later. Initialize it in __init__ and drop the redundant defaulting.

```diff
 class LoraManager(object):
@@
-    def __init__(
-        self, cpp_peft_cache_manager: tb_internal.batch_manager.PeftCacheManager | None = None
-    ):
+    def __init__(
+        self, cpp_peft_cache_manager: tb_internal.batch_manager.PeftCacheManager | None = None
+    ):
@@
-        self._lora_uid_counter = 0
+        self._lora_uid_counter = 0
@@
-        self.lora_target_modules: List[str] = []
-        self._cpp_peft_cache_manager: Optional[tb_internal.batch_manager.PeftCacheManager] = None
+        self.lora_target_modules: List[str] = []
+        self._cpp_peft_cache_manager: Optional[
+            tb_internal.batch_manager.PeftCacheManager
+        ] = cpp_peft_cache_manager
```

Also applies to: 487-493
tensorrt_llm/_torch/pyexecutor/model_engine.py (1)
1978-1979: DoRA detection logic reintroduced; it was intentionally removed.

Per the prior removal, set is_dora to False to avoid inverted detection.

```diff
-                is_dora = module.scaling_vec_pointer == 0
+                is_dora = False  # DoRA disabled in PyTorch flow
```
🧹 Nitpick comments (8)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
322-324: Add a return type and minimal docstring for the public API get_lora_manager().

Improves clarity and external usage.

```diff
-    def get_lora_manager(self):
-        return self.model_engine.lora_manager
+    def get_lora_manager(self) -> Optional["LoraManager"]:
+        """Return the LoRA manager associated with this executor (PyTorch backend only)."""
+        return self.model_engine.lora_manager
```

tensorrt_llm/executor/worker.py (1)

162-168: Guard against a missing LoRA manager in the PyTorch path.

If engine.get_lora_manager() unexpectedly returns None, later access will fail. Add an assert with a clear error.

```diff
-        self._lora_manager = self.engine.get_lora_manager()
+        self._lora_manager = self.engine.get_lora_manager()
+        assert self._lora_manager is not None, (
+            "LoRA config provided but no LoraManager available from engine."
+        )
```

tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py (3)
30-38: Document and type lora_params; ensure it is capture-safe.

Clarify the expected structure/device and narrow the typing for safer usage during capture/replay.

```diff
-        use_mrope: bool = False,
-        lora_params: Optional[dict] = None,
+        use_mrope: bool = False,
+        lora_params: Optional[Dict[str, torch.Tensor]] = None,
```

Add to the constructor docstring (not shown) that:
- lora_params tensors must be on the capture device,
- shapes and storage addresses must remain constant across replays (contents may mutate).
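As a self-contained illustration of the second bullet, here is a generic torch.cuda.CUDAGraph example (standard PyTorch behavior, not code from this PR) showing why captured buffers must keep their storage while their contents may be updated in place:

```python
# Minimal demonstration (requires a CUDA device): in-place updates are visible
# to replay, but rebinding a name to a new tensor is not.
import torch

static_x = torch.zeros(4, device="cuda")
weight = torch.full((4,), 2.0, device="cuda")

graph = torch.cuda.CUDAGraph()
# Warm up on a side stream, as recommended before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    y = static_x * weight
torch.cuda.current_stream().wait_stream(s)

with torch.cuda.graph(graph):
    y = static_x * weight  # captured: reads static_x's current storage

static_x.copy_(torch.ones(4, device="cuda"))   # OK: contents mutate in place
graph.replay()
print(y)  # tensor([2., 2., 2., 2.]) — replay saw the new contents

static_x = torch.full((4,), 3.0, device="cuda")  # rebinding allocates new storage
graph.replay()
print(y)  # still tensor([2., 2., 2., 2.]) — the graph keeps reading the old buffer
```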
72-72: Persist lora_params and mark it as an optional model input.

Include lora_params in optional_extra_model_inputs to mirror the mrope handling and avoid accidental shape checks elsewhere that rely on this list.

```diff
-        self.lora_params = lora_params
-        self._output = None
+        self.lora_params = lora_params
+        self._output = None
         self._graph = None
-        self.optional_extra_model_inputs = ["mrope_position_deltas"]
+        self.optional_extra_model_inputs = ["mrope_position_deltas", "lora_params"]
```
95-97: Inject lora_params during capture — OK; add minimal validation.

Pre-capture, assert the tensors live on the target device to catch misconfigurations early.

```diff
         if self.lora_params is not None:
+            # lightweight validation
+            for k, v in self.lora_params.items():
+                assert isinstance(v, torch.Tensor) and v.device.type == "cuda", \
+                    f"lora_params['{k}'] must be a CUDA tensor"
             inputs["lora_params"] = self.lora_params
```

tensorrt_llm/_torch/pyexecutor/model_engine.py (3)
535-537: Comment cleanup and consistency.

Remove the temporary "SMOR" comments; they will leak into production.

```diff
-                lora_request=
-                lora_config,  # TODO smor- tests assume BS1 then this will be ignored for now, need to resolve
+                lora_request=lora_binding,
 ...
-                lora_request=lora_config,
+                lora_request=lora_binding,
```

Also applies to: 550-551
1001-1002: Replace print with logger.

```diff
-            print(f"SMOR, not failed on lora_params in maybe_get_cuda_graph")
+            logger.debug("LoRA params prepared for CUDA graph.")
```
1-1: Missing NVIDIA copyright header.

Add the standard NVIDIA header (current year) per the guidelines.
Please ensure the repo’s standard header is applied uniformly.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py (3 hunks)
- tensorrt_llm/_torch/pyexecutor/model_engine.py (9 hunks)
- tensorrt_llm/_torch/pyexecutor/py_executor.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/resource_manager.py (2 hunks)
- tensorrt_llm/executor/worker.py (1 hunk)
- tensorrt_llm/lora_manager.py (3 hunks)
- tests/unittest/llmapi/test_llm_pytorch.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs
Preserve module namespaces in imports: import the subpackage/module, not the symbol (from package.subpackage import foo; foo.SomeClass())
Naming: files snake_case; classes PascalCase; functions/methods snake_case; local variables snake_case (k_ prefix if starting with a number); globals G_ + UPPER_SNAKE_CASE; constants UPPER_SNAKE_CASE
Avoid shadowing outer-scope variables; initialize all externally visible members in init
Prefer docstrings for interfaces used outside a file; reserve comments for function-internal or file-local interfaces
Use Google-style docstrings for classes and functions; inline docstrings for attributes/variables are allowed
Avoid reflection when straightforward code suffices (e.g., prefer explicit parameters over dict(**locals()))
Use narrow except clauses (e.g., catch FileNotFoundError instead of bare except)
For duck-typing try/except, keep try body minimal and use else for the main logic
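A generic illustration of the last two guidelines (a sketch, not code from this repository):

```python
from typing import List


# Narrow except clause; keep the try body minimal and put the main logic in `else`.
def read_adapter_ids(path: str) -> List[int]:
    try:
        handle = open(path, encoding="utf-8")
    except FileNotFoundError:  # narrow, not a bare `except`
        return []
    else:
        with handle:
            return [int(line) for line in handle if line.strip()]
```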
Files:
- tensorrt_llm/executor/worker.py
- tensorrt_llm/_torch/pyexecutor/py_executor.py
- tensorrt_llm/lora_manager.py
- tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py
- tests/unittest/llmapi/test_llm_pytorch.py
- tensorrt_llm/_torch/pyexecutor/resource_manager.py
- tensorrt_llm/_torch/pyexecutor/model_engine.py
**/*.{cpp,cc,cxx,cu,h,hpp,hh,hxx,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header with current year to all source files
Files:
- tensorrt_llm/executor/worker.py
- tensorrt_llm/_torch/pyexecutor/py_executor.py
- tensorrt_llm/lora_manager.py
- tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py
- tests/unittest/llmapi/test_llm_pytorch.py
- tensorrt_llm/_torch/pyexecutor/resource_manager.py
- tensorrt_llm/_torch/pyexecutor/model_engine.py
🧠 Learnings (6)
📚 Learning: 2025-08-26T06:07:02.166Z
Learnt from: shaharmor98
PR: NVIDIA/TensorRT-LLM#7231
File: tensorrt_llm/_torch/pyexecutor/_util.py:504-509
Timestamp: 2025-08-26T06:07:02.166Z
Learning: In tensorrt_llm/_torch/pyexecutor/_util.py, when calling model_engine.set_lora_model_config(), pass model_binding_config.mlp_hidden_size directly without multiplying by mapping.tp_size, as the mlp_hidden_size from get_bindings_model_config() is already the per-TP rank value needed for LoRA weight packaging.
Applied to files:
- tensorrt_llm/executor/worker.py
- tensorrt_llm/_torch/pyexecutor/py_executor.py
- tensorrt_llm/_torch/pyexecutor/model_engine.py
📚 Learning: 2025-07-17T09:01:27.402Z
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Applied to files:
- tensorrt_llm/executor/worker.py
- tensorrt_llm/_torch/pyexecutor/resource_manager.py
- tensorrt_llm/_torch/pyexecutor/model_engine.py
📚 Learning: 2025-08-19T12:45:11.997Z
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#7033
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:0-0
Timestamp: 2025-08-19T12:45:11.997Z
Learning: In tensorrt_llm/_torch/pyexecutor/model_engine.py, DoRA (Delta Orthogonal Rank Adaptation) functionality was removed from the PyTorch flow to eliminate issues with inverted DoRA detection logic. The original is_dora condition was checking if scaling_vec_pointer == 0, which was potentially incorrect.
Applied to files:
- tensorrt_llm/executor/worker.py
- tensorrt_llm/_torch/pyexecutor/model_engine.py
📚 Learning: 2025-08-26T09:37:10.463Z
Learnt from: jiaganc
PR: NVIDIA/TensorRT-LLM#7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM's bench configuration, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which is a Dict[str, Any] that can contain default values including `cuda_graph_config`, making the fallback `llm_args["cuda_graph_config"]` safe to use.
Applied to files:
tests/unittest/llmapi/test_llm_pytorch.py
📚 Learning: 2025-08-26T09:37:10.463Z
Learnt from: jiaganc
PR: NVIDIA/TensorRT-LLM#7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which can contain default `cuda_graph_config` values, so `llm_args` may already have this config before the extra options processing.
Applied to files:
tests/unittest/llmapi/test_llm_pytorch.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/unittest/llmapi/test_llm_pytorch.py
🧬 Code graph analysis (5)
tensorrt_llm/executor/worker.py (1)
  tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
    - get_lora_manager (322-323)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
  tensorrt_llm/_torch/pyexecutor/model_engine.py (2)
    - set_lora_manager_cpp_peft_cache_manager (439-445)
    - prefetch_lora_dirs (447-458)
tensorrt_llm/lora_manager.py (3)
  tensorrt_llm/_torch/models/modeling_phi4mm.py (1)
    - lora_request (265-286)
  tensorrt_llm/_torch/pyexecutor/resource_manager.py (1)
    - PeftCacheManager (1158-1239)
  cpp/tensorrt_llm/batch_manager/peftCacheManager.cpp (1)
    - PeftCacheManager (231-255)
tests/unittest/llmapi/test_llm_pytorch.py (4)
  tensorrt_llm/llmapi/llm_args.py (1)
    - CudaGraphConfig (63-88)
  tensorrt_llm/executor/request.py (1)
    - LoRARequest (23-43)
  tensorrt_llm/lora_manager.py (1)
    - LoraConfig (141-158)
  tests/unittest/utils/util.py (1)
    - similar (369-371)
tensorrt_llm/_torch/pyexecutor/resource_manager.py (3)
  tensorrt_llm/_torch/models/modeling_phi4mm.py (2)
    - lora_request (265-286)
    - lora_config (242-262)
  tensorrt_llm/lora_manager.py (1)
    - lora_weights (917-918)
  tensorrt_llm/_torch/pyexecutor/llm_request.py (1)
    - LlmRequest (264-351)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/model_engine.py
909-909: Undefined name DecodingBaseConfig
(F821)
🔇 Additional comments (7)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
282-285: LoRA PEFT cache wiring and prefetch happen at the right time (pre-warmup). Good placement and null-safe on the engine side. No issues.
tests/unittest/llmapi/test_llm_pytorch.py (1)
4-4: Import of CudaGraphConfig looks correct. No action needed.
tensorrt_llm/executor/worker.py (1)
162-168: Decouple via engine.get_lora_manager() — good cleanup. This removes the CPP resource-manager coupling and aligns with the new PyExecutor API.
tensorrt_llm/lora_manager.py (1)
494-505: CPU cache check is fine; the race caveat remains. The known race from prior discussions still applies when relying on CPU cache presence to omit weights. Consider adding a brief comment referencing the limitation where applicable.
tensorrt_llm/_torch/pyexecutor/resource_manager.py (1)
411-414: No action needed: LlmRequest supports LoRA kwargs. The Python wrapper's __init__ takes **kwargs and forwards lora_task_id, lora_weights, and lora_config to the underlying C++ binding, which declares optional parameters for each — so the call will succeed.

tensorrt_llm/_torch/pyexecutor/model_engine.py (2)
447-459: Ensure lora_model_config is initialized before loading adapters.

In tensorrt_llm/_torch/pyexecutor/model_engine.py, guard prefetch_lora_dirs against a missing config and fall back to set_lora_model_config:

```diff
     def prefetch_lora_dirs(self):
         if self.lora_prefetch_requests_list is None:
             return
+        if self.lora_model_config is None:
+            # Derive default model config for LoRA
+            self.set_lora_model_config(
+                lora_target_modules=self.model.model_config.lora_target_modules,
+                trtllm_modules_to_hf_modules=self.model.model_config.trtllm_modules_to_hf_modules,
+            )
         for request in self.lora_prefetch_requests_list:
             self.lora_manager.load_from_ckpt(
                 [request.path],
                 model_config=self.lora_model_config,
                 runtime_mapping=None,
                 uids=[request.adapter_id])
         self.has_lora_prefetched = True
```

Also verify that model.model_config.lora_target_modules and trtllm_modules_to_hf_modules are always present to avoid attribute errors.
1003-1006: DecodingCUDAGraphRunner supports lora_params.

The constructor signature already includes an optional lora_params parameter, so no version gate or changes are required.
```diff
 from tensorrt_llm.inputs.multimodal import MultimodalParams
 from tensorrt_llm.logger import logger
-from tensorrt_llm.lora_manager import LoraConfig, LoraModelConfig
+from tensorrt_llm.lora_manager import LoraConfig, LoraManager, LoraModelConfig
```
🛠️ Refactor suggestion
Follow import style guideline; avoid symbol imports and name collisions.
Import the module and use qualified names to avoid colliding with bindings’ LoraConfig.
```diff
-from tensorrt_llm.lora_manager import LoraConfig, LoraManager, LoraModelConfig
+import tensorrt_llm.lora_manager as lora_mgr
```

And update usages, e.g.:

```diff
-        self.lora_manager: Optional[LoraManager] = None
+        self.lora_manager: Optional[lora_mgr.LoraManager] = None
```

📝 Committable suggestion
```python
import tensorrt_llm.lora_manager as lora_mgr
```
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/pyexecutor/model_engine.py around line 26, replace the
symbol import "from tensorrt_llm.lora_manager import LoraConfig, LoraManager,
LoraModelConfig" with a module import (e.g. "import tensorrt_llm.lora_manager as
lora_manager") and update all references in this file to use qualified names
(e.g. lora_manager.LoraConfig, lora_manager.LoraManager,
lora_manager.LoraModelConfig) to avoid symbol collisions and follow the import
style guideline.
```python
        self.lora_manager: Optional[LoraManager] = None
        if lora_config is not None:
            self.lora_manager = LoraManager()

        self.lora_prefetch_requests_list = None  # TODO smor - fix "LoRARequest" import
        if lora_config is not None and lora_config.lora_request is not None:
            self.lora_prefetch_requests_list = lora_config.lora_request
            self.has_lora_prefetched = False
```
Initialize has_lora_prefetched unconditionally; avoid AttributeError.
Currently it is only set when lora_request is not None, but it is later read unguarded.
```diff
-        self.lora_manager: Optional[LoraManager] = None
+        self.lora_manager: Optional[lora_mgr.LoraManager] = None
         if lora_config is not None:
-            self.lora_manager = LoraManager()
+            self.lora_manager = lora_mgr.LoraManager()

-        self.lora_prefetch_requests_list = None  # TODO smor - fix "LoRARequest" import
-        if lora_config is not None and lora_config.lora_request is not None:
-            self.lora_prefetch_requests_list = lora_config.lora_request
-            self.has_lora_prefetched = False
+        self.lora_prefetch_requests_list = None  # LoRA prefetch requests (bindings executor side)
+        self.has_lora_prefetched = False
+        if lora_config is not None and getattr(lora_config, "lora_request", None):
+            self.lora_prefetch_requests_list = lora_config.lora_request
```

📝 Committable suggestion
```python
        self.lora_manager: Optional[lora_mgr.LoraManager] = None
        if lora_config is not None:
            self.lora_manager = lora_mgr.LoraManager()

        self.lora_prefetch_requests_list = None  # LoRA prefetch requests (bindings executor side)
        self.has_lora_prefetched = False
        if lora_config is not None and getattr(lora_config, "lora_request", None):
            self.lora_prefetch_requests_list = lora_config.lora_request
```
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/pyexecutor/model_engine.py around lines 285-293,
has_lora_prefetched is only set when lora_config.lora_request is not None which
leads to AttributeError when accessed later; initialize self.has_lora_prefetched
= False unconditionally (e.g., immediately after setting self.lora_manager) and
keep setting self.lora_prefetch_requests_list = lora_config.lora_request only
when present so the attribute always exists and is safely readable thereafter.
```python
    def set_lora_manager_cpp_peft_cache_manager(
            self, resource_manager: ResourceManager):
        cpp_peft_cache_manager = resource_manager.get_resource_manager(
            ResourceManagerType.PEFT_CACHE_MANAGER)
        if cpp_peft_cache_manager is not None and self.lora_manager is not None:
            self.lora_manager.set_cpp_peft_cache_manager(
                cpp_peft_cache_manager.impl)
```
🛠️ Refactor suggestion
Don’t reach into .impl; use the manager’s public API consistently.
This helper should set the cpp manager via a public setter, but consumers must then call the Python PeftCacheManager methods, not impl.
No change here; see refactor below in _maybe_get_cuda_graph to stop using impl.
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/pyexecutor/model_engine.py around lines 439 to 446, the
helper currently reaches into cpp_peft_cache_manager.impl; update it to pass the
ResourceManager-returned object through the public API instead of accessing
.impl. Replace the .impl usage by calling
self.lora_manager.set_cpp_peft_cache_manager(cpp_peft_cache_manager) (keeping
the existing None checks) and add a short comment that consumers must interact
with the Python PeftCacheManager methods rather than its internal impl to avoid
direct implementation coupling.
```python
            lora_config = None
            if self.has_lora_prefetched:
                # TODO smor currently I assume a single adapter with uid 0, change this
                uid = 0
                from tensorrt_llm.bindings import executor as tllm
                lora_config = tllm.LoraConfig(
                    task_id=uid,
                    weights=self.lora_manager.cpp_lora_weights[uid],
                    config=self.lora_manager.cpp_lora_config[uid])
```
🛠️ Refactor suggestion
Avoid shadowing LoraConfig; remove hard-coded uid=0.
Use a distinct name for the bindings object and derive uid from prefetch requests.
```diff
-            lora_config = tllm.LoraConfig(
-                task_id=uid,
-                weights=self.lora_manager.cpp_lora_weights[uid],
-                config=self.lora_manager.cpp_lora_config[uid])
+            lora_binding = tllm.LoraConfig(
+                task_id=uid,
+                weights=self.lora_manager.cpp_lora_weights[uid],
+                config=self.lora_manager.cpp_lora_config[uid])
```

And when passing to add_dummy_requests:

```diff
-                lora_request=
-                lora_config,  # TODO smor- tests assume BS1 then this will be ignored for now, need to resolve
+                lora_request=lora_binding,
```

Also, compute uid:

```diff
-            uid = 0
+            # Prefer the first prefetched adapter id
+            uid = getattr(self.lora_prefetch_requests_list[0], "adapter_id", 0)
```

📝 Committable suggestion
```python
            lora_config = None
            if self.has_lora_prefetched:
                # Prefer the first prefetched adapter id
                uid = getattr(self.lora_prefetch_requests_list[0], "adapter_id", 0)
                from tensorrt_llm.bindings import executor as tllm
                lora_binding = tllm.LoraConfig(
                    task_id=uid,
                    weights=self.lora_manager.cpp_lora_weights[uid],
                    config=self.lora_manager.cpp_lora_config[uid])

            # later, when enqueuing the dummy request:
            self.add_dummy_requests(
                # ... other parameters ...
                lora_request=lora_binding,
                # ... remaining parameters ...
            )
```
```diff
     def _maybe_get_cuda_graph(
             self,
             batch: ScheduledRequests,
-            spec_config: Optional["DecodingBaseConfig"] = None
+            spec_config: Optional["DecodingBaseConfig"] = None,
+            resource_manager: Optional[ResourceManager] = None
     ) -> Optional[DecodingCUDAGraphRunner]:
```
Fix Ruff F821: DecodingBaseConfig undefined (even in quotes).
Gate a type-only import to satisfy static analysis without runtime dep.
```diff
 from typing import Any, Dict, Optional, Tuple
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from ..speculative.decoding_config import DecodingBaseConfig
```

📝 Committable suggestion
```python
from typing import Any, Dict, Optional, Tuple
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from ..speculative.decoding_config import DecodingBaseConfig
```
🧰 Tools
🪛 Ruff (0.12.2)
909-909: Undefined name DecodingBaseConfig
(F821)
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/pyexecutor/model_engine.py around lines 906 to 911, Ruff
F821 is raised because DecodingBaseConfig is referenced in type annotations but
not imported (even as a string); add a type-only import to satisfy static
analysis without introducing a runtime dependency: import TYPE_CHECKING from
typing at top of the file and then, inside an if TYPE_CHECKING: block, import
DecodingBaseConfig from the module where it is defined (replace with the correct
module path), leaving the runtime code unchanged so the annotation remains only
for type checking.
```python
        lora_params = None

        if self.has_lora_prefetched:
            peft_cache_manager = resource_manager.get_resource_manager(
                ResourceManagerType.PEFT_CACHE_MANAGER)

            context_requests = batch.context_requests
            generation_requests = batch.generation_requests

            if len(context_requests) > 0 and len(generation_requests) > 0:
                raise ValueError(
                    "SMOR, non empty context and generation requests isn't tested yet"
                )

            if len(context_requests) > 0:
                raise ValueError("SMOR, context requests isn't tested yet")

            if len(generation_requests) > 1:
                raise ValueError("SMOR, generation requests isn't tested yet")

            generation_request = generation_requests[0]
            # TODO smor I have no idea why this is happening
            generation_request.lora_weights = generation_request.lora_weights.reshape(
                [1] + list(generation_request.lora_weights.shape))
            generation_request.lora_config = generation_request.lora_config.reshape(
                [1] + list(generation_request.lora_config.shape))
            peft_cache_manager.impl.add_request_peft(generation_request, True)

            py_lora_task_layer_module_configs = peft_cache_manager.impl.ensure_batch(
                context_requests, generation_requests, False)
            for req in context_requests:
                req.py_lora_task_layer_module_configs = py_lora_task_layer_module_configs[
                    req.
                    py_request_id] if req.py_request_id in py_lora_task_layer_module_configs else None
            for req in generation_requests:
                req.py_lora_task_layer_module_configs = py_lora_task_layer_module_configs[
                    req.
                    py_request_id] if req.py_request_id in py_lora_task_layer_module_configs else None
```
🛠️ Refactor suggestion
LoRA PEFT setup inside CUDA-graph path is brittle: direct .impl access, ad-hoc reshapes, and hard errors.
- Don’t reshape here; PeftCacheManager.prepare_resources already does.
- Stop calling impl methods directly; use PeftCacheManager.add_request_peft/ensure_batch.
- Replace raises with graceful fallback when encountering unsupported mixed batches.
- Replace print with logger.
```diff
-        lora_params = None
-
-        if self.has_lora_prefetched:
-            peft_cache_manager = resource_manager.get_resource_manager(
-                ResourceManagerType.PEFT_CACHE_MANAGER)
-
-            context_requests = batch.context_requests
-            generation_requests = batch.generation_requests
-
-            if len(context_requests) > 0 and len(generation_requests) > 0:
-                raise ValueError(
-                    "SMOR, non empty context and generation requests isn't tested yet"
-                )
-
-            if len(context_requests) > 0:
-                raise ValueError("SMOR, context requests isn't tested yet")
-
-            if len(generation_requests) > 1:
-                raise ValueError("SMOR, generation requests isn't tested yet")
-
-            generation_request = generation_requests[0]
-            # TODO smor I have no idea why this is happening
-            generation_request.lora_weights = generation_request.lora_weights.reshape(
-                [1] + list(generation_request.lora_weights.shape))
-            generation_request.lora_config = generation_request.lora_config.reshape(
-                [1] + list(generation_request.lora_config.shape))
-            peft_cache_manager.impl.add_request_peft(generation_request, True)
-
-            py_lora_task_layer_module_configs = peft_cache_manager.impl.ensure_batch(
-                context_requests, generation_requests, False)
-            for req in context_requests:
-                req.py_lora_task_layer_module_configs = py_lora_task_layer_module_configs[
-                    req.
-                    py_request_id] if req.py_request_id in py_lora_task_layer_module_configs else None
-            for req in generation_requests:
-                req.py_lora_task_layer_module_configs = py_lora_task_layer_module_configs[
-                    req.
-                    py_request_id] if req.py_request_id in py_lora_task_layer_module_configs else None
-
-            # TODO smor - look at get lora params from requests
-            # You need something that isn't scheduled requests
-            # It also appears that you should make sure resource manager is called, because prefetch
-            # has to be added to peftCacheManager as well. So it still shouldn't work
-
-            lora_params = self._get_lora_params_from_requests(
-                batch, attn_metadata)
-            print(f"SMOR, not failed on lora_params in maybe_get_cuda_graph")
+        lora_params = None
+        if self.has_lora_prefetched:
+            peft_cache_manager = resource_manager.get_resource_manager(
+                ResourceManagerType.PEFT_CACHE_MANAGER)
+            if peft_cache_manager is None:
+                logger.debug("LoRA prefetched, but no PEFT cache manager present; skipping LoRA for graphs.")
+            else:
+                # Only generation-only batches are CUDA-graphable today.
+                if len(batch.context_requests) == 0 and len(batch.generation_requests) >= 1:
+                    for req in batch.generation_requests:
+                        peft_cache_manager.add_request_peft(req)
+                    py_cfgs = peft_cache_manager.ensure_batch(
+                        batch.context_requests, batch.generation_requests, reset_gpu_cache=False)
+                    for req in batch.generation_requests:
+                        req.py_lora_task_layer_module_configs = py_cfgs.get(req.py_request_id)
+                    lora_params = self._get_lora_params_from_requests(batch, attn_metadata)
+                else:
+                    logger.debug("LoRA + CUDA graph currently supports generation-only batches; skipping LoRA params.")
```

📝 Committable suggestion
```python
        lora_params = None
        if self.has_lora_prefetched:
            peft_cache_manager = resource_manager.get_resource_manager(
                ResourceManagerType.PEFT_CACHE_MANAGER)
            if peft_cache_manager is None:
                logger.debug(
                    "LoRA prefetched, but no PEFT cache manager present; "
                    "skipping LoRA for graphs."
                )
            else:
                # Only generation-only batches are CUDA-graphable today.
                if len(batch.context_requests) == 0 and len(batch.generation_requests) >= 1:
                    for req in batch.generation_requests:
                        peft_cache_manager.add_request_peft(req)
                    py_cfgs = peft_cache_manager.ensure_batch(
                        batch.context_requests,
                        batch.generation_requests,
                        reset_gpu_cache=False
                    )
                    for req in batch.generation_requests:
                        req.py_lora_task_layer_module_configs = (
                            py_cfgs.get(req.py_request_id)
                        )
                    lora_params = self._get_lora_params_from_requests(
                        batch, attn_metadata
                    )
                else:
                    logger.debug(
                        "LoRA + CUDA graph currently supports generation-only "
                        "batches; skipping LoRA params."
                    )
```
```python
        lora_request=None,
    ):
```
🛠️ Refactor suggestion
Type and API contract for lora_request are unclear; initialize and type it.
Document/annotate the expected type (bindings executor LoraConfig-like) and make has_lora_prefetched paths robust to None. Also initialize has_lora_prefetched at engine level to avoid AttributeError elsewhere.
Apply:
```diff
-        lora_request=None,
+        lora_request: Optional[object] = None,
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/pyexecutor/resource_manager.py around lines 376-377, the
lora_request parameter is untyped and the code assumes an engine attribute
has_lora_prefetched exists causing potential AttributeError; annotate
lora_request with the expected type (e.g., a LoraConfig-like binding/executor
object or Optional[LoraConfig]) in the function signature and add a short
docstring/comment describing the expected fields, initialize
engine.has_lora_prefetched = False when engines are created/registered so the
attribute always exists, and update any checks like if
engine.has_lora_prefetched or engine.has_lora_prefetched == True to safely
handle None (use bool(engine.has_lora_prefetched) or explicit is True) and guard
code paths with None-safe checks before accessing lora_request properties.
```python
        lora_task_id = None
        lora_weights = None
        lora_config = None

        if lora_request is not None:
            # TODO smor currently work with single adapter only, not sure how this should work with request ids
            lora_task_id = lora_request.task_id
            lora_weights = lora_request.weights
            lora_config = lora_request.config
```
Guard against missing fields and wrong shapes on lora_request.
Accessing task_id/weights/config blindly risks AttributeError; torch reshape later expects tensors. Add duck-typing checks and early validation.
```diff
-        lora_task_id = None
-        lora_weights = None
-        lora_config = None
-
-        if lora_request is not None:
-            # TODO smor currently work with single adapter only, not sure how this should work with request ids
-            lora_task_id = lora_request.task_id
-            lora_weights = lora_request.weights
-            lora_config = lora_request.config
+        lora_task_id = None
+        lora_weights = None
+        lora_config = None
+        if lora_request is not None:
+            # Single-adapter warmup; multi-adapter not yet supported.
+            if not all(hasattr(lora_request, a) for a in ("task_id", "weights", "config")):
+                raise TypeError("lora_request must expose task_id, weights, and config")
+            lora_task_id = int(lora_request.task_id)
+            lora_weights = lora_request.weights
+            lora_config = lora_request.config
```

📝 Committable suggestion
```python
        lora_task_id = None
        lora_weights = None
        lora_config = None
        if lora_request is not None:
            # Single-adapter warmup; multi-adapter not yet supported.
            if not all(hasattr(lora_request, a) for a in ("task_id", "weights", "config")):
                raise TypeError("lora_request must expose task_id, weights, and config")
            lora_task_id = int(lora_request.task_id)
            lora_weights = lora_request.weights
            lora_config = lora_request.config
```
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/pyexecutor/resource_manager.py around lines 394 to 403,
the code accesses lora_request.task_id, .weights and .config without validation
which can raise AttributeError or later cause tensor reshape errors; update this
block to first duck-type-check that lora_request has the attributes (hasattr or
try/except AttributeError), verify lora_request.weights is not None and is a
torch.Tensor or convertable to one, check its ndim/shape matches the expected
shape before any reshape and raise a clear ValueError if not, and validate that
lora_request.config contains required keys/types (or set sensible defaults) so
downstream code doesn’t assume missing fields. Ensure any early-return or error
message includes the problematic field and expected shape/type.
```python
    lora_request: Optional[List[Any]] = None  # TODO smor fix
```
🛠️ Refactor suggestion
Avoid Any for LoraConfig.lora_request; use typed forward reference to LoRARequest.
Prevents loss of type-safety and documents intent, while avoiding import cycles with TYPE_CHECKING.
```diff
-from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
@@
 if TYPE_CHECKING:
     from .runtime import ModelConfig
+    from .executor.request import LoRARequest
@@
-    lora_request: Optional[List[Any]] = None  # TODO smor fix
+    lora_request: Optional[List["LoRARequest"]] = None
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tensorrt_llm/lora_manager.py around lines 149-150, replace the liberal Any
annotation for lora_request with a typed forward reference to LoRARequest to
preserve type-safety and intent; change the type to
Optional[List["LoRARequest"]], add from typing import TYPE_CHECKING at the top
and under if TYPE_CHECKING: import LoRARequest from its module (or appropriate
path) so the runtime import cycle is avoided while static type checkers see the
real type.
```python
def test_lora_dir_with_graph():
    lora_req = LoRARequest(
        "task-0", 0, f"{llm_models_root()}/llama-models/luotuo-lora-7b-0.1")

    lora_config = LoraConfig(
        lora_dir=[f"{llm_models_root()}/llama-models/luotuo-lora-7b-0.1"],
        max_lora_rank=8,
        lora_request=[lora_req])

    llm = LLM(model=f"{llm_models_root()}/llama-models/llama-7b-hf",
              lora_config=lora_config,
              cuda_graph_config=CudaGraphConfig(max_batch_size=1))
    # cuda_graph_config=None)

    prompts = [
        "美国的首都在哪里? \n答案:",
    ]
    references = [
        "美国的首都是华盛顿。\n\n美国的",
    ]
    sampling_params = SamplingParams(max_tokens=20)
    lora_request = [lora_req]

    outputs = llm.generate(prompts, sampling_params, lora_request=lora_request)

    assert similar(outputs[0].outputs[0].text, references[0])
    print(f"lora output: {outputs[0].outputs[0].text}")
    print(f"ref output: {references[0]}")
```
🛠️ Refactor suggestion
Ensure resource cleanup, avoid redundant config, and gate by memory.
- Always shutdown LLM in finally to prevent resource leaks.
- Avoid providing LoRA adapter both in LoraConfig and per-generate arg; the per-call lora_request is sufficient here.
- Align with other 7B tests by adding the 40GB memory guard.
```diff
-@pytest.mark.parametrize(
+# keep above tests unchanged
@@
-def test_lora_dir_with_graph():
+@skip_gpu_memory_less_than_40gb
+def test_lora_dir_with_graph():
@@
-    lora_config = LoraConfig(
-        lora_dir=[f"{llm_models_root()}/llama-models/luotuo-lora-7b-0.1"],
-        max_lora_rank=8,
-        lora_request=[lora_req])
+    lora_config = LoraConfig(
+        lora_dir=[f"{llm_models_root()}/llama-models/luotuo-lora-7b-0.1"],
+        max_lora_rank=8)
@@
-    llm = LLM(model=f"{llm_models_root()}/llama-models/llama-7b-hf",
-              lora_config=lora_config,
-              cuda_graph_config=CudaGraphConfig(max_batch_size=1))
+    llm = LLM(model=f"{llm_models_root()}/llama-models/llama-7b-hf",
+              lora_config=lora_config,
+              cuda_graph_config=CudaGraphConfig(max_batch_size=1))
@@
-    outputs = llm.generate(prompts, sampling_params, lora_request=lora_request)
-
-    assert similar(outputs[0].outputs[0].text, references[0])
-    print(f"lora output: {outputs[0].outputs[0].text}")
-    print(f"ref output: {references[0]}")
+    try:
+        outputs = llm.generate(prompts, sampling_params, lora_request=lora_request)
+        assert similar(outputs[0].outputs[0].text, references[0])
+        print(f"lora output: {outputs[0].outputs[0].text}")
+        print(f"ref output: {references[0]}")
+    finally:
+        llm.shutdown()
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In tests/unittest/llmapi/test_llm_pytorch.py around lines 436-463, ensure the
test guards memory, avoids redundant LoRA specification, and always cleans up
the LLM: add the same 40GB memory guard used by other 7B tests at the top of the
test and return/skip if not met; construct the LoraConfig without duplicating
per-call adapters (remove lora_request from LoraConfig and keep the single
lora_request passed into llm.generate, or alternatively remove the per-call
lora_request and keep it only in LoraConfig — pick one approach and make them
consistent); wrap LLM usage in try/finally and call llm.shutdown() in the
finally block to guarantee resource cleanup even on assertion failures or
exceptions.
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

- --reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- --post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.