@xuanzic xuanzic commented Jun 27, 2025

PR title

[TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend based on PR #5497

Description

Add per-request KV cache, timing, and speculative-decoding metrics to the Triton backend using the LLMAPI PyTorch runtime.
Usage:

1. cp -R triton_backend/all_models/llmapi/ llmapi_repo/
2. python3 triton_backend/scripts/launch_triton_server.py --model_repo=llmapi_repo/
3. curl -X POST localhost:8000/v2/models/tensorrt_llm/generate -d '{"text_input": "Please explain to me what is machine learning? ", "max_tokens":10, "sampling_param_return_perf_metrics":true}' | jq

Response will look like:


{
  "acceptance_rate": "0.0",
  "arrival_time_ns": "76735247746000",
  "first_scheduled_time_ns": "76735248284000",
  "first_token_time_ns": "76735374300000",
  "kv_cache_alloc_new_blocks": "1",
  "kv_cache_alloc_total_blocks": "1",
  "kv_cache_hit_rate": "0.0",
  "kv_cache_missed_block": "1",
  "kv_cache_reused_block": "0",
  "last_token_time_ns": "76736545324000",
  "model_name": "tensorrt_llm",
  "model_version": "1",
  "text_output": "Please explain to me what is machine learning? \n\nMachine learning is a field of computer science that involves the development of algorithms and models that can learn from data without being explicitly programmed. It is a",
  "total_accepted_draft_tokens": "0",
  "total_draft_tokens": "0"
}
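The `*_time_ns` fields are nanosecond timestamps taken from the same clock, so client-side latencies such as queue time, time-to-first-token, and end-to-end latency can be derived by subtraction. A minimal sketch using the values from the response above (the derivation itself is illustrative, not part of the backend):

```python
import json

# Timing fields from the sample /generate response above,
# as nanosecond timestamps returned as strings.
response = json.loads("""
{
  "arrival_time_ns": "76735247746000",
  "first_scheduled_time_ns": "76735248284000",
  "first_token_time_ns": "76735374300000",
  "last_token_time_ns": "76736545324000"
}
""")

# Convert the string-encoded timestamps to integers.
ns = {k: int(v) for k, v in response.items()}

# Derived latencies, in milliseconds.
queue_ms = (ns["first_scheduled_time_ns"] - ns["arrival_time_ns"]) / 1e6
ttft_ms = (ns["first_token_time_ns"] - ns["arrival_time_ns"]) / 1e6
e2e_ms = (ns["last_token_time_ns"] - ns["arrival_time_ns"]) / 1e6

print(f"queue: {queue_ms:.3f} ms, TTFT: {ttft_ms:.3f} ms, e2e: {e2e_ms:.3f} ms")
```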

Test Coverage

Verify the request_perf_metrics status in the SamplingParams.
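The `sampling_param_` prefix on the request field suggests a mapping from request-level keys to SamplingParams keyword arguments. A hypothetical sketch of that prefix stripping (the `extract_sampling_kwargs` helper and its behavior are illustrative assumptions, not the backend's actual code):

```python
SAMPLING_PARAM_PREFIX = "sampling_param_"

def extract_sampling_kwargs(request: dict) -> dict:
    """Collect request fields carrying the sampling_param_ prefix and strip it,
    yielding keyword arguments suitable for SamplingParams (illustrative only)."""
    return {
        key[len(SAMPLING_PARAM_PREFIX):]: value
        for key, value in request.items()
        if key.startswith(SAMPLING_PARAM_PREFIX)
    }

request = {
    "text_input": "Please explain to me what is machine learning? ",
    "max_tokens": 10,
    "sampling_param_return_perf_metrics": True,
}
print(extract_sampling_kwargs(request))
```

Under this assumption, `sampling_param_return_perf_metrics: true` in the request JSON becomes `return_perf_metrics=True` on the SamplingParams, which is what the test asserts.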

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.


@achartier achartier left a comment


lgtm

@achartier

/bot run --stage-list "A30-Triton-[Post-Merge]-2"

@tensorrt-cicd

PR_Github #10376 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #10376 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7672 (Partly Tested) completed with status: 'SUCCESS'

@achartier

/bot run

@tensorrt-cicd

PR_Github #10393 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #10393 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7685 completed with status: 'SUCCESS'

@xuanzic xuanzic force-pushed the triton-llmapi-perfmetrics branch 2 times, most recently from 6273131 to 00d3edb Compare July 1, 2025 02:46
@achartier achartier force-pushed the triton-llmapi-perfmetrics branch from 00d3edb to 349b011 Compare July 1, 2025 02:48
@achartier achartier enabled auto-merge (squash) July 1, 2025 02:50
@achartier

/bot reuse-pipeline

@tensorrt-cicd

PR_Github #10421 [ reuse-pipeline ] triggered by Bot

auto-merge was automatically disabled July 1, 2025 02:58

Head branch was pushed to by a user without write access

@xuanzic xuanzic force-pushed the triton-llmapi-perfmetrics branch from 349b011 to 2a05e71 Compare July 1, 2025 02:58
@tensorrt-cicd

PR_Github #10421 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #10393 for commit 349b011

@achartier achartier force-pushed the triton-llmapi-perfmetrics branch from 2a05e71 to e91dd26 Compare July 1, 2025 03:19
@achartier achartier enabled auto-merge (squash) July 1, 2025 03:19
@achartier

/bot reuse-pipeline

@tensorrt-cicd

PR_Github #10427 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd

PR_Github #10427 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #10393 for commit e91dd26

auto-merge was automatically disabled July 1, 2025 04:08

Head branch was pushed to by a user without write access

@xuanzic xuanzic force-pushed the triton-llmapi-perfmetrics branch from e91dd26 to fab2f90 Compare July 1, 2025 04:08
@achartier

/bot reuse-pipeline

@tensorrt-cicd

PR_Github #10435 [ reuse-pipeline ] triggered by Bot

@tensorrt-cicd

PR_Github #10435 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #10393 for commit fab2f90

@achartier achartier merged commit 34212e2 into NVIDIA:main Jul 1, 2025
3 checks passed
Shunkangz pushed a commit to Shunkangz/TensorRT-LLM that referenced this pull request Jul 2, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025