[Bug]: reasoning_parser does not populate reasoning_content in LLM API (v1.2.0rc5) #10143

@coga-ash

Description

System Info

CPU: AMD Ryzen 9 9950X
GPU: NVIDIA RTX PRO 6000
Libraries:
TensorRT-LLM: 1.2.0rc5
Backend: PyTorch
OS: Ubuntu 24.04

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Case A: Built-in deepseek-r1 parser with DeepSeek-R1-Distill

from tensorrt_llm import LLM, SamplingParams

llm = LLM(
    model="DeepSeek-R1-Distill-Qwen-7B",
    reasoning_parser="deepseek-r1"
)

prompt = "<|begin_of_sentence|><|User|>안녕 너는 누구야?<|Assistant|><think>\n"  # Korean: "Hi, who are you?"
sampling_params = SamplingParams(max_tokens=256)

res = llm.generate(prompt, sampling_params)
output_obj = res.outputs[0]

print(f"Reasoning: {getattr(output_obj, 'reasoning_content', 'N/A')}")
print(f"Text: {output_obj.text}")

Case B: Custom parser with GPT-OSS-20B

# Custom parser registration.
# NOTE: import path assumed for this release; adjust if the reasoning-parser
# module lives elsewhere in your install.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi.reasoning_parser import (BaseReasoningParser,
                                                  ReasoningParserFactory)

class GPTOSSParser(BaseReasoningParser):
    def __init__(self) -> None:
        self.reasoning_start = "analysis"
        self.answer_start = "assistantfinal"
    # ... (parse and parse_delta implementation)

ReasoningParserFactory.parsers["gpt-oss"] = GPTOSSParser

llm = LLM(model="gpt-oss-20b", reasoning_parser="gpt-oss")
# ... (generation code)
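For reference, the separation this parser is supposed to produce can be sketched as plain string logic, independent of the parser interface (a minimal sketch using the Harmony markers above; split_harmony is a hypothetical helper, not a TensorRT-LLM API):

def split_harmony(text: str) -> tuple[str, str]:
    """Split Harmony-formatted output into (reasoning, answer)."""
    head, sep, answer = text.partition("assistantfinal")
    reasoning = head.removeprefix("analysis")
    # If the final-answer marker never appears, everything so far is reasoning.
    return (reasoning, answer if sep else "")

# split_harmony("analysisThe user greets me.assistantfinalHello!")
# -> ("The user greets me.", "Hello!")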

Expected behavior

The reasoning_content field in the CompletionOutput (or RequestOutput.outputs[0]) should be automatically populated with the text generated between the specified reasoning markers (e.g., <think> and </think> for DeepSeek, or analysis and assistantfinal for GPT-OSS).

The text field should contain only the final answer, excluding the reasoning/thought process.

For both streaming and batch generation, the post-processing pipeline should invoke the parse or parse_delta method of the assigned ReasoningParser so that the output is delivered with these fields already separated.
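Concretely, for Case A the expected separation amounts to splitting on the closing think marker (illustrative only; the marker string is assumed from the DeepSeek chat template, and this is not the parser's actual implementation):

raw = "I should greet the user and introduce myself.</think>\nHello! I'm an AI assistant."

# Everything before </think> should land in reasoning_content,
# everything after it in text.
reasoning_content, _, text = raw.partition("</think>")
print(reasoning_content)  # -> I should greet the user and introduce myself.
print(text.strip())       # -> Hello! I'm an AI assistant.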

Actual behavior

The reasoning_content field is either missing from the output object or consistently returns an empty/N/A value, even when the model output clearly contains the specified markers.

All generated content, including the raw reasoning markers and the internal thought process, is returned as a single concatenated string in the text field.

Observation: in the logged CompletionOutput, the _postprocess_result attribute is None, suggesting that the reasoning parser is never invoked during the post-generation phase.

Additional notes

This issue is consistently observed when running TensorRT-LLM with the PyTorch backend through the LLM API.

The failure occurs with both:

  • Built-in parsers (e.g., reasoning_parser="deepseek-r1") with DeepSeek-R1-Distill models.
  • Custom parsers registered via ReasoningParserFactory.parsers for Harmony-formatted models like GPT-OSS.

In v1.2.0rc5, llm.generate() returns a RequestOutput object instead of a list, but accessing res.outputs[0] still shows that no field separation has occurred.

Registering the parser directly to the LLM instance or through the factory does not resolve the issue, indicating a potential break in the internal post-processing hook in this specific release.
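For what it's worth, registration itself appears to succeed; a check along these lines (import path assumed, see Case B above) shows the factory knows about the parsers even though neither is ever invoked:

from tensorrt_llm.llmapi.reasoning_parser import ReasoningParserFactory  # path assumed

print("gpt-oss" in ReasoningParserFactory.parsers)      # True (custom, registered above)
print("deepseek-r1" in ReasoningParserFactory.parsers)  # expected True for the built-in key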

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata


Labels

  • LLM API<NV>: High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows
  • Pytorch<NV>: Pytorch backend related issues
  • bug: Something isn't working
