Description
System Info
CPU: AMD Ryzen 9 9950X
GPU: NVIDIA RTX PRO 6000
Libraries:
TensorRT-LLM: 1.2.0rc5
Backend: PyTorch
OS: Ubuntu 24.04
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Case A: Built-in deepseek-r1 parser with DeepSeek-R1-Distill
from tensorrt_llm import LLM, SamplingParams
llm = LLM(
    model="DeepSeek-R1-Distill-Qwen-7B",
    reasoning_parser="deepseek-r1",
)
prompt = "<|begin_of_sentence|><|User|>안녕 너는 누구야?<|Assistant|><think>\n"
sampling_params = SamplingParams(max_tokens=256)
res = llm.generate(prompt, sampling_params)
output_obj = res.outputs[0]
print(f"Reasoning: {getattr(output_obj, 'reasoning_content', 'N/A')}")
print(f"Text: {output_obj.text}")Case B: Custom parser with GPT-OSS-20B
# Custom parser registration
# (import path as used with 1.2.0rc5; adjust if it differs in your build)
from tensorrt_llm.llmapi.reasoning_parser import (BaseReasoningParser,
                                                  ReasoningParserFactory)
class GPTOSSParser(BaseReasoningParser):
    def __init__(self) -> None:
        self.reasoning_start = "analysis"
        self.answer_start = "assistantfinal"

    # ... (parse and parse_delta implementation)

ReasoningParserFactory.parsers["gpt-oss"] = GPTOSSParser
llm = LLM(model="gpt-oss-20b", reasoning_parser="gpt-oss")
# ... (generation code)
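For reference, the elided parse logic in GPTOSSParser boils down to a marker-based split like the sketch below (illustrative only: the helper name and the tuple return shape are mine, since the real methods return whatever BaseReasoningParser's interface requires in this release):

def _split_harmony(text: str) -> tuple[str, str]:
    # Hypothetical helper: split a Harmony-formatted completion into
    # (reasoning, answer) on the "analysis" / "assistantfinal" markers.
    if "assistantfinal" in text:
        reasoning, answer = text.split("assistantfinal", 1)
        return reasoning.removeprefix("analysis"), answer
    # Marker not seen yet: everything so far is still reasoning.
    return text.removeprefix("analysis"), ""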
Expected behavior
The reasoning_content field in the CompletionOutput (or RequestOutput.outputs[0]) should be automatically populated with the text generated between the specified reasoning markers (e.g., <think> and </think> for DeepSeek, or analysis and assistantfinal for GPT-OSS).
The text field should contain only the final answer, excluding the reasoning/thought process.
For streaming and batch generation alike, the post-processing pipeline should invoke the parse or parse_delta method of the assigned ReasoningParser so the output is delivered in this structured form.
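Concretely, for a DeepSeek-style completion I would expect the split to look like this (hypothetical strings, shown only to illustrate the expected contract):

raw = "<think>\nThe user greets me and asks who I am...\n</think>\nHello! I'm an AI assistant."
# Expected after post-processing with reasoning_parser="deepseek-r1":
# output_obj.reasoning_content == "\nThe user greets me and asks who I am...\n"
# output_obj.text == "Hello! I'm an AI assistant."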
Actual behavior
The reasoning_content field is either missing from the output object or consistently returns an empty/N/A value, even when the model output clearly contains the specified markers.
All generated content, including the raw reasoning markers and the internal thought process, is returned as a single concatenated string in the text field.
Observation: In the CompletionOutput log, the _postprocess_result attribute is shown as None, suggesting that the reasoning parser logic is not being invoked during the post-generation phase.
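As a sanity check, the registered parser class can be wrapped so every invocation logs (an illustrative sketch; TracingParser is a hypothetical name, and the one-argument parse signature is assumed from the interface subclassed above):

# Hypothetical instrumentation: subclass the class registered in the
# factory so any parse() call prints a trace line. If the post-processing
# hook were firing, this would print once per request.
orig_cls = ReasoningParserFactory.parsers["gpt-oss"]

class TracingParser(orig_cls):
    def parse(self, text):
        print("[trace] parse() called")
        return super().parse(text)

ReasoningParserFactory.parsers["gpt-oss"] = TracingParser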
Additional notes
This issue is consistently observed when running TensorRT-LLM with the PyTorch backend (i.e., using the LLM API).
The failure occurs across both:
- Built-in parsers (e.g., reasoning_parser="deepseek-r1") with DeepSeek-R1-Distill models.
- Custom parsers registered via ReasoningParserFactory.parsers for Harmony-formatted models like GPT-OSS.
In v1.2.0rc5, llm.generate() returns a RequestOutput object rather than a list for a single prompt, but accessing res.outputs[0] still shows that no field separation has occurred (see the access snippet after these notes).
Registering the parser directly to the LLM instance or through the factory does not resolve the issue, indicating a potential break in the internal post-processing hook in this specific release.
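For reference, the exact access pattern used above (per the notes, res is the single RequestOutput returned for a single prompt in 1.2.0rc5):

res = llm.generate(prompt, sampling_params)         # RequestOutput, not a list
out = res.outputs[0]                                # CompletionOutput
print(getattr(out, "reasoning_content", None))      # None/absent instead of the thought text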
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.