[Usage]: How to turn off the deep thinking mode of the model #10103

@m08594589-source

Description

System Info

System Information:

  • OS:
  • Python version:
  • CUDA version:
  • GPU model(s):
  • Driver version:
  • TensorRT-LLM version:

How would you like to use TensorRT-LLM

How can I disable the model's deep thinking (reasoning) mode in the command that starts the server?
trtllm-serve /app/models/Qwen3-32B-NVFP4 --host 0.0.0.0 --port 8000

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

LLM API<NV>: High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows
question: Further information is requested
