Description
System Info
System Information:
- OS:
- Python version:
- CUDA version:
- GPU model(s):
- Driver version:
- TensorRT-LLM version:
Detailed output:
Paste the output of the above commands here
How would you like to use TensorRT-LLM
How can I disable the model's deep thinking (reasoning) mode when starting it with the command below?
trtllm-serve /app/models/Qwen3-32B-NVFP4 --host 0.0.0.0 --port 8000
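Note: disabling Qwen3's thinking output is usually controlled per request through the chat template rather than by a trtllm-serve launch flag. A minimal sketch follows, assuming the OpenAI-compatible endpoint started by trtllm-serve forwards a `chat_template_kwargs` field to the tokenizer's chat template and that the Qwen3 template honors `enable_thinking` (both are assumptions, not confirmed here); the base URL, API key, and model path are illustrative.

```python
# Sketch: request a completion from the trtllm-serve OpenAI-compatible endpoint
# with Qwen3's thinking mode disabled for this request.
# Assumption: the server passes "chat_template_kwargs" through to the chat
# template, and the Qwen3 template supports "enable_thinking".
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="/app/models/Qwen3-32B-NVFP4",
    messages=[{"role": "user", "content": "Hello"}],
    # Non-standard fields are sent in the request body via extra_body.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp.choices[0].message.content)
```

If the server does not accept `chat_template_kwargs`, Qwen3 models also support the `/no_think` soft switch appended to the user message, which suppresses thinking on a per-message basis independently of the serving command.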
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.