[Usage]: The multimodal feature in TensorRT-LLM seems to be just a demo, and it appears to support only Qwen2-VL. If I want to deploy Qwen2.5-VL or Qwen3-VL, do I need to develop the support myself? Will the official team adapt it for new models going forward? #10069
I want to run inference for a [specific model](put Hugging Face link here). I don't know how to integrate it with TensorRT-LLM or optimize it for my use case.
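As a starting point, this is the smoke test I would try first: a minimal sketch using the TensorRT-LLM LLM API, assuming a build whose PyTorch backend either recognizes the Qwen2.5-VL architecture or fails fast at load time. The model ID is illustrative, and the prompt is text-only; wiring up image inputs would follow the multimodal examples once loading succeeds.

```python
# Minimal probe: does this TensorRT-LLM build accept the checkpoint at all?
# Assumption: LLM() raises on an unsupported architecture, so a clean load
# plus a text-only generation is a reasonable first signal.
from tensorrt_llm import LLM, SamplingParams

def main():
    llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct")  # HF ID or local path (illustrative)
    params = SamplingParams(max_tokens=64)
    # Text-only smoke test; image inputs are a separate step.
    for out in llm.generate(["Describe this model in one sentence."], params):
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()
```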
Specific questions:
Model:
Use case (e.g., chatbot, batch inference, real-time serving):
Expected throughput/latency requirements:
Multi-GPU setup needed (see the sketch after this list):
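To make the fields above concrete, here is a hedged sketch of the batch-inference, multi-GPU configuration I have in mind, assuming the `tensor_parallel_size` option of the LLM API applies to this model family; the model ID, GPU count, and prompt batch are all placeholders.

```python
# Hedged sketch of a 2-GPU, batched configuration via the LLM API.
# Assumption: tensor parallelism is honored for this model family;
# all concrete values below are placeholders.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",  # the variant currently listed as supported
    tensor_parallel_size=2,             # shard weights across 2 GPUs
)
params = SamplingParams(max_tokens=128, temperature=0.7)
prompts = [f"Summarize request {i}." for i in range(8)]  # stand-in batch
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```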
Before submitting a new issue...
Make sure you have already searched for relevant issues and checked the documentation and examples for answers to frequently asked questions.