
[Usage]: The multimodal feature in TensorRT-LLM is just a demo, and I see it only supports Qwen2-VL. If I want to deploy Qwen2.5-VL or Qwen3-VL, do I need to develop it myself? Will the official team not adapt it for new models going forward? #10069

@ztzywm

Description

System Info

System Information:

  • OS:
  • Python version:
  • CUDA version:
  • GPU model(s):
  • Driver version:
  • TensorRT-LLM version:

Detailed output:

Paste the output of the relevant diagnostic commands here (a sketch of one way to gather it follows).
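A minimal sketch of how these environment details could be collected, assuming TensorRT-LLM is installed in the current Python environment and `nvidia-smi` is on the PATH:

```python
# Sketch: collect the environment details requested above.
# Assumptions: TensorRT-LLM is installed, and the NVIDIA driver
# provides `nvidia-smi` on the PATH.
import platform
import subprocess

print("OS:", platform.platform())
print("Python:", platform.python_version())

# GPU model(s), driver version, and CUDA version as reported by the driver.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

import tensorrt_llm
print("TensorRT-LLM:", tensorrt_llm.__version__)
```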

How would you like to use TensorRT-LLM

I want to run inference with a [specific model](put Hugging Face link here). I don't know how to integrate it with TensorRT-LLM or optimize it for my use case; a minimal sketch of the high-level API follows the list below.

Specific questions:

  • Model:
  • Use case (e.g., chatbot, batch inference, real-time serving):
  • Expected throughput/latency requirements:
  • Multi-GPU setup needed:
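For context, here is a minimal sketch of the high-level `LLM` API that TensorRT-LLM documents for generation. The `Qwen/Qwen2-VL-7B-Instruct` model ID is an assumption standing in for whichever model is being deployed, and whether Qwen2.5-VL or Qwen3-VL loads this way is exactly the open question in this issue:

```python
# Sketch of TensorRT-LLM's high-level LLM API (text-only prompt shown).
# Assumption: the Hugging Face model ID below is a placeholder; per the
# issue title, Qwen2-VL is the only VL model currently supported.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")

prompts = ["Describe what a vision-language model does."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```

For actual image-plus-text inference, the repository's multimodal examples wrap input loading around this API; those examples, not this text-only sketch, are the reference for vision inputs.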

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees: none

Labels: Multimodal (issues & PRs regarding multimodal-related objects), question (further information is requested)
