More control over what models use what back-end #6081

@Hello-World-Traveler

Description

Is your feature request related to a problem? Please describe.
Trying to load the model 'granite-embedding-107m-multilingual' with the backend '[transformers cuda12-transformers cuda-12-diffusers cpu-llama-cpp cuda12-exllama2 faster-whisper diffusers cpu-whisper cuda12-stablediffusion-ggml llama-cpp rerankers vllm bark-cpp silero-vad cuda12-diffusers stablediffusion-ggml cuda12-llama-cpp bark cuda12-kokoro-development piper cuda12-chatterbox-development latest-gpu-nvidia-cuda-12-diffusers coqui kokoro tmp cuda12-faster-whisper-development exllama2 whisper cuda12-bark-development cuda12-vllm cuda12-transformers-development]'

granite-embedding-107m-multilingual should use one of the following: cuda12-exllama2, vllm, or cuda12-llama-cpp, but it tries to load cuda12-transformers as the backend first.

Describe the solution you'd like
More control over which backend each model uses. This could be configured inside the model card, for example:

embeddings: true
name: granite-embedding-107m-multilingual
parameters:
  model: granite-embedding-107m-multilingual-f16.gguf
backends:
  - cuda12-exllama2
  - vllm
  - cuda12-llama-cpp
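
For illustration only, here is a minimal sketch of how such an ordered backends list could be resolved at load time. None of these names are LocalAI's actual API: resolveBackend, the preferred list, and the available set are all hypothetical names assumed for this example.

package main

import "fmt"

// resolveBackend walks the model card's ordered backends list and returns
// the first entry that is actually installed, instead of letting the
// loader guess (which is how cuda12-transformers ends up picked today).
func resolveBackend(preferred []string, available map[string]bool) (string, error) {
	for _, b := range preferred {
		if available[b] {
			return b, nil
		}
	}
	return "", fmt.Errorf("none of the preferred backends %v are available", preferred)
}

func main() {
	// Preference order as it would appear in the backends field above.
	preferred := []string{"cuda12-exllama2", "vllm", "cuda12-llama-cpp"}
	// Installed backends (illustrative subset).
	available := map[string]bool{"vllm": true, "cuda12-llama-cpp": true}

	backend, err := resolveBackend(preferred, available)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected backend:", backend) // prints: selected backend: vllm
}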

Labels

enhancement (New feature or request)
