Description
Is your feature request related to a problem? Please describe.
Trying to load the model 'granite-embedding-107m-multilingual' with the backend '[transformers cuda12-transformers cuda-12-diffusers cpu-llama-cpp cuda12-exllama2 faster-whisper diffusers cpu-whisper cuda12-stablediffusion-ggml llama-cpp rerankers vllm bark-cpp silero-vad cuda12-diffusers stablediffusion-ggml cuda12-llama-cpp bark cuda12-kokoro-development piper cuda12-chatterbox-development latest-gpu-nvidia-cuda-12-diffusers coqui kokoro tmp cuda12-faster-whisper-development exllama2 whisper cuda12-bark-development cuda12-vllm cuda12-transformers-development]'.
granite-embedding-107m-multilingual should use one of the following: cuda12-exllama2, vllm, or cuda12-llama-cpp, but it's trying to load transformers cuda12 as the backend first.
Describe the solution you'd like
More control over which backend each model uses. This could be configured inside the model card, for example:
embeddings: true
name: granite-embedding-107m-multilingual
parameters:
  model: granite-embedding-107m-multilingual-f16.gguf
backends: cuda12-exllama2, vllm, cuda12-llama-cpp
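Spelled out as an explicit YAML list, the same model card might look like the sketch below. This is only an illustration of the request: the `backends` key does not exist in current LocalAI model cards, and its name and in-order fallback semantics are assumptions here.

```yaml
# Hypothetical model card: the `backends` key and its ordering semantics
# are part of this feature request, not existing LocalAI configuration syntax.
embeddings: true
name: granite-embedding-107m-multilingual
parameters:
  model: granite-embedding-107m-multilingual-f16.gguf
backends:
  - cuda12-exllama2   # tried first
  - vllm              # fallback if the first backend is unavailable
  - cuda12-llama-cpp  # final fallback
```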