Description
Is your feature request related to a problem? Please describe.
Trying to load the model 'granite-embedding-107m-multilingual' with the backend '[transformers cuda12-transformers cuda-12-diffusers cpu-llama-cpp cuda12-exllama2 faster-whisper diffusers cpu-whisper cuda12-stablediffusion-ggml llama-cpp rerankers vllm bark-cpp silero-vad cuda12-diffusers stablediffusion-ggml cuda12-llama-cpp bark cuda12-kokoro-development piper cuda12-chatterbox-development latest-gpu-nvidia-cuda-12-diffusers coqui kokoro tmp cuda12-faster-whisper-development exllama2 whisper cuda12-bark-development cuda12-vllm cuda12-transformers-development]'.
granite-embedding-107m-multilingual should use one of the following: cuda12-exllama2, vllm, or cuda12-llama-cpp, but it's trying to load transformers cuda12 as the backend first.
Describe the solution you'd like
More control over which backend each model uses. This could be configured inside the model card, for example:
embeddings: true
name: granite-embedding-107m-multilingual
parameters:
  model: granite-embedding-107m-multilingual-f16.gguf
backends: cuda12-exllama2, vllm, cuda12-llama-cpp
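Spelled out as an explicit YAML list, the same model card might look like the sketch below. This is only an illustration of the request: the `backends` key does not exist in current LocalAI model cards, and its name and in-order fallback semantics are assumptions here.

```yaml
# Hypothetical model card: the `backends` key and its ordering semantics
# are part of this feature request, not existing LocalAI configuration syntax.
embeddings: true
name: granite-embedding-107m-multilingual
parameters:
  model: granite-embedding-107m-multilingual-f16.gguf
backends:
  - cuda12-exllama2   # tried first
  - vllm              # fallback if the first backend is unavailable
  - cuda12-llama-cpp  # final fallback
```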