Commit e19d722

feat: more embedded models, coqui fixes, add model usage and description (#1556)
* feat: add model descriptions and usage
* remove default model gallery
* models: add embeddings and tts
* docs: update table
* docs: updates
* images: cleanup pip cache after install
* images: always run apt-get clean
* ux: improve gRPC connection errors
* ux: improve some messages
* fix: fix coqui when no AudioPath is passed by
* embedded: add more models
* Add usage
* Reorder table
1 parent 0843fe6 commit e19d722

File tree

21 files changed, +216 -45 lines

Dockerfile

Lines changed: 4 additions & 4 deletions

```diff
@@ -15,7 +15,6 @@ ENV BUILD_TYPE=${BUILD_TYPE}
 
 ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,petals:/build/backend/python/petals/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,exllama:/build/backend/python/exllama/run.sh,vall-e-x:/build/backend/python/vall-e-x/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh"
 
-ENV GALLERIES='[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]'
 ARG GO_TAGS="stablediffusion tinydream tts"
 
 RUN apt-get update && \
@@ -64,12 +63,12 @@ RUN curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmo
     echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list && \
     echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list && \
     apt-get update && \
-    apt-get install -y conda
+    apt-get install -y conda && apt-get clean
 
 ENV PATH="/root/.cargo/bin:${PATH}"
 RUN pip install --upgrade pip
 RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
-RUN apt-get install -y espeak-ng espeak
+RUN apt-get install -y espeak-ng espeak && apt-get clean
 
 ###################################
 ###################################
@@ -127,10 +126,11 @@ ARG CUDA_MAJOR_VERSION=11
 ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
 ENV NVIDIA_REQUIRE_CUDA="cuda>=${CUDA_MAJOR_VERSION}.0"
 ENV NVIDIA_VISIBLE_DEVICES=all
+ENV PIP_CACHE_PURGE=true
 
 # Add FFmpeg
 RUN if [ "${FFMPEG}" = "true" ]; then \
-    apt-get install -y ffmpeg \
+    apt-get install -y ffmpeg && apt-get clean \
     ; fi
 
 WORKDIR /build
```
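The reason `apt-get clean` is chained onto each install with `&&` rather than given its own `RUN` instruction: Docker layers are additive, so a cleanup in a later layer only hides the cache files without removing their bytes from the image. A minimal sketch of the pattern (the base image and package are illustrative, not taken from this Dockerfile):

```dockerfile
# Hypothetical minimal example of the same pattern used in this commit.
FROM ubuntu:22.04

# Good: install and clean in the same RUN, so the apt cache never becomes
# part of any committed layer.
RUN apt-get update && \
    apt-get install -y ffmpeg && apt-get clean

# Bad (for comparison): a separate `RUN apt-get clean` here would add a new
# layer that merely masks the cache; the previous layer keeps its full size.
```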

api/config/config.go

Lines changed: 12 additions & 0 deletions

```diff
@@ -55,6 +55,9 @@ type Config struct {
 	CUDA bool `yaml:"cuda"`
 
 	DownloadFiles []File `yaml:"download_files"`
+
+	Description string `yaml:"description"`
+	Usage       string `yaml:"usage"`
 }
 
 type File struct {
@@ -326,6 +329,15 @@ func (cm *ConfigLoader) Preload(modelPath string) error {
 			c.PredictionOptions.Model = md5Name
 			cm.configs[i] = *c
 		}
+		if cm.configs[i].Name != "" {
+			log.Info().Msgf("Model name: %s", cm.configs[i].Name)
+		}
+		if cm.configs[i].Description != "" {
+			log.Info().Msgf("Model description: %s", cm.configs[i].Description)
+		}
+		if cm.configs[i].Usage != "" {
+			log.Info().Msgf("Model usage: \n%s", cm.configs[i].Usage)
+		}
 	}
 	return nil
 }
```

backend/python/common-env/transformers/install.sh

Lines changed: 9 additions & 0 deletions

```diff
@@ -13,3 +13,12 @@ if conda_env_exists "transformers" ; then
 else
     echo "Virtual environment already exists."
 fi
+
+if [ "$PIP_CACHE_PURGE" = true ] ; then
+    export PATH=$PATH:/opt/conda/bin
+
+    # Activate conda environment
+    source activate transformers
+
+    pip cache purge
+fi
```
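The same opt-in guard recurs in every install script touched by this commit; it can be exercised on its own. A sketch, with `echo` standing in for the real `pip cache purge` call:

```shell
# Sketch of the PIP_CACHE_PURGE guard pattern used across the install
# scripts; `echo` stands in for the real `pip cache purge`.
maybe_purge() {
    if [ "$PIP_CACHE_PURGE" = true ] ; then
        echo "pip cache purged"
    else
        echo "pip cache kept"
    fi
}

PIP_CACHE_PURGE=true
maybe_purge    # → pip cache purged
PIP_CACHE_PURGE=false
maybe_purge    # → pip cache kept
```

Because the variable is set once via `ENV PIP_CACHE_PURGE=true` in the Dockerfile, every backend's installer purges its cache in the image build without any per-script configuration.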

backend/python/coqui/coqui_server.py

Lines changed: 2 additions & 1 deletion

```diff
@@ -21,7 +21,7 @@
 
 # If MAX_WORKERS are specified in the environment use it, otherwise default to 1
 MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
-COQUI_LANGUAGE = os.environ.get('COQUI_LANGUAGE', 'en')
+COQUI_LANGUAGE = os.environ.get('COQUI_LANGUAGE', None)
 
 # Implement the BackendServicer class with the service methods
 class BackendServicer(backend_pb2_grpc.BackendServicer):
@@ -38,6 +38,7 @@ def LoadModel(self, request, context):
         if not torch.cuda.is_available() and request.CUDA:
             return backend_pb2.Result(success=False, message="CUDA is not available")
 
+        self.AudioPath = None
         # List available 🐸TTS models
         print(TTS().list_models())
         if os.path.isabs(request.AudioPath):
```
backend/python/exllama/install.sh

Lines changed: 5 additions & 1 deletion

```diff
@@ -12,4 +12,8 @@ echo $CONDA_PREFIX
 
 git clone https://github.com/turboderp/exllama $CONDA_PREFIX/exllama && pushd $CONDA_PREFIX/exllama && pip install -r requirements.txt && popd
 
-cp -rfv $CONDA_PREFIX/exllama/* ./
+cp -rfv $CONDA_PREFIX/exllama/* ./
+
+if [ "$PIP_CACHE_PURGE" = true ] ; then
+    pip cache purge
+fi
```

backend/python/exllama2/install.sh

Lines changed: 5 additions & 1 deletion

```diff
@@ -11,4 +11,8 @@ echo $CONDA_PREFIX
 
 git clone https://github.com/turboderp/exllamav2 $CONDA_PREFIX/exllamav2 && pushd $CONDA_PREFIX/exllamav2 && pip install -r requirements.txt && popd
 
-cp -rfv $CONDA_PREFIX/exllamav2/* ./
+cp -rfv $CONDA_PREFIX/exllamav2/* ./
+
+if [ "$PIP_CACHE_PURGE" = true ] ; then
+    pip cache purge
+fi
```

backend/python/vall-e-x/install.sh

Lines changed: 5 additions & 1 deletion

```diff
@@ -12,4 +12,8 @@ echo $CONDA_PREFIX
 
 git clone https://github.com/Plachtaa/VALL-E-X.git $CONDA_PREFIX/vall-e-x && pushd $CONDA_PREFIX/vall-e-x && git checkout -b build $SHA && pip install -r requirements.txt && popd
 
-cp -rfv $CONDA_PREFIX/vall-e-x/* ./
+cp -rfv $CONDA_PREFIX/vall-e-x/* ./
+
+if [ "$PIP_CACHE_PURGE" = true ] ; then
+    pip cache purge
+fi
```

docs/content/getting_started/_index.en.md

Lines changed: 41 additions & 20 deletions

````diff
@@ -143,39 +143,60 @@ Note: this feature currently is available only on master builds.
 You can run `local-ai` directly with a model name, and it will download the model and start the API with the model loaded.
 
 > Don't need GPU acceleration? use the CPU images which are lighter and do not have Nvidia dependencies
+> To know which version of CUDA do you have available, you can check with `nvidia-smi` or `nvcc --version`
+
 
 {{< tabs >}}
 {{% tab name="CPU-only" %}}
 
-| Model | Docker command |
-| --- | --- |
-| phi2 | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core phi-2``` |
-| llava | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava``` |
-| mistral-openorca | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core mistral-openorca``` |
-
+| Model | Category | Docker command |
+| --- | --- | --- |
+| [phi-2](https://huggingface.co/microsoft/phi-2) | LLM | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core phi-2``` |
+| [llava](https://github.com/SkunkworksAI/BakLLaVA) | Multimodal LLM | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core llava``` |
+| [mistral-openorca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) | LLM | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core mistral-openorca``` |
+| [bert-cpp](https://github.com/skeskinen/bert.cpp) | Embeddings | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core bert-cpp``` |
+| all-minilm-l6-v2 | Embeddings | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg all-minilm-l6-v2``` |
+| whisper-base | Audio to Text | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core whisper-base``` |
+| rhasspy-voice-en-us-amy | Text to Audio | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core rhasspy-voice-en-us-amy``` |
+| coqui | Text to Audio | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg coqui``` |
+| bark | Text to Audio | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg bark``` |
+| vall-e-x | Text to Audio | ```docker run -ti -p 8080:8080 localai/localai:{{< version >}}-ffmpeg vall-e-x``` |
 
 {{% /tab %}}
 {{% tab name="GPU (CUDA 11)" %}}
 
-> To know which version of CUDA do you have available, you can check with `nvidia-smi` or `nvcc --version`
 
-| Model | Docker command |
-| --- | --- |
-| phi-2 | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core phi-2``` |
-| llava | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core llava``` |
-| mistral-openorca | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core mistral-openorca``` |
+
+| Model | Category | Docker command |
+| --- | --- | --- |
+| [phi-2](https://huggingface.co/microsoft/phi-2) | LLM | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core phi-2``` |
+| [llava](https://github.com/SkunkworksAI/BakLLaVA) | Multimodal LLM | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core llava``` |
+| [mistral-openorca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) | LLM | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core mistral-openorca``` |
+| [bert-cpp](https://github.com/skeskinen/bert.cpp) | Embeddings | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core bert-cpp``` |
+| [all-minilm-l6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | Embeddings | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 all-minilm-l6-v2``` |
+| whisper-base | Audio to Text | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core whisper-base``` |
+| rhasspy-voice-en-us-amy | Text to Audio | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11-core rhasspy-voice-en-us-amy``` |
+| coqui | Text to Audio | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 coqui``` |
+| bark | Text to Audio | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 bark``` |
+| vall-e-x | Text to Audio | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda11 vall-e-x``` |
 
 {{% /tab %}}
 
-{{% tab name="GPU (CUDA 12)" %}}
 
-> To know which version of CUDA do you have available, you can check with `nvidia-smi` or `nvcc --version`
+{{% tab name="GPU (CUDA 12)" %}}
 
-| Model | Docker command |
-| --- | --- |
-| phi-2 | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core phi-2``` |
-| llava | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core llava``` |
-| mistral-openorca | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core mistral-openorca``` |
+| Model | Category | Docker command |
+| --- | --- | --- |
+| [phi-2](https://huggingface.co/microsoft/phi-2) | LLM | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core phi-2``` |
+| [llava](https://github.com/SkunkworksAI/BakLLaVA) | Multimodal LLM | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core llava``` |
+| [mistral-openorca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca) | LLM | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core mistral-openorca``` |
+| bert-cpp | Embeddings | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core bert-cpp``` |
+| all-minilm-l6-v2 | Embeddings | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 all-minilm-l6-v2``` |
+| whisper-base | Audio to Text | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core whisper-base``` |
+| rhasspy-voice-en-us-amy | Text to Audio | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12-core rhasspy-voice-en-us-amy``` |
+| coqui | Text to Audio | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 coqui``` |
+| bark | Text to Audio | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 bark``` |
+| vall-e-x | Text to Audio | ```docker run -ti -p 8080:8080 --gpus all localai/localai:{{< version >}}-cublas-cuda12 vall-e-x``` |
 
 {{% /tab %}}
 
@@ -201,7 +222,7 @@ For example, to start localai with phi-2, it's possible for instance to also use
 docker run -p 8080:8080 localai/localai:{{< version >}}-ffmpeg-core https://gist.githubusercontent.com/mudler/ad601a0488b497b69ec549150d9edd18/raw/a8a8869ef1bb7e3830bf5c0bae29a0cce991ff8d/phi-2.yaml
 ```
 
-The file should be a valid YAML configuration file, for the full syntax see [advanced]({{%relref "advanced" %}}).
+The file should be a valid LocalAI YAML configuration file, for the full syntax see [advanced]({{%relref "advanced" %}}).
 {{% /notice %}}
 
 ### Container images
````

docs/content/model-compatibility/_index.en.md

Lines changed: 5 additions & 2 deletions

```diff
@@ -43,15 +43,18 @@ Besides llama based models, LocalAI is compatible also with other architectures.
 | [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
 | [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | N/A |
 | [falcon](https://github.com/cmp-nct/ggllm.cpp/tree/c12b2d65f732a0d8846db2244e070f0f3e73505c) ([binding](https://github.com/mudler/go-ggllm.cpp)) | Falcon *** | yes | GPT | no | yes | CUDA |
-| `huggingface-embeddings` [sentence-transformers](https://github.com/UKPLab/sentence-transformers) | BERT | no | Embeddings only | yes | no | N/A |
+| [sentencetransformers](https://github.com/UKPLab/sentence-transformers) | BERT | no | Embeddings only | yes | no | N/A |
 | `bark` | bark | no | Audio generation | no | no | yes |
-| `AutoGPTQ` | GPTQ | yes | GPT | yes | no | N/A |
+| `autogptq` | GPTQ | yes | GPT | yes | no | N/A |
 | `exllama` | GPTQ | yes | GPT only | no | no | N/A |
 | `diffusers` | SD,... | no | Image generation | no | no | N/A |
 | `vall-e-x` | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
 | `vllm` | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
 | `exllama2` | GPTQ | yes | GPT only | no | no | N/A |
 | `transformers-musicgen` | | no | Audio generation | no | no | N/A |
+| [tinydream](https://github.com/symisc/tiny-dream#tiny-dreaman-embedded-header-only-stable-diffusion-inference-c-librarypixlabiotiny-dream) | stablediffusion | no | Image | no | no | N/A |
+| `coqui` | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
+| `petals` | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
 
 Note: any backend name listed above can be used in the `backend` field of the model configuration file (See [the advanced section]({{%relref "advanced" %}})).
```
embedded/models/all-minilm-l6-v2.yaml

Lines changed: 13 additions & 0 deletions

```diff
@@ -0,0 +1,13 @@
+name: all-minilm-l6-v2
+backend: sentencetransformers
+embeddings: true
+parameters:
+  model: all-MiniLM-L6-v2
+
+usage: |
+    You can test this model with curl like this:
+
+    curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
+      "input": "Your text string goes here",
+      "model": "all-minilm-l6-v2"
+    }'
```
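A successful call to the `usage` snippet above returns an OpenAI-style embeddings payload. The JSON below is an assumed example of that shape (the vector values are made up for illustration, not captured output), showing how a client would pull the embedding out:

```python
import json

# Assumed OpenAI-style /embeddings response; the numbers are illustrative.
raw = '''{
  "object": "list",
  "model": "all-minilm-l6-v2",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.017, -0.042, 0.103]}
  ]
}'''

payload = json.loads(raw)
# Each input string yields one entry in "data"; the vector lives under "embedding".
vector = payload["data"][0]["embedding"]
print(len(vector))  # → 3
```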
