This repository was archived by the owner on May 10, 2024. It is now read-only.

Commit e23fe3c

Merge pull request #152 from chroma-core/folders
Folders
2 parents 4df8ecc + fd452ce commit e23fe3c

17 files changed, with 511 additions and 235 deletions.

docs/api/index.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+---
+slug: /api
+title: 🔧 API
+---
+
+# 🔧 API
+
+## Client APIs
+
+Chroma currently maintains first-party clients for Python and JavaScript. For clients in other languages, see their repos for documentation.
+
+`Client` - is the object that wraps a connection to a backing Chroma DB
+
+`Collection` - is the object that wraps a collection
+
+
+<div class="special_table"></div>
+
+| | Client | Collection |
+|--------------|-----------|---------------|
+| Python | [Client](/reference/Client) | [Collection](/reference/Collection) |
+| JavaScript | [Client](/js_reference/Client) | [Collection](/reference/Collection) |
+
+***
+
+## Backend API
+
+Chroma's backend Swagger REST API docs are viewable by running Chroma and navigating to `http://localhost:8000/docs`.
+
+```
+pip install chromadb
+chroma run
+open http://localhost:8000/docs
+```
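
The Backend API section added in `docs/api/index.md` says the Swagger docs are served at `http://localhost:8000/docs` once Chroma is running. As a rough sketch, the standard-library Python check below confirms that the page responds before you open it. The helper names `docs_url` and `docs_reachable` are hypothetical and not part of `chromadb`; the host and port defaults are simply taken from the URL quoted in the added docs.

```python
from urllib.error import URLError
from urllib.request import urlopen


def docs_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the Swagger docs URL quoted in docs/api/index.md."""
    return f"http://{host}:{port}/docs"


def docs_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    url = docs_url()
    state = "reachable" if docs_reachable(url) else "not reachable (is `chroma run` active?)"
    print(url, state)
```

Run it in a second terminal after `chroma run`; if the server listens on another host or port, pass those to `docs_url`.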

docs/embeddings.md

Lines changed: 22 additions & 224 deletions
@@ -4,53 +4,42 @@ sidebar_position: 4
 
 # 🧬 Embeddings
 
-import Tabs from '@theme/Tabs';
-import TabItem from '@theme/TabItem';
-
-<div class="select-language">Select a language</div>
-
-<Tabs queryString groupId="lang">
-<TabItem value="py" label="Python"></TabItem>
-<TabItem value="js" label="JavaScript"></TabItem>
-</Tabs>
-
-***
-
 Embeddings are the A.I-native way to represent any kind of data, making them the perfect fit for working with all kinds of A.I-powered tools and algorithms. They can represent text, images, and soon audio and video. There are many options for creating embeddings, whether locally using an installed library, or by calling an API.
 
 Chroma provides lightweight wrappers around popular embedding providers, making it easy to use them in your apps. You can set an embedding function when you create a Chroma collection, which will be used automatically, or you can call them directly yourself.
 
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
+<div class="special_table"></div>
 
-To get Chroma's embedding functions, import the `chromadb.utils.embedding_functions` module.
+| | Python | JS |
+|--------------|-----------|---------------|
+| [Default](/integrations/langchain) |||
+| [OpenAI](/integrations/langchain) |||
+| [Cohere](/integrations/llama-index) |||
+| [Google PaLM](/integrations/llama-index) |||
+| [Hugging Face](/integrations/llama-index) |||
+| [Instructor](/integrations/llama-index) |||
 
-```python
-from chromadb.utils import embedding_functions
-```
+We welcome pull requests to add new Embedding Functions to the community.
 
+***
 
 ## Default: all-MiniLM-L6-v2
 
 By default, Chroma uses the [Sentence Transformers](https://www.sbert.net/) `all-MiniLM-L6-v2` model to create embeddings. This embedding model can create sentence and document embeddings that can be used for a wide variety of tasks. This embedding function runs locally on your machine, and may require you download the model files (this will happen automatically).
 
 ```python
+from chromadb.utils import embedding_functions
 default_ef = embedding_functions.DefaultEmbeddingFunction()
 ```
 
-:::tip
-Embedding functions can linked to a collection, which are used whenever you call `add`, `update`, `upsert` or `query`. You can also call them directly which can be handy for debugging.
+:::note
+Embedding functions can be linked to a collection, and are then used whenever you call `add`, `update`, `upsert` or `query`. You can also use them directly, which can be handy for debugging.
 ```py
 val = default_ef(["foo"])
 ```
 -> [[0.05035809800028801, 0.0626462921500206, -0.061827320605516434...]]
 :::
 
-</TabItem>
-
-
-<TabItem value="js" label="JavaScript">
-
 
 <!--
 ## Transformers.js
@@ -83,9 +72,6 @@ const embedder = new TransformersEmbeddingFunction();
 
 ``` -->
 
-</TabItem>
-</Tabs>
-
 <Tabs queryString groupId="lang" className="hideTabSwitcher">
 <TabItem value="py" label="Python">
 
@@ -105,208 +91,21 @@ You can pass in an optional `model_name` argument, which lets you choose which S
 </Tabs>
 
 
-## OpenAI
-
-Chroma provides a convenient wrapper around OpenAI's embedding API. This embedding function runs remotely on OpenAI's servers, and requires an API key. You can get an API key by signing up for an account at [OpenAI](https://openai.com/api/).
-
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
-
-This embedding function relies on the `openai` python package, which you can install with `pip install openai`.
-
-```python
-openai_ef = embedding_functions.OpenAIEmbeddingFunction(
-    api_key="YOUR_API_KEY",
-    model_name="text-embedding-ada-002"
-)
-```
-
-To use the OpenAI embedding models on other platforms such as Azure, you can use the `api_base` and `api_type` parameters:
-```python
-openai_ef = embedding_functions.OpenAIEmbeddingFunction(
-    api_key="YOUR_API_KEY",
-    api_base="YOUR_API_BASE_PATH",
-    api_type="azure",
-    api_version="YOUR_API_VERSION",
-    model_name="text-embedding-ada-002"
-)
-```
-
-</TabItem>
-<TabItem value="js" label="JavaScript">
-
-```javascript
-//CJS
-const {OpenAIEmbeddingFunction} = require('chromadb');
-
-//ESM
-import {OpenAIEmbeddingFunction} from 'chromadb'
-
-
-const embedder = new OpenAIEmbeddingFunction({openai_api_key: "apiKey"})
-
-// use directly
-const embeddings = embedder.generate(["document1","document2"])
-
-// pass documents to query for .add and .query
-const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
-const collection = await client.getCollection({name: "name", embeddingFunction: embedder})
-```
-
-</TabItem>
-
-</Tabs>
-
-
-You can pass in an optional `model_name` argument, which lets you choose which OpenAI embeddings model to use. By default, Chroma uses `text-embedding-ada-002`. You can see a list of all available models [here](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
-
-## Cohere
-
-Chroma also provides a convenient wrapper around Cohere's embedding API. This embedding function runs remotely on Cohere’s servers, and requires an API key. You can get an API key by signing up for an account at [Cohere](https://dashboard.cohere.ai/welcome/register).
-
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
-
-This embedding function relies on the `cohere` python package, which you can install with `pip install cohere`.
-
-```python
-cohere_ef = embedding_functions.CohereEmbeddingFunction(api_key="YOUR_API_KEY", model_name="large")
-cohere_ef(texts=["document1","document2"])
-```
-
-</TabItem>
-<TabItem value="js" label="JavaScript">
-
-```javascript
-//CJS
-const {CohereEmbeddingFunction} = require('chromadb');
-
-//ESM
-import {CohereEmbeddingFunction} from 'chromadb'
-
-const embedder = new CohereEmbeddingFunction("apiKey")
-
-// use directly
-const embeddings = embedder.generate(["document1","document2"])
-
-// pass documents to query for .add and .query
-const collection = await client.createCollection({name: "name", embeddingFunction: embedder})
-const collectionGet = await client.getCollection({name:"name", embeddingFunction: embedder})
-```
-
-</TabItem>
-
-</Tabs>
-
-
-
-You can pass in an optional `model_name` argument, which lets you choose which Cohere embeddings model to use. By default, Chroma uses `large` model. You can see the available models under `Get embeddings` section [here](https://docs.cohere.ai/reference/embed).
-
-### Multilingual model example
-
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
-
-```python
-cohere_ef = embedding_functions.CohereEmbeddingFunction(
-    api_key="YOUR_API_KEY",
-    model_name="multilingual-22-12")
-
-multilingual_texts = [ 'Hello from Cohere!', 'مرحبًا من كوهير!',
-    'Hallo von Cohere!', 'Bonjour de Cohere!',
-    '¡Hola desde Cohere!', 'Olá do Cohere!',
-    'Ciao da Cohere!', '您好,来自 Cohere!',
-    'कोहेरे से नमस्ते!' ]
-
-cohere_ef(texts=multilingual_texts)
-
-```
-
-</TabItem>
-<TabItem value="js" label="JavaScript">
-
-```javascript
-//CJS
-const {CohereEmbeddingFunction} = require('chromadb');
-
-//ESM
-import {CohereEmbeddingFunction} from 'chromadb'
-
-const embedder = new CohereEmbeddingFunction("apiKey")
-
-multilingual_texts = [ 'Hello from Cohere!', 'مرحبًا من كوهير!',
-    'Hallo von Cohere!', 'Bonjour de Cohere!',
-    '¡Hola desde Cohere!', 'Olá do Cohere!',
-    'Ciao da Cohere!', '您好,来自 Cohere!',
-    'कोहेरे से नमस्ते!' ]
-
-const embeddings = embedder.generate(multilingual_texts)
-
-```
-
-
-</TabItem>
-
-</Tabs>
-
-
-
-For more information on multilingual model you can read [here](https://docs.cohere.ai/docs/multilingual-language-models).
-
-## Instructor models
-
-The [instructor-embeddings](https://github.com/HKUNLP/instructor-embedding) library is another option, especially when running on a machine with a cuda-capable GPU. They are a good local alternative to OpenAI (see the [Massive Text Embedding Benchmark](https://huggingface.co/blog/mteb) rankings). The embedding function requires the InstructorEmbedding package. To install it, run ```pip install InstructorEmbedding```.
-
-There are three models available. The default is `hkunlp/instructor-base`, and for better performance you can use `hkunlp/instructor-large` or `hkunlp/instructor-xl`. You can also specify whether to use `cpu` (default) or `cuda`. For example:
-
-```python
-#uses base model and cpu
-ef = embedding_functions.InstructorEmbeddingFunction()
-```
-or
-```python
-ef = embedding_functions.InstructorEmbeddingFunction(
-    model_name="hkunlp/instructor-xl", device="cuda")
-```
-Keep in mind that the large and xl models are 1.5GB and 5GB respectively, and are best suited to running on a GPU.
-
-## Google PaLM API models
-
-[Google PaLM APIs](https://developers.googleblog.com/2023/03/announcing-palm-api-and-makersuite.html) are currently in private preview, but if you are part of this preview, you can use them with Chroma via the `GooglePalmEmbeddingFunction`.
-
-To use the PaLM embedding API, you must have `google.generativeai` Python package installed and have the API key. To use:
-
-```python
-palm_embedding = embedding_functions.GooglePalmEmbeddingFunction(
-    api_key=api_key, model=model_name)
-
-```
-
-## HuggingFace
-
-Chroma also provides a convenient wrapper around HuggingFace's embedding API. This embedding function runs remotely on HuggingFace's servers, and requires an API key. You can get an API key by signing up for an account at [HuggingFace](https://huggingface.co/).
+***
 
-<Tabs queryString groupId="lang" className="hideTabSwitcher">
-<TabItem value="py" label="Python">
 
-This embedding function relies on the `requests` python package, which you can install with `pip install requests`.
+## Custom Embedding Functions
 
-```python
-huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
-    api_key="YOUR_API_KEY",
-    model_name="sentence-transformers/all-MiniLM-L6-v2"
-)
-```
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
 
-You can pass in an optional `model_name` argument, which lets you choose which HuggingFace model to use. By default, Chroma uses `sentence-transformers/all-MiniLM-L6-v2`. You can see a list of all available models [here](https://huggingface.co/models).
+<div class="select-language">Select a language</div>
 
-</TabItem>
-<TabItem value="js" label="JavaScript">
-</TabItem>
+<Tabs queryString groupId="lang">
+<TabItem value="py" label="Python"></TabItem>
+<TabItem value="js" label="JavaScript"></TabItem>
 </Tabs>
 
-## Custom Embedding Functions
-
 <Tabs queryString groupId="lang" className="hideTabSwitcher">
 <TabItem value="py" label="Python">
 
@@ -348,4 +147,3 @@ class MyEmbeddingFunction {
 
 </Tabs>
 
-We welcome pull requests to add new Embedding Functions to the community.
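
The body of the Custom Embedding Functions section is elided from this diff, so its exact interface is not visible here. Going only by the calls that do appear (`default_ef(["foo"])` returning one vector per input, and `cohere_ef(texts=[...])`), a custom function can be sketched as a callable that maps a list of strings to a list of float vectors. The class `ToyHashEmbeddingFunction`, its `dimensions` parameter, and the hashing scheme are hypothetical illustrations rather than Chroma's API, and the resulting vectors carry no semantic meaning.

```python
import hashlib
from typing import List


class ToyHashEmbeddingFunction:
    """Hypothetical stand-in: one deterministic vector per input text."""

    def __init__(self, dimensions: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, so at most 32 components are available.
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self.dimensions = dimensions

    def __call__(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale each byte into [0, 1] to get a fixed-length float vector.
            vectors.append([byte / 255.0 for byte in digest[: self.dimensions]])
        return vectors


toy_ef = ToyHashEmbeddingFunction(dimensions=4)
print(toy_ef(["foo", "bar"]))
```

A real implementation would replace the hashing step with a call to an embedding model; check the full Custom Embedding Functions page for the parameter name and return type your client version expects.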
