RC: LangCache public preview #1703 (Merged)
Commits (39):

- `8deb2a9` page structure/overview/create service text (no images) (cmilesb)
- `03d8a95` checkpoint (cmilesb)
- `453c249` Monitor cache (cmilesb)
- `697a693` View/Edit (cmilesb)
- `5d86971` Images (cmilesb)
- `fbe59e6` Merge branch 'main' into DOC-5172 (cmilesb)
- `3f2566c` Update create-service.md (Jenverse)
- `94f6ac9` Update use-langcache.md (Jenverse)
- `639ce64` Update view-edit-cache.md (Jenverse)
- `412b8ed` Merge pull request #1709 from redis/Jenverse-patch-1-1 (cmilesb)
- `eef63da` Apply suggestions from code review (cmilesb)
- `9bb10e5` Merge pull request #1710 from redis/Jenverse-patch-1-2 (cmilesb)
- `05a98bc` Merge pull request #1711 from redis/Jenverse-patch-1-3 (cmilesb)
- `54b055b` Apply suggestions from code review (cmilesb)
- `ce25f26` Update content/operate/rc/langcache/view-edit-cache.md (cmilesb)
- `718eba2` Fix name on an image (cmilesb)
- `dd5295a` Merge branch 'main' into DOC-5172 (cmilesb)
- `0000190` Get up-to-date (cmilesb)
- `326f48a` Add LLM cost reduction (cmilesb)
- `72403d0` Typo (cmilesb)
- `6856a1d` Merge branch 'main' into DOC-5172 (cmilesb)
- `8baad94` Fix formula code block (cmilesb)
- `3556aa4` Move all 2024 changelogs to 2024 folder and fix a few things (cmilesb)
- `215e93e` Add aliases (cmilesb)
- `442df0c` Merge branch 'main' into DOC-5405 (cmilesb)
- `2e3cae4` Merge branch 'main' into DOC-5172 (cmilesb)
- `2776f2f` Merge branch 'main' into DOC-5172 (cmilesb)
- `72c5045` LangCache: Move certain docs to develop/ai (cmilesb)
- `6247f38` Apply suggestions from code review (cmilesb)
- `f442476` Add LangCache SDK (cmilesb)
- `1c2dbef` Add SDK in title and LinkTitle (cmilesb)
- `2c6c5ef` Merge pull request #1798 from redis/DOC-5172-move-docs (cmilesb)
- `04668ee` Merge branch 'main' into DOC-5405 (cmilesb)
- `ae16c7a` Merge branch 'main' into DOC-5405 (cmilesb)
- `17e5695` Merge branch 'main' into DOC-5172 (cmilesb)
- `5e728cb` Merge pull request #1774 from redis/DOC-5405 (cmilesb)
- `d1cdf72` Changelog (cmilesb)
- `30a0b42` Merge branch 'main' into DOC-5172 (cmilesb)
- `02abe25` Add entries to search endpoint (cmilesb)
---
alwaysopen: false
categories:
- docs
- operate
- rc
description: Store LLM responses for AI applications in Redis Cloud.
hideListLinks: true
linktitle: LangCache
title: Semantic caching with LangCache
weight: 36
---

LangCache is a semantic caching service, available as a REST API, that stores LLM responses for faster and cheaper retrieval. It is built on the Redis vector database. By using semantic caching, customers can significantly reduce API costs and lower the average latency of their generative AI applications.

## LangCache overview

LangCache uses semantic caching to store and reuse previous LLM responses for repeated queries. Instead of calling the LLM for every request, LangCache checks whether a similar response is already stored in the cache. If a match is found, LangCache returns the cached response instantly, saving time and resources.

Imagine you're using an LLM to build an agent to answer questions about your company's products. Your users may ask questions like the following:

- "What are the features of Product A?"
- "Can you list the main features of Product A?"
- "Tell me about Product A's features."

These prompts may have slight variations, but they essentially ask the same question. LangCache can help you avoid calling the LLM for each of these prompts by caching the response to the first prompt and returning it for any similar prompts.

Using LangCache as a semantic caching service in Redis Cloud has the following benefits:

- **Lower LLM costs**: Reduce costly LLM calls by easily storing the most frequently requested responses.
- **Faster AI app responses**: Get faster AI responses by retrieving previously stored responses from memory.
- **Simpler deployments**: Access our managed service via a REST API with automated embedding generation and configurable controls.
- **Advanced cache management**: Manage data access and privacy, eviction protocols, and monitor usage and cache hit rates.

## LangCache architecture

The following diagram shows how you can integrate LangCache into your GenAI app:

{{< image filename="images/rc/langcache-process.png" >}}

1. A user sends a prompt to your AI app.
1. Your app sends the prompt to LangCache through the `POST /v1/caches/{cacheId}/search` endpoint.
1. LangCache calls an embedding model service to generate an embedding for the prompt.
1. LangCache searches the cache for a similar response by matching the embedding of the new prompt against the stored embeddings.
1. If a semantically similar entry is found (a cache hit), LangCache returns the cached response to your app. Your app can then send the cached response back to the user.
1. If no match is found (a cache miss), your app receives an empty response from LangCache. Your app then queries your chosen LLM to generate a new response.
1. Your app sends the prompt and the new response to LangCache through the `POST /v1/caches/{cacheId}/entries` endpoint.
1. LangCache stores the embedding with the new response in the cache for future use.
## Get started with LangCache on Redis Cloud

To set up LangCache on Redis Cloud:

1. [Create a database]({{< relref "/operate/rc/databases/create-database" >}}) on Redis Cloud.
2. [Create a LangCache service]({{< relref "/operate/rc/langcache/create-service" >}}) for your database.
3. [Use the LangCache API]({{< relref "/operate/rc/langcache/use-langcache" >}}) from your client app.

After you set up LangCache, you can [view and edit the cache]({{< relref "/operate/rc/langcache/view-edit-cache" >}}) and [monitor the cache's performance]({{< relref "/operate/rc/langcache/monitor-cache" >}}).
---
alwaysopen: false
categories:
- docs
- operate
- rc
description: null
hideListLinks: true
linktitle: Create service
title: Create a LangCache service
weight: 5
---

Redis LangCache provides vector search capabilities and efficient caching for AI-powered applications. This guide walks you through creating and configuring a LangCache service in Redis Cloud.

## Prerequisites

To create a LangCache service, you will need:

- A Redis Cloud database. If you don't have one, see [Create a database]({{< relref "/operate/rc/databases/create-database" >}}).
{{< note >}}
LangCache does not support the following databases during public preview:
- Databases with a [CIDR allow list]({{< relref "/operate/rc/security/cidr-whitelist" >}})
- [Active-Active]({{< relref "/operate/rc/databases/configuration/active-active-redis" >}}) databases
- Databases with the [default user]({{< relref "/operate/rc/security/access-control/data-access-control/default-user" >}}) turned off
{{< /note >}}
- An [OpenAI API key](https://platform.openai.com/api-keys). LangCache uses OpenAI to generate embeddings for prompts and responses.

## Create a LangCache service

1. From the [Redis Cloud console](https://cloud.redis.io/), select **LangCache AI** from the left-hand menu.

1. The first time you access the **LangCache AI** page, you will see an introduction to LangCache. Select **Let's create a service** to create your first service.

    {{<image filename="images/rc/langcache-create-first-service.png" alt="The Let's create a service button." width="200px" >}}

    If you have already created a LangCache service, select **New service** to create another one.

    {{<image filename="images/rc/langcache-new-service.png" alt="The New service button." width="150px" >}}

This takes you to the **Create LangCache service** page, which is divided into the following sections:

1. The [General settings](#general-settings) section defines basic properties of your service.
1. The [Embedding settings](#embedding-settings) section defines the embedding model your service uses.
1. The [Attributes settings](#attributes-settings) section lets you define attributes for your service.

### General settings

The **General settings** section defines basic properties of your service.

{{<image filename="images/rc/langcache-general-settings.png" alt="The General settings section." >}}

| Setting name | Description |
|:----------------------|:----------|
| **Service name** | Enter a name for your LangCache service. We recommend a name that describes the service's purpose. |
| **Select database** | Select the Redis Cloud database to use for this service. |
| **TTL** | The number of seconds to cache entries before they expire. Default: `No expiration`; items remain in the cache until manually removed. |
| **User** | The [database access user]({{< relref "/operate/rc/security/access-control/data-access-control/role-based-access-control" >}}) to use for this service. LangCache only supports the [`default` user]({{< relref "/operate/rc/security/access-control/data-access-control/default-user" >}}) during public preview. |

### Embedding settings

The **Embedding settings** section defines the embedding model your service uses.

{{<image filename="images/rc/langcache-embedding-settings.png" alt="The Embedding settings section." >}}

| Setting name | Description |
|:----------------------|:----------|
| **Supported Embedding Provider** | The embedding provider to use for your service. LangCache only supports OpenAI during public preview. |
| **Embedding provider API key** | Enter your embedding provider's API key. |
| **Model** | Select the embedding model to use for your service. |
| **Similarity threshold** | The minimum similarity score required to consider a cached response a match. Range: `0.0` to `1.0`. Default: `0.9`.<br/>A higher value yields more precise matches but may exclude relevant ones; a lower value yields more matches but may include less relevant ones. We recommend starting between `0.8` and `0.9`, then fine-tuning based on your results. |
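To build intuition for the similarity threshold, the sketch below compares toy embedding vectors using cosine similarity and applies the default `0.9` cutoff. The three-dimensional vectors are invented for illustration; real embedding models produce vectors with hundreds of dimensions, and this is not the service's internal implementation.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

THRESHOLD = 0.9  # the default similarity threshold

# Toy 3-dimensional "embeddings" (illustrative values only).
cached_prompt = [0.9, 0.1, 0.2]
similar_prompt = [0.85, 0.15, 0.25]  # a paraphrase of the cached prompt
unrelated_prompt = [0.1, 0.9, 0.1]   # a different topic

hit = cosine_similarity(cached_prompt, similar_prompt) >= THRESHOLD    # True
miss = cosine_similarity(cached_prompt, unrelated_prompt) >= THRESHOLD # False
```

Raising `THRESHOLD` toward `1.0` would eventually reject even the paraphrase, which is the precision/recall trade-off described in the table above.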
### Attributes settings

Attributes provide scoping for your LangCache operations. Think of them as tags or labels that help you organize and manage your cached data with precision.

The **Attributes settings** section lets you define attributes for your service. It is collapsed by default.

{{<image filename="images/rc/langcache-attribute-settings.png" alt="The Attributes settings section, expanded." >}}

By default, LangCache includes three fixed attributes, called the "scope" in the API:
- User ID: Scope operations to a specific user
- Application ID: Manage cache entries related to a particular application
- Session ID: Control the cache for a specific user session

These fixed attributes enable targeted cache operations. For example, you can:
- Delete all cached entries for a specific user who updated their preferences
- Scope search results to return only entries relevant to the current user
- Clear all cached data related to a particular application version

Beyond these fixed attributes, LangCache lets you define up to five custom attributes for your specific use case. To add a new attribute:

1. Select **Add attribute**.

    {{<image filename="images/rc/langcache-add-attribute.png" alt="The Add attribute button." width="150px" >}}

1. Give your custom attribute a descriptive name and select the check mark button to save it.

    {{<image filename="images/rc/langcache-custom-attributes.png" alt="The custom attributes section. Select the Confirm add attribute button to save your attribute." >}}

After you save your custom attribute, it appears in the list of custom attributes. Use the **Delete** button to remove it.

{{<image filename="images/rc/icon-delete-teal.png" width="36px" alt="Select the Delete button to delete the selected attribute." >}}

You can also select **Add attribute** again to add another attribute.

{{<image filename="images/rc/langcache-add-attribute.png" alt="The Add attribute button." width="150px" >}}

### Create service

When you have finished setting the details of your LangCache service, select **Create** to create it.

{{<image filename="images/rc/button-access-management-user-key-create.png" alt="Use the Create button to create a LangCache service." >}}

You'll be taken to your new service's **Configuration** page. The service also appears in the LangCache service list.

{{<image filename="images/rc/langcache-service-list.png" alt="The LangCache service in the LangCache service list." >}}

If an error occurs, verify that:
- Your database is active.
- You have provided a valid OpenAI API key.
- You have provided valid values for all required fields.

For help, [contact support](https://redis.io/support/).

## Next steps

After your cache is created, you can [use the LangCache API]({{< relref "/operate/rc/langcache/use-langcache" >}}) from your client app.

You can also [view and edit the cache]({{< relref "/operate/rc/langcache/view-edit-cache" >}}) and [monitor the cache's performance]({{< relref "/operate/rc/langcache/monitor-cache" >}}).
---
alwaysopen: false
categories:
- docs
- operate
- rc
description: null
hideListLinks: true
linktitle: Monitor cache
title: Monitor a LangCache service
weight: 20
---

You can monitor a LangCache service's performance from the **Metrics** tab of the service's page.

{{<image filename="images/rc/langcache-metrics.png" alt="The metrics tab of the LangCache service's page." >}}

The **Metrics** tab provides a series of graphs showing performance data for your LangCache service.

You can switch between daily and weekly stats using the **Day** and **Week** buttons at the top of the page. Each graph also includes minimum, average, maximum, and latest values.

## LangCache metrics reference

### Cache hit ratio

The percentage of requests that were served from the cache without calling the LLM API. A healthy cache generally shows an increasing hit ratio over time as it becomes more populated with cached responses.

To optimize your cache hit ratio:

- Tune similarity thresholds to capture more semantically related queries.
- Analyze recurring query patterns to fine-tune your embedding strategies.
- Test different embedding models to understand their impact on cache hit rates.

A higher cache hit ratio does not always mean better performance. If the cache is too lenient in its similarity matching, it may return irrelevant responses, producing a higher hit rate but poorer overall results.
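As a quick illustration (not part of the LangCache API), a hit ratio of this kind is cache hits divided by total search requests, expressed as a percentage:

```python
def cache_hit_ratio(hits: int, searches: int) -> float:
    """Percentage of search requests served from the cache."""
    if searches == 0:
        return 0.0  # avoid division by zero before any traffic arrives
    return 100.0 * hits / searches

# Example: 450 of 600 search requests were cache hits.
ratio = cache_hit_ratio(450, 600)  # 75.0
```

Tracking this number alongside response relevance helps you judge whether a threshold change actually improved the cache rather than just inflating the hit rate.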
|
||
### Cache search requests | ||
|
||
The number of read attempts against the cache at the specified time. This metric can help you understand the load on your cache and identify periods of high or low activity. | ||
|
||
### Cache latency | ||
|
||
The average time to process a cache lookup request. This metric can help you identify performance bottlenecks and optimize your cache configuration. | ||
|
||
Cache latency is highly dependent on embedding model performance, since the cache must generate embeddings for each request in order to compare them to the cached responses. | ||
|
||
High cache latency may indicate one of the following: | ||
|
||
- Inefficient embedding generation from the embedding provider | ||
- Large cache requiring longer comparison times | ||
- Network latency between the cache and embedding provider | ||
- Resource constraints | ||
|
||
### Cache items | ||
|
||
The total number of entries stores in your cache. Each item includes the query string, embedding, response, and other metadata. |
---
alwaysopen: false
categories:
- docs
- operate
- rc
description: null
hideListLinks: true
linktitle: Use LangCache
title: Use the LangCache API with your GenAI app
weight: 10
---

You can use the LangCache API from your client app to store and retrieve LLM responses.

To access the LangCache API, you need:

- The LangCache API base URL
- The LangCache API token
- The Cache ID

All of these values are available on the LangCache service's **Configuration** page.

When you call the API, pass the LangCache API token in the `Authorization` header as a Bearer token, and the Cache ID as the `cacheId` path parameter.

For example, to check the health of the cache using `cURL`:

```bash
curl -s -X GET "https://$HOST/v1/caches/$CACHE_ID/health" \
  -H "accept: application/json" \
  -H "Authorization: Bearer $API_KEY"
```

This example expects several variables to be set in the shell:

- **$HOST** - the LangCache API base URL
- **$CACHE_ID** - the Cache ID of your cache
- **$API_KEY** - the LangCache API token

{{% info %}}
This example uses `cURL` and Linux shell scripts to demonstrate the API; you can use any standard REST client or library.
{{% /info %}}
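If you prefer to stay in one language, the same authenticated request can be assembled with the Python standard library alone. This is an unofficial sketch, not a Redis SDK: the URL shape and headers are taken from the cURL example above, the host, cache ID, and key values are placeholders, and actually sending the request requires a live LangCache service.

```python
import json
import urllib.request

def build_langcache_request(host, cache_id, api_key, path="health",
                            method="GET", body=None):
    """Build an authenticated request for a LangCache endpoint.

    Mirrors the cURL example: Bearer token auth plus JSON headers.
    """
    url = f"https://{host}/v1/caches/{cache_id}/{path}"
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("accept", "application/json")
    req.add_header("Authorization", f"Bearer {api_key}")
    if body is not None:
        req.add_header("Content-Type", "application/json")
    return req

# Build (but do not send) a health-check request with placeholder values:
req = build_langcache_request("example-host", "my-cache-id", "my-api-key")

# To send it against a live service:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, resp.read().decode())
```

The same helper covers the search and store calls below by passing, for example, `path="search"`, `method="POST"`, and a `body` dictionary.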
## Check cache health

Use `GET /v1/caches/{cacheId}/health` to check the health of the cache.

```sh
GET https://[host]/v1/caches/{cacheId}/health
```

## Search LangCache for similar responses

Use `POST /v1/caches/{cacheId}/search` to search the cache for responses that match a user prompt.

```sh
POST https://[host]/v1/caches/{cacheId}/search
{
    "prompt": "User prompt text"
}
```

Place this call in your client app right before you call your LLM's REST API. If LangCache returns a response, you can send that response back to the user instead of calling the LLM.

If LangCache does not return a response, call your LLM's REST API to generate a new one. After you get a response from the LLM, you can [store it in LangCache](#store-a-new-response-in-langcache) for future use.

You can also limit the responses returned from LangCache by adding an `attributes` object or a `scope` object to the request. LangCache will only return responses that match the attributes you specify.

```sh
POST https://[host]/v1/caches/{cacheId}/search
{
    "prompt": "User prompt text",
    "attributes": {
        "customAttributeName": "customAttributeValue"
    },
    "scope": {
        "applicationId": "applicationId",
        "userId": "userId",
        "sessionId": "sessionId"
    }
}
```

## Store a new response in LangCache

Use `POST /v1/caches/{cacheId}/entries` to store a new response in the cache.

```sh
POST https://[host]/v1/caches/{cacheId}/entries
{
    "prompt": "User prompt text",
    "response": "LLM response text"
}
```

Place this call in your client app after you get a response from the LLM. This stores the response in the cache for future use.

You can also store responses with custom attributes by adding an `attributes` object to the request. To store a response with one or more of the default attributes, use the `scope` object instead.

```sh
POST https://[host]/v1/caches/{cacheId}/entries
{
    "prompt": "User prompt text",
    "response": "LLM response text",
    "attributes": {
        "customAttributeName": "customAttributeValue"
    },
    "scope": {
        "applicationId": "applicationId",
        "userId": "userId",
        "sessionId": "sessionId"
    }
}
```

## Delete cached responses

Use `DELETE /v1/caches/{cacheId}/entries/{entryId}` to delete a single cached response.
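Following the same request convention as the examples above, a single-entry delete looks like this, where `{entryId}` identifies the cached entry:

```sh
DELETE https://[host]/v1/caches/{cacheId}/entries/{entryId}
```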
You can also use `DELETE /v1/caches/{cacheId}/entries` to delete multiple cached responses at once. If you provide an `attributes` object or a `scope` object, LangCache deletes all responses that match the attributes you specify.

```sh
DELETE https://[host]/v1/caches/{cacheId}/entries
{
    "attributes": {
        "customAttributeName": "customAttributeValue"
    },
    "scope": {
        "applicationId": "applicationId",
        "userId": "userId",
        "sessionId": "sessionId"
    }
}
```