Commit 9f4a4fd

Add docs as follow on to #18362 (#18388)
1 parent 1123cfa

File tree: 6 files changed, +735 -0 lines

Lines changed: 129 additions & 0 deletions

# Apertis AI (Stima API)

## Overview

| Property | Details |
|-------|-------|
| Description | Apertis AI (formerly Stima API) is a unified API platform providing access to 430+ AI models through a single interface, with cost savings of up to 50%. |
| Provider Route on LiteLLM | `apertis/` |
| Link to Provider Doc | [Apertis AI Website ↗](https://api.stima.tech) |
| Base URL | `https://api.stima.tech/v1` |
| Supported Operations | [`/chat/completions`](#sample-usage) |

<br />

## What is Apertis AI?

Apertis AI is a unified API platform that lets developers:
- **Access 430+ AI Models**: All models through a single API
- **Save up to 50% on Costs**: Competitive pricing with significant discounts
- **Unified Billing**: A single bill for all model usage
- **Quick Setup**: Get started with just a $2 registration
- **GitHub Integration**: Link with your GitHub account

## Required Variables

```python showLineNumbers title="Environment Variables"
import os

os.environ["STIMA_API_KEY"] = ""  # your Apertis AI API key
```

Get your Apertis AI API key from [api.stima.tech](https://api.stima.tech).

## Usage - LiteLLM Python SDK

### Non-streaming

```python showLineNumbers title="Apertis AI Non-streaming Completion"
import os
from litellm import completion

os.environ["STIMA_API_KEY"] = ""  # your Apertis AI API key

messages = [{"content": "What is the capital of France?", "role": "user"}]

# Apertis AI call
response = completion(
    model="apertis/model-name",  # Replace with actual model name
    messages=messages
)

print(response)
```

### Streaming

```python showLineNumbers title="Apertis AI Streaming Completion"
import os
from litellm import completion

os.environ["STIMA_API_KEY"] = ""  # your Apertis AI API key

messages = [{"content": "Write a short poem about AI", "role": "user"}]

# Apertis AI call with streaming
response = completion(
    model="apertis/model-name",  # Replace with actual model name
    messages=messages,
    stream=True
)

for chunk in response:
    print(chunk)
```

## Usage - LiteLLM Proxy Server

### 1. Save key in your environment

```bash
export STIMA_API_KEY=""
```

### 2. Start the proxy

```yaml
model_list:
  - model_name: apertis-model
    litellm_params:
      model: apertis/model-name # Replace with actual model name
      api_key: os.environ/STIMA_API_KEY
```

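### 3. Test it

Once the proxy is running (for example via `litellm --config config.yaml`), any OpenAI-compatible client can call it. A minimal sketch using the `openai` Python SDK, assuming the proxy is listening locally on its default port and `sk-1234` stands in for your proxy key:

```python showLineNumbers title="Call Apertis AI via LiteLLM Proxy"
from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy (default: http://0.0.0.0:4000)
client = OpenAI(
    base_url="http://0.0.0.0:4000",  # assumption: proxy running locally on the default port
    api_key="sk-1234",               # placeholder: your LiteLLM proxy key, if one is configured
)

response = client.chat.completions.create(
    model="apertis-model",  # the model_name from the config above
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)
```
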
## Supported OpenAI Parameters

Apertis AI supports all standard OpenAI-compatible parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `messages` | array | **Required**. Array of message objects with 'role' and 'content' |
| `model` | string | **Required**. Model ID from 430+ available models |
| `stream` | boolean | Optional. Enable streaming responses |
| `temperature` | float | Optional. Sampling temperature |
| `top_p` | float | Optional. Nucleus sampling parameter |
| `max_tokens` | integer | Optional. Maximum tokens to generate |
| `frequency_penalty` | float | Optional. Penalize frequent tokens |
| `presence_penalty` | float | Optional. Penalize tokens based on presence |
| `stop` | string/array | Optional. Stop sequences |
| `tools` | array | Optional. List of available tools/functions |
| `tool_choice` | string/object | Optional. Control tool/function calling |

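For instance, the optional parameters above are passed straight through `litellm.completion`. A short sketch with arbitrary example values:

```python showLineNumbers title="Passing Optional Parameters"
import os
from litellm import completion

os.environ["STIMA_API_KEY"] = ""  # your Apertis AI API key

# Optional OpenAI-compatible parameters are forwarded as-is
response = completion(
    model="apertis/model-name",  # Replace with actual model name
    messages=[{"role": "user", "content": "List three French cities."}],
    temperature=0.7,   # example sampling temperature
    max_tokens=128,    # example generation cap
    stop=["\n\n"],     # example stop sequence
)

print(response.choices[0].message.content)
```
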
## Cost Benefits

Apertis AI offers significant cost advantages:
- **Up to 50% Cost Savings**: Pay less than direct provider pricing
- **Unified Billing**: A single invoice for all your AI model usage
- **Low Entry Cost**: Get started with just a $2 registration

## Model Availability

With access to 430+ AI models, Apertis AI provides:
- Multiple providers through one API
- Latest model releases
- Various model types (text, image, video)

## Additional Resources

- [Apertis AI Website](https://api.stima.tech)
- [Apertis AI Enterprise](https://api.stima.tech/enterprise)

Lines changed: 172 additions & 0 deletions

# Chutes

## Overview

| Property | Details |
|-------|-------|
| Description | Chutes is a cloud-native AI deployment platform that lets you deploy, run, and scale LLM applications with OpenAI-compatible APIs, using pre-built templates for popular frameworks like vLLM and SGLang. |
| Provider Route on LiteLLM | `chutes/` |
| Link to Provider Doc | [Chutes Website ↗](https://chutes.ai) |
| Base URL | `https://llm.chutes.ai/v1/` |
| Supported Operations | [`/chat/completions`](#sample-usage), Embeddings |

<br />

## What is Chutes?

Chutes is a powerful AI deployment and serving platform that provides:
- **Pre-built Templates**: Ready-to-use configurations for vLLM, SGLang, diffusion models, and embeddings
- **OpenAI-Compatible APIs**: Use standard OpenAI SDKs and clients
- **Multi-GPU Scaling**: Support for large models across multiple GPUs
- **Streaming Responses**: Real-time model outputs
- **Custom Configurations**: Override any parameter for your specific needs
- **Performance Optimization**: Pre-configured optimization settings

## Required Variables

```python showLineNumbers title="Environment Variables"
import os

os.environ["CHUTES_API_KEY"] = ""  # your Chutes API key
```

Get your Chutes API key from [chutes.ai](https://chutes.ai).

## Usage - LiteLLM Python SDK

### Non-streaming

```python showLineNumbers title="Chutes Non-streaming Completion"
import os
from litellm import completion

os.environ["CHUTES_API_KEY"] = ""  # your Chutes API key

messages = [{"content": "What is the capital of France?", "role": "user"}]

# Chutes call
response = completion(
    model="chutes/model-name",  # Replace with actual model name
    messages=messages
)

print(response)
```

### Streaming

```python showLineNumbers title="Chutes Streaming Completion"
import os
from litellm import completion

os.environ["CHUTES_API_KEY"] = ""  # your Chutes API key

messages = [{"content": "Write a short poem about AI", "role": "user"}]

# Chutes call with streaming
response = completion(
    model="chutes/model-name",  # Replace with actual model name
    messages=messages,
    stream=True
)

for chunk in response:
    print(chunk)
```

## Usage - LiteLLM Proxy Server

### 1. Save key in your environment

```bash
export CHUTES_API_KEY=""
```

### 2. Start the proxy

```yaml
model_list:
  - model_name: chutes-model
    litellm_params:
      model: chutes/model-name # Replace with actual model name
      api_key: os.environ/CHUTES_API_KEY
```

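### 3. Test it

Once the proxy is running (for example via `litellm --config config.yaml`), call it with any OpenAI-compatible client. A minimal streaming sketch using the `openai` Python SDK, assuming the proxy listens locally on its default port and `sk-1234` stands in for your proxy key:

```python showLineNumbers title="Call Chutes via LiteLLM Proxy (Streaming)"
from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy (default: http://0.0.0.0:4000)
client = OpenAI(
    base_url="http://0.0.0.0:4000",  # assumption: proxy running locally on the default port
    api_key="sk-1234",               # placeholder: your LiteLLM proxy key, if one is configured
)

# Stream through the proxy using the model_name from the config above
stream = client.chat.completions.create(
    model="chutes-model",
    messages=[{"role": "user", "content": "Write a short poem about AI"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
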
## Supported OpenAI Parameters

Chutes supports all standard OpenAI-compatible parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `messages` | array | **Required**. Array of message objects with 'role' and 'content' |
| `model` | string | **Required**. Model ID or HuggingFace model identifier |
| `stream` | boolean | Optional. Enable streaming responses |
| `temperature` | float | Optional. Sampling temperature |
| `top_p` | float | Optional. Nucleus sampling parameter |
| `max_tokens` | integer | Optional. Maximum tokens to generate |
| `frequency_penalty` | float | Optional. Penalize frequent tokens |
| `presence_penalty` | float | Optional. Penalize tokens based on presence |
| `stop` | string/array | Optional. Stop sequences |
| `tools` | array | Optional. List of available tools/functions |
| `tool_choice` | string/object | Optional. Control tool/function calling |
| `response_format` | object | Optional. Response format specification |

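For example, `response_format` can be used to request JSON output, assuming the deployed model supports JSON mode. A minimal sketch:

```python showLineNumbers title="Requesting JSON Output"
import os
from litellm import completion

os.environ["CHUTES_API_KEY"] = ""  # your Chutes API key

# Ask for a JSON object response (assumes the deployed model supports JSON mode)
response = completion(
    model="chutes/model-name",  # Replace with actual model name
    messages=[{"role": "user", "content": "Return a JSON object with keys 'city' and 'country' for Paris."}],
    response_format={"type": "json_object"},
)

print(response.choices[0].message.content)
```
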
## Supported Frameworks

Chutes provides optimized templates for popular AI frameworks:

### vLLM (High-Performance LLM Serving)
- OpenAI-compatible endpoints
- Multi-GPU scaling support
- Advanced optimization settings
- Best for production workloads

### SGLang (Advanced LLM Serving)
- Structured generation capabilities
- Advanced features and controls
- Custom configuration options
- Best for complex use cases

### Diffusion Models (Image Generation)
- Pre-configured image generation templates
- Optimized settings for best results
- Support for popular diffusion models

### Embedding Models
- Text embedding templates
- Vector search optimization
- Support for popular embedding models (see the sketch below)

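Since the provider route also covers embeddings, here is a hedged sketch using LiteLLM's `embedding` function, assuming an embedding model is deployed under your account (`chutes/embedding-model-name` is a placeholder):

```python showLineNumbers title="Chutes Embeddings"
import os
from litellm import embedding

os.environ["CHUTES_API_KEY"] = ""  # your Chutes API key

# Embed two short texts; the model name is a placeholder
response = embedding(
    model="chutes/embedding-model-name",  # Replace with an actual embedding model
    input=["Hello from Chutes", "Vector search example"],
)

print(len(response.data), "embeddings returned")
```
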
## Authentication

Chutes supports multiple authentication methods:
- API key via the `X-API-Key` header
- Bearer token via the `Authorization` header

With LiteLLM, authentication is handled through the environment variable:

```python
import os

os.environ["CHUTES_API_KEY"] = "your-api-key"
```

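For reference, a minimal sketch of a direct HTTP call against the base URL above, using the Bearer variant (the `X-API-Key` header works the same way); the payload follows the OpenAI chat schema and `model-name` is a placeholder:

```python showLineNumbers title="Direct API Call with Bearer Token"
import os
import requests

# Direct call to the Chutes OpenAI-compatible endpoint using a Bearer token
resp = requests.post(
    "https://llm.chutes.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['CHUTES_API_KEY']}"},
    json={
        "model": "model-name",  # Replace with actual model name
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
print(resp.json())
```
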
## Performance Optimization

Chutes offers hardware selection and optimization:
- **Small Models (7B-13B)**: 1 GPU with 24GB VRAM
- **Medium Models (30B-70B)**: 4 GPUs with 80GB VRAM each
- **Large Models (100B+)**: 8 GPUs with 140GB+ VRAM each

Engine optimization parameters are available for fine-tuning performance.

## Deployment Options

Chutes provides flexible deployment:
- **Quick Setup**: Use pre-built templates for instant deployment
- **Custom Images**: Deploy with custom Docker images
- **Scaling**: Configure max instances and auto-scaling thresholds
- **Hardware**: Choose specific GPU types and configurations

## Additional Resources

- [Chutes Documentation](https://chutes.ai/docs)
- [Chutes Getting Started](https://chutes.ai/docs/getting-started/running-a-chute)
- [Chutes API Reference](https://chutes.ai/docs/sdk-reference)
