The Chatterbox TTS API now includes a comprehensive voice library management system that allows users to upload, manage, and use custom voices across all speech generation endpoints. This feature enables you to create a persistent collection of voices that can be referenced by name in API calls.
- Persistent Voice Storage: Uploaded voices are stored persistently and survive container restarts
- Voice Selection by Name: Reference uploaded voices by name in any speech generation endpoint
- Multiple Audio Formats: Support for MP3, WAV, FLAC, M4A, and OGG files
- RESTful Voice Management: Full CRUD operations for voice management
- Docker & Local Support: Works seamlessly with both Docker and direct Python installations
- Frontend Integration: Complete voice management UI in the web frontend
The voice library is automatically configured when using Docker. Voices are stored in a persistent volume:
# Start with voice library enabled
docker-compose up -d
# Your voices will be persisted in the "chatterbox-voices" Docker volumeCreate a voice library directory (default: ./voices):
# Create voices directory
mkdir voices
# Or set custom location
export VOICE_LIBRARY_DIR="/path/to/your/voices"GET /v1/voices
Get a list of all voices in the library.
curl -X GET "http://localhost:4123/v1/voices"Response:
{
"voices": [
{
"name": "sarah_professional",
"filename": "sarah_professional.mp3",
"original_filename": "sarah_recording.mp3",
"file_extension": ".mp3",
"file_size": 1024768,
"upload_date": "2024-01-15T10:30:00Z",
"path": "/voices/sarah_professional.mp3"
}
],
"count": 1
}POST /v1/voices
Upload a new voice to the library.
curl -X POST "http://localhost:4123/v1/voices" \
-F "voice_name=sarah_professional" \
-F "voice_file=@/path/to/voice.mp3"Parameters:
voice_name(string): Name for the voice (used in API calls)voice_file(file): Audio file (MP3, WAV, FLAC, M4A, OGG, max 10MB)
DELETE /v1/voices/{voice_name}
Delete a voice from the library.
curl -X DELETE "http://localhost:4123/v1/voices/sarah_professional"PUT /v1/voices/{voice_name}
Rename an existing voice.
curl -X PUT "http://localhost:4123/v1/voices/sarah_professional" \
-F "new_name=sarah_business"GET /v1/voices/{voice_name}
Get detailed information about a specific voice.
curl -X GET "http://localhost:4123/v1/voices/sarah_professional"GET /v1/voices/{voice_name}/download
Download the original voice file.
curl -X GET "http://localhost:4123/v1/voices/sarah_professional/download" \
--output voice.mp3Use the voice name in the voice parameter:
curl -X POST "http://localhost:4123/v1/audio/speech" \
-H "Content-Type: application/json" \
-d '{
"input": "Hello! This is using my custom voice.",
"voice": "sarah_professional",
"exaggeration": 0.7,
"temperature": 0.8
}' \
--output speech.wavcurl -X POST "http://localhost:4123/v1/audio/speech/upload" \
-F "input=Hello! This is using my custom voice." \
-F "voice=sarah_professional" \
-F "exaggeration=0.7" \
--output speech.wavcurl -X POST "http://localhost:4123/v1/audio/speech/stream" \
-H "Content-Type: application/json" \
-d '{
"input": "This will stream with my custom voice.",
"voice": "sarah_professional"
}' \
--output stream.wav# Voice library directory (default: ./voices for local, /voices for Docker)
VOICE_LIBRARY_DIR=/path/to/voices
# For Docker, this is typically set to /voices and mounted as a volumeThe voice library is automatically configured in Docker with a persistent volume:
volumes:
- chatterbox-voices:/voices- Letters (a-z, A-Z)
- Numbers (0-9)
- Underscores (_)
- Hyphens (-)
- Spaces (converted to underscores)
- Forward/backward slashes (/, \)
- Colons (:)
- Asterisks (*)
- Question marks (?)
- Quotes (", ')
- Angle brackets (<, >)
- Pipes (|)
✅ Good names:
- "sarah_professional"
- "john-voice-2024"
- "female_american"
- "narration_style"
❌ Invalid names:
- "sarah/professional" # Contains slash
- "voice:sample" # Contains colon
- "my voice?" # Contains question mark- Use high-quality audio samples (16-48kHz sample rate)
- Aim for 10-30 seconds of clean speech
- Avoid background noise and music
- Choose samples with consistent volume
- Use descriptive voice names
- Keep file sizes reasonable (< 10MB)
- Organize voices by speaker or style
- Clean up unused voices periodically
- Use the JSON API for better performance
- Cache voice lists on the client side
- Handle voice-not-found errors gracefully
- Test voices before production use
{
"error": {
"message": "Voice 'my_voice' not found in voice library. Use /voices endpoint to list available voices.",
"type": "voice_not_found_error"
}
}Solution: Check available voices with GET /v1/voices or upload the voice first.
{
"error": {
"message": "Unsupported audio format: .txt. Supported formats: .mp3, .wav, .flac, .m4a, .ogg",
"type": "invalid_request_error"
}
}Solution: Use a supported audio format and ensure the file is valid.
{
"error": {
"message": "Voice 'sarah_professional' already exists",
"type": "voice_exists_error"
}
}Solution: Use a different name or delete the existing voice first.
The web frontend includes a complete voice library management interface:
- Voice Library Panel: Browse and manage voices
- Upload Modal: Easy voice upload with drag-and-drop
- Voice Selection: Choose voices in the TTS interface
- Preview Playback: Listen to voice samples before use
- Rename/Delete: Manage voice metadata
If you were previously using the client-side voice library (localStorage), you'll need to re-upload your voices to the new server-side library for persistence and cross-device access.
All voice endpoints support multiple URL formats:
/v1/voices(recommended)/voices/voice-library/voice_library
The voice parameter also accepts OpenAI voice names for compatibility:
alloy,echo,fable,onyx,nova,shimmer
These will use the default configured voice sample, while custom names will use uploaded voices from the library.
- Voice files are stored on the server filesystem
- File uploads are validated for type and size
- Voice names are sanitized to prevent path traversal
- No authentication required (same as other endpoints)
- Voice library operations are fast (< 100ms typical)
- Voice files are loaded on-demand for TTS generation
- Large voice files may increase TTS processing time
- Consider voice file size vs. quality trade-offs
Planned features for future releases:
- Voice categorization and tagging
- Bulk voice operations
- Voice sharing between users
- Advanced voice metadata
- Voice quality analysis
- Automatic voice optimization