Voice Library Management

🎵 Overview

The Chatterbox TTS API includes a voice library management system for uploading, managing, and using custom voices across all speech generation endpoints. Uploaded voices form a persistent collection that you can reference by name in API calls.

✨ Key Features

  • Persistent Voice Storage: Uploaded voices are stored on the server and survive container restarts
  • Voice Selection by Name: Reference uploaded voices by name in any speech generation endpoint
  • Multiple Audio Formats: Support for MP3, WAV, FLAC, M4A, and OGG files
  • RESTful Voice Management: Full CRUD operations for voice management
  • Docker & Local Support: Works seamlessly with both Docker and direct Python installations
  • Frontend Integration: Complete voice management UI in the web frontend

🚀 Getting Started

For Docker Users

The voice library is automatically configured when using Docker. Voices are stored in a persistent volume:

# Start with voice library enabled
docker-compose up -d

# Your voices will be persisted in the "chatterbox-voices" Docker volume

For Local Python Users

Create a voice library directory (default: ./voices):

# Create voices directory
mkdir voices

# Or set custom location
export VOICE_LIBRARY_DIR="/path/to/your/voices"
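
A local script that needs to find the same directory can follow the rule described above. This is a minimal sketch, assuming the directory comes from `VOICE_LIBRARY_DIR` and falls back to `./voices`; the helper name `resolve_voice_dir` is hypothetical and not part of the API, and the server's own resolution logic may differ.

```python
import os
from pathlib import Path


def resolve_voice_dir(env=None, default="./voices"):
    """Return the voice library directory as a Path.

    Assumption: mirrors the documented behaviour (VOICE_LIBRARY_DIR,
    defaulting to ./voices). The real server may resolve it differently.
    """
    env = os.environ if env is None else env
    raw = env.get("VOICE_LIBRARY_DIR", "").strip()
    # An unset or blank variable falls back to the default directory.
    return Path(raw) if raw else Path(default)
```

Calling `resolve_voice_dir().mkdir(parents=True, exist_ok=True)` then creates the directory if it is missing, much like `mkdir -p`.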

📚 API Endpoints

List Voices

GET /v1/voices

Get a list of all voices in the library.

curl -X GET "http://localhost:4123/v1/voices"

Response:

{
  "voices": [
    {
      "name": "sarah_professional",
      "filename": "sarah_professional.mp3",
      "original_filename": "sarah_recording.mp3",
      "file_extension": ".mp3",
      "file_size": 1024768,
      "upload_date": "2024-01-15T10:30:00Z",
      "path": "/voices/sarah_professional.mp3"
    }
  ],
  "count": 1
}
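
A client can read this response with the standard library alone. The sketch below assumes only the field names shown in the sample response; the function name `voice_sizes` is illustrative.

```python
import json


def voice_sizes(payload):
    """Map each voice name to its file size in bytes.

    `payload` is the body of a GET /v1/voices response, either as a
    JSON string or as an already-decoded dict.
    """
    data = json.loads(payload) if isinstance(payload, str) else payload
    return {voice["name"]: voice["file_size"] for voice in data.get("voices", [])}


# Trimmed copy of the sample response shown above.
sample = """
{"voices": [{"name": "sarah_professional",
             "filename": "sarah_professional.mp3",
             "file_size": 1024768,
             "upload_date": "2024-01-15T10:30:00Z"}],
 "count": 1}
"""
print(voice_sizes(sample))
```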

Upload Voice

POST /v1/voices

Upload a new voice to the library.

curl -X POST "http://localhost:4123/v1/voices" \
  -F "voice_name=sarah_professional" \
  -F "voice_file=@/path/to/voice.mp3"

Parameters:

  • voice_name (string): Name for the voice (used in API calls)
  • voice_file (file): Audio file (MP3, WAV, FLAC, M4A, OGG, max 10MB)
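
Checking a file against these limits before uploading avoids a round trip that is bound to fail. This is a hedged sketch of such a client-side check: the names `check_upload`, `SUPPORTED_EXTENSIONS`, and `MAX_UPLOAD_BYTES` are hypothetical, and it assumes "10MB" means 10 MiB. The server remains the authority on what it accepts.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: "10MB" means 10 MiB


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()  # compare extensions case-insensitively
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```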

Delete Voice

DELETE /v1/voices/{voice_name}

Delete a voice from the library.

curl -X DELETE "http://localhost:4123/v1/voices/sarah_professional"

Rename Voice

PUT /v1/voices/{voice_name}

Rename an existing voice.

curl -X PUT "http://localhost:4123/v1/voices/sarah_professional" \
  -F "new_name=sarah_business"

Get Voice Info

GET /v1/voices/{voice_name}

Get detailed information about a specific voice.

curl -X GET "http://localhost:4123/v1/voices/sarah_professional"

Download Voice

GET /v1/voices/{voice_name}/download

Download the original voice file.

curl -X GET "http://localhost:4123/v1/voices/sarah_professional/download" \
  --output voice.mp3
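
The per-voice endpoints above share one URL pattern, so a client can build them with a single helper. The sketch below is illustrative: `voice_url` is a hypothetical name, and the base URL is taken from the curl examples. Percent-encoding the name is a client-side precaution, not documented server behaviour.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:4123"  # host and port taken from the curl examples


def voice_url(voice_name=None, download=False, base_url=BASE_URL):
    """Build the URL for a voice library endpoint.

    With no name, this is the collection URL (list and upload). With a
    name, it is the per-voice URL (info, rename, delete), optionally
    extended with /download.
    """
    url = f"{base_url}/v1/voices"
    if voice_name is not None:
        # Percent-encode the name so it always stays a single path segment.
        url += "/" + quote(voice_name, safe="")
        if download:
            url += "/download"
    return url
```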

🎤 Using Voices in Speech Generation

JSON API (Recommended)

Use the voice name in the voice parameter:

curl -X POST "http://localhost:4123/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello! This is using my custom voice.",
    "voice": "sarah_professional",
    "exaggeration": 0.7,
    "temperature": 0.8
  }' \
  --output speech.wav

Form Data API

curl -X POST "http://localhost:4123/v1/audio/speech/upload" \
  -F "input=Hello! This is using my custom voice." \
  -F "voice=sarah_professional" \
  -F "exaggeration=0.7" \
  --output speech.wav

Streaming API

curl -X POST "http://localhost:4123/v1/audio/speech/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "This will stream with my custom voice.",
    "voice": "sarah_professional"
  }' \
  --output stream.wav
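
The same JSON request can be issued from Python without third-party packages. This is a minimal sketch, assuming the `/v1/audio/speech` endpoint and the fields shown in the curl examples; `build_speech_request` is a hypothetical helper, and it only constructs the request so that it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_speech_request(text, voice, base_url="http://localhost:4123", **options):
    """Build (but do not send) a POST request for /v1/audio/speech.

    Extra keyword arguments such as exaggeration or temperature are
    copied into the JSON body unchanged.
    """
    body = {"input": text, "voice": voice, **options}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_speech_request(
    "Hello! This is using my custom voice.",
    "sarah_professional",
    exaggeration=0.7,
    temperature=0.8,
)

# To send it against a running server and save the audio:
# with urllib.request.urlopen(request) as response, open("speech.wav", "wb") as out:
#     out.write(response.read())
```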

🔧 Configuration

Environment Variables

# Voice library directory (default: ./voices for local, /voices for Docker)
VOICE_LIBRARY_DIR=/path/to/voices

# For Docker, this is typically set to /voices and mounted as a volume

Docker Configuration

The voice library is automatically configured in Docker with a persistent volume:

volumes:
  - chatterbox-voices:/voices

📝 Voice Naming Guidelines

Valid Characters

  • Letters (a-z, A-Z)
  • Numbers (0-9)
  • Underscores (_)
  • Hyphens (-)
  • Spaces (converted to underscores)

Invalid Characters

  • Forward/backward slashes (/, \)
  • Colons (:)
  • Asterisks (*)
  • Question marks (?)
  • Quotes (", ')
  • Angle brackets (<, >)
  • Pipes (|)

Examples

✅ Good names:
- "sarah_professional"
- "john-voice-2024"
- "female_american"
- "narration_style"

❌ Invalid names:
- "sarah/professional"  # Contains slash
- "voice:sample"        # Contains colon
- "my voice?"           # Contains question mark
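
These rules can be checked on the client before an upload or rename. The sketch below is one conservative reading of the guidelines: it accepts only the characters listed as valid and converts spaces to underscores. The name `normalize_voice_name` is hypothetical, and the server's own sanitizer may be more permissive or behave differently for characters that appear in neither list.

```python
import re

# Only the characters listed as valid: letters, digits, underscore, hyphen, space.
VALID_NAME = re.compile(r"[A-Za-z0-9_\- ]+")


def normalize_voice_name(name):
    """Return the name with spaces converted to underscores.

    Raises ValueError if the name is empty or contains a character
    outside the documented valid set.
    """
    stripped = name.strip()
    if not VALID_NAME.fullmatch(stripped):
        raise ValueError(f"invalid voice name: {name!r}")
    return stripped.replace(" ", "_")
```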

🎯 Best Practices

Voice Quality

  • Use high-quality audio samples (16-48 kHz sample rate)
  • Aim for 10-30 seconds of clean speech
  • Avoid background noise and music
  • Choose samples with consistent volume

File Management

  • Use descriptive voice names
  • Keep file sizes reasonable (< 10MB)
  • Organize voices by speaker or style
  • Clean up unused voices periodically

API Usage

  • Use the JSON API for better performance
  • Cache voice lists on the client side
  • Handle voice-not-found errors gracefully
  • Test voices before production use
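
Client-side caching of the voice list could look like the sketch below. It is illustrative only: `VoiceListCache` is a hypothetical class, the 60-second lifetime is an arbitrary assumption, and `fetch` stands for whatever function your client uses to call GET /v1/voices.

```python
import time


class VoiceListCache:
    """Cache the result of a voice-list fetch for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch        # callable that returns the current voice list
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._value = None
        self._expires_at = None    # None means nothing is cached yet

    def get(self):
        """Return the cached list, refetching once it has expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Force the next get() to refetch, e.g. after an upload or delete."""
        self._expires_at = None
```

Calling `invalidate()` after every upload, rename, or delete keeps the cached list from going stale between refreshes.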

🔍 Troubleshooting

Voice Not Found

{
  "error": {
    "message": "Voice 'my_voice' not found in voice library. Use /voices endpoint to list available voices.",
    "type": "voice_not_found_error"
  }
}

Solution: Check available voices with GET /v1/voices or upload the voice first.

Upload Failed

{
  "error": {
    "message": "Unsupported audio format: .txt. Supported formats: .mp3, .wav, .flac, .m4a, .ogg",
    "type": "invalid_request_error"
  }
}

Solution: Use a supported audio format and ensure the file is valid.

Voice Already Exists

{
  "error": {
    "message": "Voice 'sarah_professional' already exists",
    "type": "voice_exists_error"
  }
}

Solution: Use a different name or delete the existing voice first.
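
The three error responses above share one envelope, so a client can handle them in one place. This is a hedged sketch that assumes only the `error.type` and `error.message` fields shown in the examples; `explain_error` and `SUGGESTIONS` are hypothetical names, and the server may return other error types not covered here.

```python
import json

# Suggested actions, taken from the solutions listed above.
SUGGESTIONS = {
    "voice_not_found_error": "List voices with GET /v1/voices or upload the voice first.",
    "invalid_request_error": "Use a supported audio format and check that the file is valid.",
    "voice_exists_error": "Use a different name or delete the existing voice first.",
}


def explain_error(body):
    """Return (type, message, suggestion) for a JSON error response body."""
    error = json.loads(body).get("error", {})
    error_type = error.get("type", "unknown")
    suggestion = SUGGESTIONS.get(error_type, "No specific suggestion; inspect the message.")
    return error_type, error.get("message", ""), suggestion
```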

🎛️ Frontend Integration

The web frontend includes a complete voice library management interface:

  • Voice Library Panel: Browse and manage voices
  • Upload Modal: Easy voice upload with drag-and-drop
  • Voice Selection: Choose voices in the TTS interface
  • Preview Playback: Listen to voice samples before use
  • Rename/Delete: Manage voice metadata

📊 Migration from Client-Side Storage

If you were previously using the client-side voice library (localStorage), you'll need to re-upload your voices to the new server-side library for persistence and cross-device access.

🔗 API Aliases

All voice endpoints support multiple URL formats:

  • /v1/voices (recommended)
  • /voices
  • /voice-library
  • /voice_library

🏷️ OpenAI Compatibility

The voice parameter also accepts OpenAI voice names for compatibility:

  • alloy, echo, fable, onyx, nova, shimmer

These names use the default configured voice sample, while any custom name uses the uploaded voice of that name from the library.
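
A client that wants to predict which voice a request will use can apply the same distinction. The sketch below is an assumption-laden illustration: `voice_source` is a hypothetical helper, and it assumes the OpenAI names are matched exactly and case-sensitively, which the server may not do.

```python
# The OpenAI-compatible names listed above.
OPENAI_VOICE_NAMES = frozenset({"alloy", "echo", "fable", "onyx", "nova", "shimmer"})


def voice_source(voice):
    """Return "default" for an OpenAI-style name, otherwise "library"."""
    return "default" if voice in OPENAI_VOICE_NAMES else "library"
```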

🛡️ Security Considerations

  • Voice files are stored on the server filesystem
  • File uploads are validated for type and size
  • Voice names are sanitized to prevent path traversal
  • No authentication is required (same as other endpoints), so anyone who can reach the API can list, upload, rename, or delete voices

📈 Performance Notes

  • Voice library operations typically complete in under 100 ms
  • Voice files are loaded on-demand for TTS generation
  • Large voice files may increase TTS processing time
  • Consider voice file size vs. quality trade-offs

🆙 Future Enhancements

Planned features for future releases:

  • Voice categorization and tagging
  • Bulk voice operations
  • Voice sharing between users
  • Advanced voice metadata
  • Voice quality analysis
  • Automatic voice optimization