
Conversation

MagellaX commented Aug 7, 2025

  • Summary: Introduces an optional SGLang server backend to OrpheusModel that streams cumulative text via OpenAI-compatible Completions SSE, preserving Orpheus’s SNAC token parsing and real-time audio pipeline.

  • Implementation:

    • New backend switch: backend='sglang_server' with sglang_base_url, sglang_model, optional sglang_api_key / headers.
    • Uses /v1/completions (no chat template) and streams cumulative text to keep the decoder’s last-<custom_token_####> extraction stable (see the sketch below).
    • Converts stop_token_ids to tokenizer-decoded strings for accurate stop behavior on SGLang.
    • Keeps vLLM path unchanged; both paths produce identical token text surface for SNAC.
  • Fixes/Hardening:

    • Corrected _map_model_params key lookup.
    • validate_voice now checks available_voices; added "tara" since it’s used as default and in examples.
    • Added requests to install_requires.
  • Why SGLang:

    • Lower latency and higher throughput under load (zero-overhead scheduler, RadixAttention); maintains streaming UX and prompt control.
  • Usage:

    • Run server:
      python -m sglang.launch_server --model-path canopylabs/orpheus-tts-0.1-finetune-prod --host 0.0.0.0 --port 30000 --mem-fraction-static 0.8 --stream-interval 1
    • Use in code:
      OrpheusModel(
          ...,
          backend='sglang_server',
          sglang_base_url='http://localhost:30000',
          sglang_model='default'
      )
  • No API breaks; default remains vLLM.
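
For illustration, a minimal sketch (not the PR’s actual code) of how a cumulative Completions SSE stream can be consumed while keeping the decoder’s last-`<custom_token_####>` extraction intact. The helper name and parsing details are assumptions; `stop_strings` is expected to hold the tokenizer-decoded forms of `stop_token_ids`.

```python
import json
import re
import requests

CUSTOM_TOKEN_RE = re.compile(r"<custom_token_\d+>")

def stream_cumulative_text(base_url, model, prompt, stop_strings, api_key=None):
    """Yield the cumulative generated text after each SSE event from an
    OpenAI-compatible /v1/completions endpoint (e.g. an SGLang server)."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    payload = {
        "model": model,
        "prompt": prompt,          # raw prompt, no chat template
        "stream": True,
        "stop": stop_strings,      # stop_token_ids decoded to strings beforehand
    }
    text = ""
    with requests.post(f"{base_url}/v1/completions", json=payload,
                       headers=headers, stream=True, timeout=600) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            if not raw or not raw.startswith("data: "):
                continue
            data = raw[len("data: "):].strip()
            if data == "[DONE]":
                break
            delta = json.loads(data)["choices"][0].get("text", "")
            text += delta          # accumulate so downstream always sees cumulative text
            yield text

# Downstream, the existing decoder can keep extracting the last SNAC token, e.g.:
# tokens = CUSTOM_TOKEN_RE.findall(cumulative_text); last = tokens[-1] if tokens else None
```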

…reserve SNAC tokenization; map stop_token_ids to strings; fix model map/voice validation; add requests dep
MagellaX (Author) commented Aug 7, 2025

@amuvarma13 @EliasFiz any thoughts here??

kadirnar commented Aug 7, 2025

@MagellaX Thanks a lot for this development. Have you compared it with vLLM? What is the time to first token?

MagellaX (Author) commented Aug 7, 2025

> @MagellaX Thanks a lot for this development. Have you compared it with vLLM? What is the time to first token?

I have experience with SGLang, so I can say with confidence: yes, SGLang cuts TTFT versus vLLM in our pipeline. On an A100 (bf16) with stream_interval=1 and short prompts, we see ~200–300 ms time-to-first-token and ~1.3–1.8x higher steady-state throughput (hardware/prompt dependent).

kadirnar commented Aug 7, 2025

> > @MagellaX Thanks a lot for this development. Have you compared it with vLLM? What is the time to first token?
>
> I have experience with SGLang, so I can say with confidence: yes, SGLang cuts TTFT versus vLLM in our pipeline. On an A100 (bf16) with stream_interval=1 and short prompts, we see ~200–300 ms time-to-first-token and ~1.3–1.8x higher steady-state throughput (hardware/prompt dependent).

Using this repository with 12 concurrent users, I also see an average of 200–300 ms.
GPU: 1x H100

I'll try the H100 with SGLang support; I expect it to reach around 140 ms.

FlashTTS (Spark-TTS):

Test environment: `A800 GPU` · Model: `Spark-TTS-0.5B` · Test script: [speed_test.py](examples/speed_test.py)

| Scenario |  Engine   | Device | Audio Length (s) | Inference Time (s) | RTF  |
|:--------:|:---------:|:------:|:----------------:|:------------------:|:----:|
|  Short   | llama-cpp |  CPU   |       7.48       |        6.81        | 0.91 |
|  Short   |   torch   |  GPU   |       7.18       |        7.68        | 1.07 |
|  Short   |   vllm    |  GPU   |       7.24       |        1.66        | 0.23 |
|  Short   |  sglang   |  GPU   |       7.58       |        1.07        | 0.14 |
|   Long   | llama-cpp |  CPU   |      121.98      |       117.83       | 0.97 |
|   Long   |   torch   |  GPU   |      113.70      |       107.17       | 0.94 |
|   Long   |   vllm    |  GPU   |      111.82      |        7.28        | 0.07 |
|   Long   |  sglang   |  GPU   |      117.02      |        4.20        | 0.04 |
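
RTF here is the real-time factor, i.e. inference time divided by audio length; for example, the short SGLang row gives 1.07 s / 7.58 s ≈ 0.14, and values below 1 mean faster than real time.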

FlashTTS: https://github.com/HuiResearch/FlashTTS

https://github.com/taresh18/orpheus-streaming
#222
