Feature Request: Add SenseVoice/Paraformer as faster ASR option

## Feature Request

whisper-diarization works very well for ASR + speaker diarization. I'd like to suggest [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) / [Paraformer](https://github.com/modelscope/FunASR) as alternative ASR backends: both are non-autoregressive and much faster than Whisper, and FunASR ships its own speaker-diarization model.
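One way the backends could be made pluggable is sketched below. It is a hypothetical design, not existing whisper-diarization code: `Segment`, `AsrBackend`, `register`, `get_backend` and the backend name `funasr-paraformer` are illustrative names. The FunASR result keys (`sentence_info`, `start`/`end` in milliseconds, integer `spk`) are assumptions about FunASR's output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Protocol


@dataclass
class Segment:
    """One transcribed span; times are in seconds."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


class AsrBackend(Protocol):
    def transcribe(self, audio_path: str) -> List[Segment]: ...


# Hypothetical registry mapping a CLI-style backend name to a factory.
BACKENDS: Dict[str, Callable[[], AsrBackend]] = {}


def register(name: str):
    """Class decorator that records a backend factory under `name`."""
    def decorator(factory):
        BACKENDS[name] = factory
        return factory
    return decorator


def get_backend(name: str) -> AsrBackend:
    """Instantiate the backend registered under `name`."""
    if name not in BACKENDS:
        raise ValueError(f"unknown ASR backend {name!r}; choose from {sorted(BACKENDS)}")
    return BACKENDS[name]()


@register("funasr-paraformer")
class FunAsrParaformerBackend:
    def __init__(self) -> None:
        # Imported lazily so installs without FunASR can still run the Whisper path.
        from funasr import AutoModel

        self.model = AutoModel(
            model="paraformer-zh",
            vad_model="fsmn-vad",
            punc_model="ct-punc",
            spk_model="cam++",
        )

    def transcribe(self, audio_path: str) -> List[Segment]:
        result = self.model.generate(input=audio_path)
        # Assumed output shape: result[0]["sentence_info"] entries carry
        # "start"/"end" in milliseconds, "text", and an integer "spk".
        return [
            Segment(s["start"] / 1000, s["end"] / 1000, s["text"], f"SPEAKER_{s['spk']}")
            for s in result[0].get("sentence_info", [])
        ]
```

Usage would be `get_backend("funasr-paraformer").transcribe("meeting.wav")`. The lazy import keeps FunASR an optional dependency, so the existing Whisper pipeline is unaffected when it is not installed.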

### Why SenseVoice/Paraformer?

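- **Much faster than Whisper**: SenseVoice-Small and Paraformer are non-autoregressive, which sharply reduces processing time for long recordings (roughly 5-10x; the SenseVoice README reports about 15x versus Whisper-Large)
- **Built-in speaker diarization**: FunASR can add the CAM++ speaker-embedding model (7.2M parameters) and cluster speakers, so no separate diarization stack is needed
- **Built-in VAD**: FSMN-VAD segments the audio and provides timestamps
- **Built-in punctuation**: CT-Punc restores punctuation automatically
- **Multilingual**: SenseVoice is trained on 50+ languages; the open SenseVoiceSmall checkpoint covers Mandarin, Cantonese, English, Japanese and Korean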

### Complete pipeline comparison

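- Current: Whisper + NeMo speaker diarization (MarbleNet VAD, TitaNet embeddings) → post-processing alignment
- Alternative: FunASR (Paraformer/SenseVoice + FSMN-VAD + CAM++ + CT-Punc) → a single `AutoModel` pipeline; speaker labels require an ASR model that produces timestamps, such as Paraformer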

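```python
from funasr import AutoModel

# FunASR's speaker clustering relies on sentence timestamps from the ASR model,
# which Paraformer provides, so this all-in-one example uses paraformer-zh.
model = AutoModel(
    model="paraformer-zh",  # non-autoregressive Mandarin ASR with timestamps
    vad_model="fsmn-vad",   # FSMN-VAD segmentation
    punc_model="ct-punc",   # CT-Transformer punctuation restoration
    spk_model="cam++",      # CAM++ speaker embeddings, clustered into speakers
)
result = model.generate(input="meeting.wav", batch_size_s=300)
# result[0]["text"]: punctuated transcript
# result[0]["sentence_info"]: per-sentence "text", "start"/"end" (ms) and "spk" label
```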
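whisper-diarization produces speaker-labeled transcripts, so for a side-by-side comparison the FunASR output can be rendered in a similar SRT form. This is a minimal sketch assuming each entry of `result[0]["sentence_info"]` has `start`/`end` in milliseconds, an integer `spk` and `text`; `ms_to_srt_time`, `merge_speaker_turns` and `to_srt` are hypothetical helpers, not FunASR or whisper-diarization functions.

```python
from typing import Dict, List


def ms_to_srt_time(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(int(ms), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def merge_speaker_turns(sentences: List[Dict], sep: str = "") -> List[Dict]:
    """Merge consecutive sentences from the same speaker into one turn.

    `sep` joins merged texts: "" suits Mandarin, " " suits English.
    The input dicts are not modified.
    """
    turns: List[Dict] = []
    for s in sentences:
        if turns and turns[-1]["spk"] == s["spk"]:
            turns[-1]["end"] = s["end"]
            turns[-1]["text"] += sep + s["text"]
        else:
            turns.append({k: s[k] for k in ("start", "end", "spk", "text")})
    return turns


def to_srt(sentences: List[Dict], sep: str = "") -> str:
    """Render speaker turns as SRT cues prefixed with a speaker label."""
    cues = []
    for index, turn in enumerate(merge_speaker_turns(sentences, sep), start=1):
        cues.append(
            f"{index}\n"
            f"{ms_to_srt_time(turn['start'])} --> {ms_to_srt_time(turn['end'])}\n"
            f"SPEAKER_{turn['spk']}: {turn['text']}\n"
        )
    return "\n".join(cues)
```

With the `AutoModel` example above, `to_srt(result[0]["sentence_info"])` would produce one cue per speaker turn. Merging consecutive sentences keeps cues readable when the same person speaks for a long stretch.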

### Speed benefit for long recordings

For a 1-hour recording (rough estimates; actual times depend on GPU and settings):
- Whisper large-v3: ~10-15 minutes of processing
- SenseVoice: ~2-3 minutes of processing (about 5x faster)
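These estimates are easiest to check by measuring the real-time factor (RTF = processing time / audio duration) on your own hardware. A minimal sketch follows; `measure_rtf` and `speedup` are hypothetical helpers, and `transcribe` can wrap any backend call. Using the midpoints above, Whisper's 12.5 min for 60 min of audio is an RTF of about 0.21 and SenseVoice's 2.5 min about 0.042, a 5x speed-up.

```python
import time
from typing import Callable, Tuple


def measure_rtf(
    transcribe: Callable[[str], object], audio_path: str, audio_seconds: float
) -> Tuple[float, float]:
    """Time one transcription call and return (wall-clock seconds, RTF).

    RTF = processing time / audio duration; lower is faster.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / audio_seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds
```

Load the models before timing so first-run downloads and initialization are not counted, e.g. `measure_rtf(lambda p: model.generate(input=p), "meeting.wav", 3600.0)` with the FunASR `model` from the example above.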

Install: `pip install funasr`

- FunASR: https://github.com/modelscope/FunASR (16.6K stars)
- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8.3K stars)

Feature Request: Add SenseVoice/Paraformer as faster ASR option #374

Feature Request

Why SenseVoice/Paraformer?

Complete pipeline comparison

Speed benefit for long recordings

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Feature Request: Add SenseVoice/Paraformer as faster ASR option #374

Description

Feature Request

Why SenseVoice/Paraformer?

Complete pipeline comparison

Speed benefit for long recordings

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions