Speech gate #48
Pull Request Overview
This PR implements a "speech gate" to address Azure speech detection sensitivity issues by filtering out low-volume audio, noise, and echo. The gate uses configurable parameters to dampen unwanted audio while preserving normal speech.
Key changes:
- Adds a new speech gate module with RMS-based filtering algorithm
- Integrates the speech gate into Azure transcription service
- Includes a standalone test utility for audio processing validation
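To make the RMS-based filtering idea concrete, here is a minimal sketch of a per-frame soft gate. It is not the code from `core/src/speech_gate.rs`; the `GateParams` struct, its field names, and the linear ramp between the two thresholds are assumptions chosen for illustration only.

```rust
/// Hypothetical gate parameters. Names and semantics are illustrative
/// and are not taken from the PR.
#[derive(Debug, Clone, Copy)]
pub struct GateParams {
    /// Frames with RMS at or above this level pass unchanged (linear scale).
    pub open_rms: f32,
    /// Frames with RMS at or below this level receive the full attenuation.
    pub close_rms: f32,
    /// Gain applied to fully gated frames (0.0 behaves like a hard gate).
    pub floor_gain: f32,
}

/// Root mean square of a frame of samples normalized to [-1.0, 1.0].
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Gain for a frame level: `floor_gain` below `close_rms`, 1.0 above
/// `open_rms`, and a linear ramp in between (the "soft" part of the gate).
pub fn gate_gain(level: f32, p: &GateParams) -> f32 {
    if level >= p.open_rms {
        1.0
    } else if level <= p.close_rms {
        p.floor_gain
    } else {
        let t = (level - p.close_rms) / (p.open_rms - p.close_rms);
        p.floor_gain + t * (1.0 - p.floor_gain)
    }
}

/// Applies the gate in place, one fixed-size frame at a time.
pub fn apply_gate(samples: &mut [f32], frame_len: usize, p: &GateParams) {
    for frame in samples.chunks_mut(frame_len.max(1)) {
        let gain = gate_gain(rms(frame), p);
        for s in frame.iter_mut() {
            *s *= gain;
        }
    }
}
```

With `floor_gain` set to `0.0` and `open_rms` equal to `close_rms`, this degenerates into a hard gate. A production gate would usually also smooth the gain over time (attack and release) to avoid audible clicks at frame boundaries, which this sketch omits.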
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
core/src/speech_gate.rs | Implements the main speech gate algorithm with multiple variants (hard/soft/RMS-based) |
core/src/lib.rs | Exports the speech gate module |
src/lib.rs | Exports the speech gate processor function |
services/azure/src/transcribe.rs | Integrates speech gate into Azure transcription pipeline |
filter-test/ | Adds a standalone CLI tool for testing speech gate on audio files |
Cargo.toml | Adds fundsp dependency and filter-test workspace member |
Comments suppressed due to low confidence (1)
filter-test/Cargo.toml:4
- The Rust edition "2024" is only available on Rust 1.85 or later. If the toolchain is older, change this to "2021".
edition = "2024"
Co-authored-by: Copilot <[email protected]>
Azure speech detection seems to be very sensitive. It picks up very faint speech, which means it also detects echoes from speakers, for example when speech detection is active and someone speaks on the other end of the line.
This PR adds a speech gate that is loosely configured to dampen low-volume audio, noise, and whispering. Most of the algorithm was written by Claude Sonnet 3.7, and the parameters were derived by iterating on two samples (normal speech and echo speech).
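The PR does not say how the two samples were compared. One plausible starting point, sketched below, is to measure the RMS level of each recording in dBFS and place the initial gate threshold between them. The function names and the midpoint heuristic are assumptions for illustration, and 16-bit PCM input is assumed.

```rust
/// RMS level of 16-bit PCM samples in dBFS (0 dBFS = full scale).
/// Returns negative infinity for empty or all-zero input.
pub fn rms_dbfs(samples: &[i16]) -> f32 {
    if samples.is_empty() {
        return f32::NEG_INFINITY;
    }
    let sum_sq: f64 = samples
        .iter()
        .map(|&s| {
            // Normalize to [-1.0, 1.0) before squaring.
            let x = s as f64 / 32768.0;
            x * x
        })
        .sum();
    let rms = (sum_sq / samples.len() as f64).sqrt();
    if rms == 0.0 {
        f32::NEG_INFINITY
    } else {
        (20.0 * rms.log10()) as f32
    }
}

/// Hypothetical heuristic: start with a threshold halfway (in dB)
/// between the normal-speech level and the echo level.
pub fn midpoint_threshold_db(speech: &[i16], echo: &[i16]) -> f32 {
    (rms_dbfs(speech) + rms_dbfs(echo)) / 2.0
}
```

A whole-file RMS is only a rough guide, since pauses pull the average down. Measuring per frame and comparing the distributions of the two recordings would give a more reliable threshold.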