fix(transcription): load PCM WAV via stdlib to skip PyAV decode #34
Decode 16-bit PCM WAV (mono/stereo) with the wave module, resample to the model sample rate, and pass float32 audio to faster-whisper. This bypasses PyAV/FFmpeg for typical recordings and avoids UnicodeDecodeError on some locales when error paths contain non-ASCII text. Non-WAV and unsupported WAV formats still use the file path and the existing PyAV pipeline.
Can you confirm that you're real? I went to the GitHub account and it's barren.

Hi, yes, I'm real.

Congratulations on your first contribution to this project! Great catch on the PyAV locale issue: clean implementation with a solid fallback. Merged!

Thank you very much, I'm glad to hear that!
Problem - on Linux (CachyOS)
Transcription after recording can fail with `UnicodeDecodeError` (e.g. `'ascii' codec can't decode byte …`) in `av.error.err_check` when PyAV decodes FFmpeg error paths as ASCII. This shows up on some locales (e.g. German) when FFmpeg/PyAV surfaces non-ASCII text.
Cause
faster-whisper decodes and resamples audio via PyAV. FFmpeg error handling can expose non-ASCII strings, which PyAV may then decode as ASCII, raising the error above. In-app recordings are standard PCM WAV, so they do not need FFmpeg to be decoded at all.
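To make the failure mode concrete, here is a small plain-Python illustration. It is not PyAV's actual code, and the German message is an invented example string. It only shows that UTF-8 bytes outside the ASCII range make a strict ASCII decode raise, while the very same bytes decode fine as UTF-8.

```python
# Illustration only (not PyAV code): decoding locale-dependent text as ASCII.
# The message is a hypothetical German error string containing "ü".
message = "Ungültiges Argument"
raw = message.encode("utf-8")  # "ü" becomes the two bytes 0xc3 0xbc


def describe_ascii_failure(data: bytes) -> str:
    """Return the error text a strict ASCII decode raises, or '' if it succeeds."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


print(describe_ascii_failure(raw))
# 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
print(raw.decode("utf-8"))  # the same bytes decode cleanly as UTF-8
```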
Solution
For supported `.wav` files (16-bit PCM, mono or stereo), load the audio with the stdlib `wave` module and NumPy, downmix stereo to mono when needed, linearly resample to `feature_extractor.sampling_rate` (16 kHz), and pass a `float32` `np.ndarray` into `transcribe` / `BatchedInferencePipeline.transcribe`, so faster-whisper skips `decode_audio`. Unsupported WAV variants and all non-WAV inputs keep the previous behavior (file path string → PyAV pipeline, unchanged).
Testing
`.wav` path still yields `None` → string path unchanged.

Prepared with assistance from Cursor (AI editor). I have tested the changes locally; please still give the patch a careful review before merging.
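For reviewers, the loading path and its fallback can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the merged patch: `load_pcm_wav`, `transcribe_input` and `TARGET_SR` are hypothetical names, and `TARGET_SR = 16000` stands in for `feature_extractor.sampling_rate`. Unsupported inputs return `None`, and the caller then passes the original path string on to the PyAV route.

```python
import wave
from pathlib import Path
from typing import Optional, Union

import numpy as np

TARGET_SR = 16000  # hypothetical constant standing in for feature_extractor.sampling_rate


def load_pcm_wav(path: str, target_sr: int = TARGET_SR) -> Optional[np.ndarray]:
    """Return mono float32 audio at target_sr, or None if the file is unsupported."""
    if Path(path).suffix.lower() != ".wav":
        return None
    try:
        with wave.open(path, "rb") as wf:
            n_channels = wf.getnchannels()
            # Only uncompressed 16-bit PCM, mono or stereo, takes the fast path.
            if (wf.getsampwidth() != 2 or n_channels not in (1, 2)
                    or wf.getcomptype() != "NONE"):
                return None
            src_sr = wf.getframerate()
            frames = wf.readframes(wf.getnframes())
    except (wave.Error, EOFError):
        return None  # malformed or non-RIFF data: fall back to the PyAV route

    # 16-bit PCM samples are little-endian signed ints; scale to [-1.0, 1.0).
    audio = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels == 2:
        # Frames are interleaved L, R, L, R, ...; average each pair to mono.
        audio = audio.reshape(-1, 2).mean(axis=1)

    if src_sr != target_sr and audio.size > 0:
        # Linear interpolation onto the target time grid (no anti-aliasing filter).
        n_out = int(round(audio.size * target_sr / src_sr))
        src_t = np.arange(audio.size) / src_sr
        dst_t = np.arange(n_out) / target_sr
        audio = np.interp(dst_t, src_t, audio)

    return audio.astype(np.float32)


def transcribe_input(path: str) -> Union[np.ndarray, str]:
    """Array for supported WAVs, otherwise the original path string."""
    audio = load_pcm_wav(path)
    # Compare with None explicitly: `audio or path` would raise on a NumPy array.
    return path if audio is None else audio
```

The result can then be passed to `WhisperModel.transcribe`, which accepts either a file path or a NumPy array. Linear interpolation applies no low-pass filter when downsampling (e.g. from 48 kHz), so a polyphase resampler would be more accurate, at the cost of an extra dependency.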