Releases: BBC-Esq/Faster-Whisper-Transcriber
Releases · BBC-Esq/Faster-Whisper-Transcriber
v9.0.0
Release Notes 🚀
✨ New: Server Mode (HTTP API)
- 🌐 Toggle an HTTP API on from Settings → Server Mode (default port
8765, configurable 1024–65535) — binds to0.0.0.0so it's reachable from other machines on your network - 📡 FastAPI/uvicorn backend exposes
/transcribe(multipart upload) and/transcribe/raw(base64 JSON), plus/health,/status, and/models— interactive Swagger docs at/docs - 🎛️ Accepts audio files, NumPy arrays (
.npy), PyTorch tensors (.pt), raw PCM bytes, and base64-encoded payloads — auto-resampled to 16 kHz mono with format auto-detection - ⚙️ Per-request overrides for model, quantization, device, language, task, beam size, VAD, batch size, and timestamps; falls back to GUI defaults when omitted
- 🔄 Single-worker queue serializes GPU access; concurrent clients are queued and served in order without polling
- 🧠 New
get_or_load_model_sync()path on the model manager lets the server hot-swap models per request while reusing the cached instance when settings match
📖 Documentation & UX
- 📘 Bundled
SERVER_API_GUIDE.html— a dark-themed, fully self-contained guide covering every endpoint, all five input formats, response schemas, error codes, and a complete end-to-end example - 🤖 The guide includes a "Copy as Text (for LLMs)" button that dumps a clean plain-text version to your clipboard — drop it straight into Claude/ChatGPT/etc. for help integrating
- 🔘 New Guide button in Settings opens the HTML in your default browser
- 🔒 While Server Mode is on, the GUI recorder and most settings are locked to prevent state conflicts; an attempt to enable it during an active job is blocked with a clear warning
- 📊 The main window status bar now shows server state inline (e.g.,
large-v3 (float16 / cuda) | server: on (8765))
🛠️ Other Changes
- 🕒 Collapsed
without_timestamps/word_timestampssettings into a single Include Timestamps checkbox; SRT/VTT outputs auto-force timestamps on regardless - 🎯 Tuned VAD parameters now applied consistently across single, batch, and server paths (
threshold=0.0008,speech_pad_ms=500, etc.) —vad_filteris forced on wheneverbatch_size > 1 - 🗑️ Dropped TSV output format; added
.oggto the supported extensions list - 📦 Upgraded to
ctranslate2==4.7.1andfaster-whisper2==2.1.0; dropped the bundlednvidia-cudnn-cu12dependency (now resolved transitively) - 🐛 Fixes: batch panel now shows "Completed with errors" instead of overwriting with success, file-panel parent reference bug resolved, config files read/written as UTF-8, error dialog properly resets the file panel
- 🧹 Cleaner shutdown via
sys.exit()instead ofos._exit(), plus a dedicatedserver_manager.cleanup()step in the close sequence
v8.0.1 - small bug fix
What's Changed
- fix(transcription): load PCM WAV via stdlib to skip PyAV decode by @webenefits in #34
New Contributors
- @webenefits made their first contribution in #34
Full Changelog: v8.0.0...v8.0.1
v8.0.0 - new and improved
File Transcription Panel 📂
- A new dockable File Transcription panel lets you transcribe individual files or entire directories of audio/video files. Toggle between single and multi-file mode, scan directories recursively, and filter by file type. Output to clipboard, the source directory, or a custom folder in txt, srt, vtt, tsv, or json format.
System Monitoring 📊
- Real-time CPU and RAM metrics are now displayed at the bottom of the main window.
Other Improvements ✨
- Consistent GUI behavior and visualizations.
- Fixed inconsistent default values for whisper parameters across pipelines
- Removed incorrect ffmpeg requirement from README
nvidia-ml-pyis now automatically installed for GPU users to enable system monitoring
v7.6.0 - more control
✨ New Features
- Expose adjustable parameters within
faster-whisperlibrary for more control.
v7.5.0 - choose audio device
✨ New Features
- Users can now choose a microphone/device from the Settings dialog.
- Falls back to system default if the previously saved device is unavailable.
🎨 UI Improvements
- Replaced text buttons with compact icon buttons
⚙️ Improvements
- Better shutdown behavior
- Updated model download signals to use flexible types for improved progress reporting.
v7.4.0 - robust model downloads
🧠 Smarter Model Handling
- Validates model files before loading to prevent corrupted cache issues
- Automatically clears broken Hugging Face cache and re-downloads cleanly
- Now correctly persists across uninstalls/re-installs
🎙 Recording & Transcription Stability
- Safer recording toggle logic (prevents invalid states)
- Improved cancellation and cleanup handling
🖥 UI & Thread Safety Enhancements
- Thread-safe model access with proper unloading to reduce memory leaks
- Improved download status display and cancel behavior
- Cleaner shutdown and signal disconnection handling
v7.3.0 - simplify
✨ Improvements
- Added deferred config flushing with a 500ms debounce to reduce excessive disk writes and improve performance.
- Introduced in-memory dirty tracking with safe synchronous flush on shutdown.
🧹 Simplifications
- Removed NLTK dependency from text curation; replaced with lightweight whitespace normalization.
- Minor internal refactors for safer cache handling and deep copies.
🔒 Reliability
- Ensure config is flushed synchronously on app close to prevent data loss.
v7.2.0 - broader compatibility
🛠️ Improved Stability in GUI Mode
- Prevented crashes when running via
pythonw.exeby ensuringsys.stdoutandsys.stderrare always valid. - Disabled Hugging Face progress bars in GUI mode to avoid
tqdmwrite errors. - Added extra stream validation before model downloads and snapshot fallbacks to eliminate
NoneTypewrite crashes.
🎯 Smarter Quantization Handling
- Quantization options are now filtered against actual hardware capabilities, preventing unsupported types (e.g.,
bfloat16on older GPUs) from appearing. - Always re-detects supported quantizations on startup to handle configs copied from different machines.
- Ensures
float32is always available on CPU as a safe fallback. - Simplified and localized quantization override definitions in model metadata.
⚡ Cleaner CUDA Detection
- Refactored CUDA checks for clearer separation of availability detection and device name retrieval.
- More reliable logging of CUDA availability and GPU detection.
Overall, this update improves robustness across different environments and ensures users only see valid, hardware-supported precision options.
v7.1.0 - Smarter Model Management
🚀 Smarter Model Downloads & Critical Bug Fixes
This release overhauls how Whisper models are downloaded, cached, and loaded — with real-time feedback, full offline support, and fixes for several UI and platform-specific bugs.
📊 Download Progress & Control
- Real-time progress bar — See downloaded/total bytes in the status bar as models are fetched from Hugging Face. No more staring at a frozen-looking app.
- ⏹️ Cancel anytime — A cancel button appears during downloads so you can abort without killing the app.
- 🔄 Smart resume — Cancelled or failed downloads pick up where they left off, fetching only the missing files.
🌐 Offline Support
- ⚡ Zero network calls for cached models — Previously downloaded models load instantly with no internet dependency whatsoever.
- 💬 Clear error messages — Attempting to download a new model without internet gives a descriptive, actionable error instead of a cryptic exception.
🐛 Bug Fixes
- 🔒 Startup race condition fixed — All widgets were previously clickable before the initial model finished loading, which could trigger a confusing "No model is loaded" error. Widgets now stay properly disabled until a model is ready.
- 🏷️ Dedicated model status bar — The bottom status bar now exclusively shows Whisper model state (loaded model, download progress, or "No model loaded") and no longer displays unrelated messages.
🪟 Windows Compatibility
- 🔗 Windows symlink/reparse point fix — Resolved
WinError 448("untrusted mount point") crash that occurred when the HuggingFace cache used symlinks on Windows systems without Developer Mode enabled. The app now resolves cache paths through metadata instead of filesystem traversal. - 🛡️ Robust download fallback — If per-file downloading fails for any reason, the app automatically falls back to HuggingFace's built-in snapshot downloader to ensure models are fetched successfully.
v7.0.0 - new look, more robust
What's New 🎉
- Sleek new UI — slimmed down the main window and tucked model settings into a proper Settings dialog. Less clutter, more focus.
- Live waveform visualizer 🎵 — the record button now pulses and animates in real-time as you speak, with a cool particle effect during transcription.
- Append mode checkbox moved into the clipboard window, right where it belongs.
- Cleaner transcriptions — squashed whitespace quirks and a sneaky silent failure when swapping models mid-transcription.
- Transcription results now flow through the clipboard window instead of hijacking your system clipboard uninvited.
- More stable under the hood — smoother shutdowns, thread safety fixes, and stale model loads get tossed in the bin where they belong.
- Faster installs — the installer now does one clean pass instead of installing libraries one-by-one like it's 2005.