Releases: BBC-Esq/Faster-Whisper-Transcriber

v9.0.0

28 Apr 00:18

Release Notes 🚀

✨ New: Server Mode (HTTP API)

  • 🌐 Toggle an HTTP API on from Settings → Server Mode (default port 8765, configurable 1024–65535) — binds to 0.0.0.0 so it's reachable from other machines on your network
  • 📡 FastAPI/uvicorn backend exposes /transcribe (multipart upload) and /transcribe/raw (base64 JSON), plus /health, /status, and /models — interactive Swagger docs at /docs
  • 🎛️ Accepts audio files, NumPy arrays (.npy), PyTorch tensors (.pt), raw PCM bytes, and base64-encoded payloads — auto-resampled to 16 kHz mono with format auto-detection
  • ⚙️ Per-request overrides for model, quantization, device, language, task, beam size, VAD, batch size, and timestamps; falls back to GUI defaults when omitted
  • 🔄 Single-worker queue serializes GPU access; concurrent clients are queued and served in order without polling
  • 🧠 New get_or_load_model_sync() path on the model manager lets the server hot-swap models per request while reusing the cached instance when settings match
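
Below is a minimal standard-library Python client sketch for the /transcribe/raw endpoint. The JSON field names (`audio`, plus override keys such as `language` and `beam_size`) are assumptions for illustration. The bundled SERVER_API_GUIDE.html documents the actual request and response schema.

```python
import base64
import json
import urllib.request


def build_raw_request(host, audio_bytes, port=8765, **overrides):
    """Build a POST to /transcribe/raw with base64-encoded audio in a JSON body.

    ASSUMPTION: "audio" and the override key names are illustrative; check
    SERVER_API_GUIDE.html for the real request schema.
    """
    payload = {"audio": base64.b64encode(audio_bytes).decode("ascii")}
    payload.update(overrides)  # omitted keys fall back to the GUI defaults server-side
    return urllib.request.Request(
        f"http://{host}:{port}/transcribe/raw",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(request, timeout=600.0):
    """Send the request and decode the JSON response."""
    # Requests wait their turn in the server's single-worker queue, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_healthy(host, port=8765, timeout=5.0):
    """Quick reachability check against /health (the response body is not inspected)."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, connection refused, and timeouts are all OSError subclasses
        return False
```

From another machine on the LAN, a call might look like `transcribe(build_raw_request("192.168.1.20", audio_bytes, language="en"))`, where the address is a placeholder. Because the server serializes GPU work, the client simply blocks until its turn rather than polling.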

📖 Documentation & UX

  • 📘 Bundled SERVER_API_GUIDE.html — a dark-themed, fully self-contained guide covering every endpoint, all five input formats, response schemas, error codes, and a complete end-to-end example
  • 🤖 The guide includes a "Copy as Text (for LLMs)" button that dumps a clean plain-text version to your clipboard — drop it straight into Claude/ChatGPT/etc. for help integrating
  • 🔘 New Guide button in Settings opens the HTML in your default browser
  • 🔒 While Server Mode is on, the GUI recorder and most settings are locked to prevent state conflicts; an attempt to enable it during an active job is blocked with a clear warning
  • 📊 The main window status bar now shows server state inline (e.g., large-v3 (float16 / cuda) | server: on (8765))

🛠️ Other Changes

  • 🕒 Collapsed the without_timestamps / word_timestamps settings into a single Include Timestamps checkbox; SRT/VTT outputs always force timestamps on, regardless of the checkbox
  • 🎯 Tuned VAD parameters now applied consistently across single, batch, and server paths (threshold=0.0008, speech_pad_ms=500, etc.) — vad_filter is forced on whenever batch_size > 1
  • 🗑️ Dropped TSV output format; added .ogg to the supported extensions list
  • 📦 Upgraded to ctranslate2==4.7.1 and faster-whisper2==2.1.0; dropped the bundled nvidia-cudnn-cu12 dependency (now resolved transitively)
  • 🐛 Fixes: batch panel now shows "Completed with errors" instead of overwriting with success, file-panel parent reference bug resolved, config files read/written as UTF-8, error dialog properly resets the file panel
  • 🧹 Cleaner shutdown via sys.exit() instead of os._exit(), plus a dedicated server_manager.cleanup() step in the close sequence
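
The two forcing rules above (VAD on when batch_size > 1, and timestamps on for SRT/VTT) can be expressed as a small option-resolution step. This is a hypothetical sketch: `TranscribeOptions` and `resolve_options` are illustrative names, not the app's code. It carries only the two VAD values the notes name, since the app tunes additional parameters that are not listed here.

```python
from dataclasses import dataclass, field, replace

# Only the two values named in the v9.0.0 notes; the app tunes more ("etc.").
TUNED_VAD_PARAMETERS = {"threshold": 0.0008, "speech_pad_ms": 500}


@dataclass
class TranscribeOptions:
    batch_size: int = 1
    vad_filter: bool = False
    include_timestamps: bool = False
    output_format: str = "txt"
    vad_parameters: dict = field(default_factory=lambda: dict(TUNED_VAD_PARAMETERS))


def resolve_options(opts: TranscribeOptions) -> TranscribeOptions:
    """Apply the forcing rules from the release notes and return a new options object."""
    # Rule 1: VAD is forced on whenever batch_size > 1.
    vad = opts.vad_filter or opts.batch_size > 1
    # Rule 2: subtitle formats need segment times, so timestamps are forced on for them.
    timestamps = opts.include_timestamps or opts.output_format.lower() in {"srt", "vtt"}
    return replace(opts, vad_filter=vad, include_timestamps=timestamps)
```

Because the same resolver would run for the single, batch, and server paths, each path ends up with the same VAD settings.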

v8.0.1 - small bug fix

02 Apr 12:49
13ca0b1

What's Changed

  • fix(transcription): load PCM WAV via stdlib to skip PyAV decode by @webenefits in #34
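
The idea behind this fix can be sketched with the stdlib `wave` and `array` modules: if a file is plain 16-bit PCM, read it directly, and use PyAV only as a fallback. This is an illustrative sketch, not the code from #34. `load_pcm16_wav` is a hypothetical name, and resampling to 16 kHz is left to the caller.

```python
import array
import wave


def load_pcm16_wav(source):
    """Read a 16-bit PCM WAV with the stdlib; return (mono samples in [-1, 1), sample_rate).

    Raises ValueError for other sample widths so the caller can fall back to PyAV.
    wave.open itself raises wave.Error for formats it cannot parse (e.g. float WAV).
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("not 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")  # signed 16-bit samples
    samples.frombytes(raw)
    # Average interleaved channels down to mono and scale to floats.
    mono = [
        sum(samples[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(samples), channels)
    ]
    return mono, rate
```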

New Contributors

Full Changelog: v8.0.0...v8.0.1

v8.0.0 - new and improved

23 Mar 16:38

File Transcription Panel 📂

  • A new dockable File Transcription panel lets you transcribe individual files or entire directories of audio/video files. Toggle between single and multi-file mode, scan directories recursively, and filter by file type. Output to clipboard, the source directory, or a custom folder in txt, srt, vtt, tsv, or json format.
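
Recursive directory scanning with a file-type filter is straightforward with `pathlib`. Here is a minimal sketch. The name `find_media` and the extension set are illustrative assumptions, not the panel's real list.

```python
from pathlib import Path

# ASSUMPTION: illustrative extension set, not the app's actual supported list.
MEDIA_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".mp4"}


def find_media(root, recursive=True, extensions=MEDIA_EXTENSIONS):
    """Return media files under root, optionally descending into subfolders."""
    root = Path(root)
    wanted = {ext.lower() for ext in extensions}
    candidates = root.rglob("*") if recursive else root.iterdir()
    # Compare suffixes case-insensitively so "clip.WAV" still matches.
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)
```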

System Monitoring 📊

  • Real-time CPU and RAM metrics are now displayed at the bottom of the main window.

Other Improvements ✨

  • Consistent GUI behavior and visualizations.
  • Fixed inconsistent default values for whisper parameters across pipelines
  • Removed incorrect ffmpeg requirement from README
  • nvidia-ml-py is now automatically installed for GPU users to enable system monitoring

v7.6.0 - more control

10 Mar 13:04

✨ New Features

  • Exposed adjustable parameters from the faster-whisper library for more control.

v7.5.0 - choose audio device

09 Mar 22:50

✨ New Features

  • Users can now choose a microphone/device from the Settings dialog.
  • Falls back to system default if the previously saved device is unavailable.

🎨 UI Improvements

  • Replaced text buttons with compact icon buttons

⚙️ Improvements

  • Better shutdown behavior
  • Updated model download signals to use flexible types for improved progress reporting.

v7.4.0 - robust model downloads

28 Feb 14:22

🧠 Smarter Model Handling

  • Validates model files before loading to prevent corrupted cache issues
  • Automatically clears broken Hugging Face cache and re-downloads cleanly
  • Now correctly persists across uninstalls/re-installs
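
One way to validate a cached model before loading it is to check that the expected files exist and are non-empty. Any gap is then treated as a broken cache, which can be deleted (for example with `shutil.rmtree`) and downloaded again. The file list below is an assumption based on typical CTranslate2 Whisper conversions, and the helper names are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: typical files in a CTranslate2 Whisper model folder; adjust per model.
REQUIRED_FILES = ("config.json", "model.bin", "tokenizer.json")


def missing_or_empty(model_dir, required=REQUIRED_FILES):
    """List required files that are absent or zero bytes (a common sign of an interrupted download)."""
    folder = Path(model_dir)
    return [
        name for name in required
        if not (folder / name).is_file() or (folder / name).stat().st_size == 0
    ]


def is_model_valid(model_dir):
    return not missing_or_empty(model_dir)
```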

🎙 Recording & Transcription Stability

  • Safer recording toggle logic (prevents invalid states)
  • Improved cancellation and cleanup handling

🖥 UI & Thread Safety Enhancements

  • Thread-safe model access with proper unloading to reduce memory leaks
  • Improved download status display and cancel behavior
  • Cleaner shutdown and signal disconnection handling

v7.3.0 - simplify

20 Feb 16:48

✨ Improvements

  • Added deferred config flushing with a 500 ms debounce to reduce excessive disk writes and improve performance.
  • Introduced in-memory dirty tracking with safe synchronous flush on shutdown.
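
A debounced writer with a dirty flag can be sketched with `threading.Timer`. In this sketch, each change restarts a 0.5 s countdown, and `flush()` can also be called directly on shutdown. `DebouncedConfig` and its injected `write_fn` are hypothetical names, not the app's classes.

```python
import copy
import threading


class DebouncedConfig:
    """Hypothetical sketch: coalesce rapid setting changes into one disk write."""

    def __init__(self, write_fn, delay_s=0.5):
        self._write_fn = write_fn      # callable that persists a dict, e.g. to the config file
        self._delay_s = delay_s        # 0.5 s matches the 500 ms debounce in the notes
        self._data = {}
        self._dirty = False            # only write when something actually changed
        self._timer = None
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
            self._dirty = True
            # Restart the countdown on every change so a burst produces a single write.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay_s, self.flush)
            self._timer.daemon = True
            self._timer.start()

    def flush(self):
        """Write pending changes now; call this synchronously on app close."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            if self._dirty:
                self._write_fn(copy.deepcopy(self._data))
                self._dirty = False
```

The write happens while the lock is held. This keeps writes in order, at the cost of briefly blocking `set()` during a save.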

🧹 Simplifications

  • Removed NLTK dependency from text curation; replaced with lightweight whitespace normalization.
  • Minor internal refactors for safer cache handling and deep copies.
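
Whitespace normalization of this kind needs no tokenizer. Here is a one-line sketch (the function name is hypothetical):

```python
def normalize_whitespace(text: str) -> str:
    # str.split() with no argument splits on any whitespace run and drops leading/trailing runs.
    return " ".join(text.split())
```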

🔒 Reliability

  • Config is now flushed synchronously on app close to prevent data loss.

v7.2.0 - broader compatibility

17 Feb 13:41

🛠️ Improved Stability in GUI Mode

  • Prevented crashes when running via pythonw.exe by ensuring sys.stdout and sys.stderr are always valid.
  • Disabled Hugging Face progress bars in GUI mode to avoid tqdm write errors.
  • Added extra stream validation before model downloads and snapshot fallbacks to eliminate NoneType write crashes.
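
Under `pythonw.exe` there is no console, so `sys.stdout` and `sys.stderr` are `None`. A minimal guard redirects them to `os.devnull`. The `HF_HUB_DISABLE_PROGRESS_BARS` environment variable is read by `huggingface_hub`, which also offers `huggingface_hub.utils.disable_progress_bars()`. The helper names here are hypothetical.

```python
import os
import sys


def ensure_std_streams():
    """Redirect any missing std stream to os.devnull so writes can't hit None."""
    for name in ("stdout", "stderr"):
        if getattr(sys, name) is None:
            setattr(sys, name, open(os.devnull, "w", encoding="utf-8"))


def disable_hf_progress_bars():
    # Set before importing huggingface_hub so its tqdm bars never try to write.
    os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
```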

🎯 Smarter Quantization Handling

  • Quantization options are now filtered against actual hardware capabilities, preventing unsupported types (e.g., bfloat16 on older GPUs) from appearing.
  • Always re-detects supported quantizations on startup to handle configs copied from different machines.
  • Ensures float32 is always available on CPU as a safe fallback.
  • Simplified and localized quantization override definitions in model metadata.
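
The filtering described above amounts to intersecting the app's choices with what the hardware reports. CTranslate2 exposes `ctranslate2.get_supported_compute_types(device)`, which returns a set of compute-type strings. The helper below and its `model_allowed` parameter are a hypothetical sketch, kept pure so it can be tested without a GPU.

```python
def available_quantizations(device, hardware_supported, model_allowed=None):
    """Filter quantization choices down to what this machine can actually run.

    hardware_supported: e.g. ctranslate2.get_supported_compute_types(device)
    model_allowed: optional per-model restriction (hypothetical shape)
    """
    choices = set(hardware_supported)
    if model_allowed is not None:
        choices &= set(model_allowed)
    if device == "cpu":
        choices.add("float32")  # always keep a safe CPU fallback
    return sorted(choices)
```

Calling this on every startup, instead of trusting a saved list, is what lets a config copied from another machine re-detect correctly.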

⚡ Cleaner CUDA Detection

  • Refactored CUDA checks for clearer separation of availability detection and device name retrieval.
  • More reliable logging of CUDA availability and GPU detection.

Overall, this update improves robustness across different environments and ensures users only see valid, hardware-supported precision options.

v7.1.0 - Smarter Model Management

12 Feb 19:57

🚀 Smarter Model Downloads & Critical Bug Fixes

This release overhauls how Whisper models are downloaded, cached, and loaded — with real-time feedback, full offline support, and fixes for several UI and platform-specific bugs.

📊 Download Progress & Control

  • Real-time progress bar — See downloaded/total bytes in the status bar as models are fetched from Hugging Face. No more staring at a frozen-looking app.
  • ⏹️ Cancel anytime — A cancel button appears during downloads so you can abort without killing the app.
  • 🔄 Smart resume — Cancelled or failed downloads pick up where they left off, fetching only the missing files.
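
Skipping files that are already complete is what makes a resumed download cheap. This hypothetical sketch assumes the expected file sizes come from repository metadata. It also assumes `fetch` is whatever single-file downloader you use, for example a wrapper around `huggingface_hub.hf_hub_download`. Progress is reported as cumulative bytes, and cancellation is checked between files.

```python
from pathlib import Path


def resume_download(files, dest, fetch, on_progress=None, cancel=None):
    """Download only files that are missing or incomplete (hypothetical helper).

    files: mapping of filename -> expected size in bytes (e.g. from repo metadata)
    fetch: callable(filename, target_path) that downloads a single file
    on_progress: optional callable(done_bytes, total_bytes) for a status-bar display
    cancel: optional threading.Event; set it to abort between files
    """
    dest = Path(dest)
    total = sum(files.values())
    done = 0
    for name, size in files.items():
        target = dest / name
        complete = target.is_file() and target.stat().st_size == size
        if not complete:
            if cancel is not None and cancel.is_set():
                raise InterruptedError("download cancelled")
            fetch(name, target)
        done += size
        if on_progress is not None:
            on_progress(done, total)
    return done
```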

🌐 Offline Support

  • ⚡ Zero network calls for cached models — Previously downloaded models load instantly with no internet dependency whatsoever.
  • 💬 Clear error messages — Attempting to download a new model without internet gives a descriptive, actionable error instead of a cryptic exception.

🐛 Bug Fixes

  • 🔒 Startup race condition fixed — All widgets were previously clickable before the initial model finished loading, which could trigger a confusing "No model is loaded" error. Widgets now stay properly disabled until a model is ready.
  • 🏷️ Dedicated model status bar — The bottom status bar now exclusively shows Whisper model state (loaded model, download progress, or "No model loaded") and no longer displays unrelated messages.

🪟 Windows Compatibility

  • 🔗 Windows symlink/reparse point fix — Resolved WinError 448 ("untrusted mount point") crash that occurred when the Hugging Face cache used symlinks on Windows systems without Developer Mode enabled. The app now resolves cache paths through metadata instead of filesystem traversal.
  • 🛡️ Robust download fallback — If per-file downloading fails for any reason, the app automatically falls back to Hugging Face's built-in snapshot downloader to ensure models are fetched successfully.
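
The fallback chain can be sketched with both downloaders passed in as callables, which keeps the logic testable. In practice, the second one might wrap `huggingface_hub.snapshot_download`. The function and parameter names here are illustrative.

```python
import logging

log = logging.getLogger(__name__)


def download_with_fallback(repo_id, per_file_download, snapshot_download):
    """Try the per-file downloader; on any failure fall back to a whole-repo snapshot."""
    try:
        return per_file_download(repo_id)
    except Exception:
        # Log the original error with its traceback, then take the fallback path.
        log.warning("per-file download of %s failed; falling back to snapshot", repo_id, exc_info=True)
        return snapshot_download(repo_id)
```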

v7.0.0 - new look, more robust

10 Feb 20:36

What's New 🎉

  • Sleek new UI — slimmed down the main window and tucked model settings into a proper Settings dialog. Less clutter, more focus.
  • Live waveform visualizer 🎵 — the record button now pulses and animates in real-time as you speak, with a cool particle effect during transcription.
  • Append mode checkbox moved into the clipboard window, right where it belongs.
  • Cleaner transcriptions — squashed whitespace quirks and a sneaky silent failure when swapping models mid-transcription.
  • Transcription results now flow through the clipboard window instead of hijacking your system clipboard uninvited.
  • More stable under the hood — smoother shutdowns, thread safety fixes, and stale model loads get tossed in the bin where they belong.
  • Faster installs — the installer now does one clean pass instead of installing libraries one-by-one like it's 2005.