feat: add thinking mode toggle for local Qwen models #513

Open
jalvarezz13 wants to merge 1 commit into OpenWhispr:main from jalvarezz13:feat/local-thinking-toggle

Summary

  • Adds a UI toggle to enable/disable thinking mode for local Qwen3/3.5 models
  • When disabled, appends /no_think to user messages so the model skips the <think> reasoning phase
  • Reduces response latency on consumer hardware by skipping reasoning tokens that would otherwise be generated before the answer
  • Toggle only appears when a model with thinking support is selected

Changes

  • Model registry: Added supportsThinking: true flag to all Qwen3 and Qwen3.5 local models
  • Settings store: New localThinkingEnabled boolean setting (default: false) persisted in localStorage
  • Inference pipeline: When thinking is disabled, /no_think is appended to user messages in both non-streaming (modelManagerBridge.js) and streaming (ReasoningService.ts) paths
  • UI: Compact toggle row in the local model section of ReasoningModelSelector, visible only when a thinking-capable model is selected
  • i18n: Translations added for all 10 supported languages
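
The registry flag, the persisted setting, and the toggle visibility rule listed above could fit together roughly as follows. This is a minimal sketch, not the code in this PR: only `supportsThinking` and `localThinkingEnabled` come from the description, while the model ids, the `KeyValueStore` interface, and all function names are hypothetical. An in-memory store stands in for `localStorage` so the sketch runs outside a browser.

```typescript
// Hypothetical registry entry; only `supportsThinking` is named in the PR.
interface LocalModel {
  id: string;
  supportsThinking?: boolean;
}

// Placeholder ids for illustration, not real registry entries.
const MODELS: LocalModel[] = [
  { id: "qwen3-example", supportsThinking: true },
  { id: "other-example" },
];

// Subset of the Web Storage API used here (localStorage satisfies it).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for localStorage (assumption, for runnability).
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}

const THINKING_KEY = "localThinkingEnabled";

// A missing or unrecognized value reads as false, matching the stated default.
function readThinkingEnabled(store: KeyValueStore): boolean {
  return store.getItem(THINKING_KEY) === "true";
}

function writeThinkingEnabled(store: KeyValueStore, enabled: boolean): void {
  store.setItem(THINKING_KEY, String(enabled));
}

// The toggle is shown only for models flagged as thinking-capable.
function shouldShowThinkingToggle(
  modelId: string,
  models: LocalModel[] = MODELS,
): boolean {
  return models.find((m) => m.id === modelId)?.supportsThinking === true;
}
```

Keeping the flag on the registry entry means the UI does not need to match on model names to decide whether to render the toggle.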

How it works

Qwen3/3.5 models support hybrid thinking, and a /no_think suffix on the user message turns the reasoning phase off. When the user disables thinking mode:

  1. Non-streaming: modelManagerBridge.runInference() appends /no_think to the user message before sending to llama.cpp
  2. Streaming: ReasoningService.processTextStreaming() appends /no_think to the last user message
  3. The existing <think> block stripping in localReasoningBridge.js and ReasoningService.ts remains as a safety net
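
The three steps above can be sketched as two small helpers. This is an illustration under stated assumptions, not the code in `modelManagerBridge.js` or `ReasoningService.ts`: the `ChatMessage` shape, the function names, the single-space separator before `/no_think`, and the regular expression are all hypothetical.

```typescript
type Role = "system" | "user" | "assistant";

// Hypothetical message shape for illustration.
interface ChatMessage {
  role: Role;
  content: string;
}

const NO_THINK = "/no_think";

// Appends /no_think to the last user message when thinking is disabled.
// Returns a new array and leaves the input messages unmodified.
function applyThinkingMode(
  messages: ChatMessage[],
  thinkingEnabled: boolean,
): ChatMessage[] {
  if (thinkingEnabled) return messages;

  // Locate the last user message, scanning from the end.
  let lastUser = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.role === "user") {
      lastUser = i;
      break;
    }
  }
  if (lastUser === -1) return messages;

  return messages.map((m, i) => {
    // Skip other messages, and avoid appending the suffix twice.
    if (i !== lastUser || m.content.trimEnd().endsWith(NO_THINK)) return m;
    return { ...m, content: `${m.content} ${NO_THINK}` };
  });
}

// Safety net: removes any <think>...</think> blocks from model output.
function stripThinkBlocks(output: string): string {
  return output.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}
```

Stripping stays useful even with the suffix in place, since a model may still emit an empty or partial `<think>` block in its response.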

Testing

  • npm run build passes
  • No new LSP diagnostics introduced
  • Tested locally with Electron app running

Closes #512

Allow users to enable/disable thinking mode for local Qwen3/3.5 models
via a UI toggle. When disabled, appends /no_think to user messages so
the model skips the <think> reasoning phase, significantly improving
inference speed on consumer hardware.

Closes OpenWhispr#512

Development

Successfully merging this pull request may close these issues.

perf: Disable thinking mode for local Qwen3/3.5 to improve inference speed