## Changes
- Optimize chat streaming by updating only the last message during streaming and by restoring the dynamic UI update speed. These changes keep streaming smooth even at 100k tokens of context.
- Add a CUDA 12.8 installation option for RTX 50XX NVIDIA Blackwell support (ExLlamaV2/V3 and Transformers) (#7011). Thanks @okazaki10
- Make UI settings persistent. Any value you change, including sliders in the Parameters tab, chat mode, character, character description fields, etc., now gets automatically saved to `user_data/settings.yaml`. If you close the UI and launch it again, the values will be where you left them. The Model tab is the exception, since it's managed by command-line flags and its own "Save settings" menu. A minimal sketch of the autosave behavior appears after this list.
- Make the dark theme darker and more aesthetic.
- Add support for .docx attachments.
- Add 🗑️ buttons for easily deleting individual past chats.
- Add new buttons: "Restore preset", "Neutralize samplers", "Restore character".
- Reorganize the Parameters tab with parameters that get saved to presets on the left and everything else on the right.
- Add Qwen3 presets (Thinking and No Thinking), and make `Qwen3 - Thinking` the new default preset. If you update a portable install manually by moving `user_data`, you will not have these files; download them from here if you are interested.
- Add the model name to each message's metadata, and show it in the UI when hovering over the date/time for a message.
- Scroll up automatically to show the whole editing area when editing a message.
- Add an option to turn long pasted text into an attachment automatically. This is disabled by default and can be enabled in the Session tab.
- Extract the text of web search results with formatting instead of putting all the text on a single line.
- Show llama.cpp prompt processing progress on a single line.
- Add informative tooltips when hovering over the file upload icon and the web search checkbox.
- Several small UI optimizations.
- Several small UI style improvements.
- Use `user_data/cache/gradio` for Gradio temporary files instead of the system's temporary folder.
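
As referenced above, here is a minimal sketch of the settings autosave, assuming PyYAML and a flat key/value layout; the function name, keys, and merge logic are illustrative, not the project's actual implementation:

```python
from pathlib import Path

import yaml  # PyYAML

SETTINGS_PATH = Path("user_data/settings.yaml")

def autosave_setting(key: str, value) -> None:
    """Merge a single changed UI value into settings.yaml (hypothetical helper)."""
    settings = {}
    if SETTINGS_PATH.exists():
        settings = yaml.safe_load(SETTINGS_PATH.read_text()) or {}
    settings[key] = value  # e.g. "temperature", "character", "mode"
    SETTINGS_PATH.parent.mkdir(parents=True, exist_ok=True)
    SETTINGS_PATH.write_text(yaml.safe_dump(settings, sort_keys=False))

# Called after each UI change; on the next launch the UI reads the file back
# and restores every value, e.g.:
# autosave_setting("temperature", 0.7)
```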
## Bug fixes
- Filter out failed web search downloads from attachments.
- Remove quotes from LLM-generated web search queries.
- Fix the progress bar not appearing in the UI when downloading a model.
- Fix the text for a sent message reappearing in the input area when the page is reloaded.
- Fix selecting the next chat on the list when deleting a chat with an active search.
- Fix light/dark theme persistence across page reloads.
- Re-highlight code blocks when switching light/dark themes to fix styling issues.
- Stop the llama.cpp model during graceful shutdown to avoid an error message (#7042). Thanks @leszekhanusz
- Check `.attention.head_count` if `.attention.head_count_kv` doesn't exist for the VRAM calculation (#7048); a sketch of this fallback appears after this list. Thanks @miriameng
- Fix failure when --nowebui is called without --api (#7055). Thanks @miriameng
- Fix "continue" and "Start reply with" when using translation extensions (#6944). Thanks @mykeehu
- Load JS and CSS sources in UTF-8 (#7059). Thanks @LawnMauer
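
As referenced above, a sketch of the `.attention.head_count` fallback from the VRAM-calculation fix, assuming GGUF metadata loaded into a plain dict; the key names follow the GGUF convention (`<arch>.attention.head_count[_kv]`), but the function itself is hypothetical:

```python
def get_kv_head_count(metadata: dict, arch: str = "llama") -> int:
    """Return the KV head count used when estimating KV-cache VRAM.

    Models without grouped-query attention may omit head_count_kv from
    their GGUF metadata; for those, the KV head count equals head_count.
    """
    kv_key = f"{arch}.attention.head_count_kv"
    q_key = f"{arch}.attention.head_count"
    return metadata.get(kv_key, metadata.get(q_key))

# Example: a GGUF without the _kv key falls back to head_count.
print(get_kv_head_count({"llama.attention.head_count": 32}))  # 32
```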
## Backend updates
- Bump llama.cpp to ggml-org/llama.cpp@2bb0467
- Bump ExLlamaV3 to 0.0.3
- Bump ExLlamaV2 to 0.3.1
## Portable builds
Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
Updating a portable install:
1. Download and unzip the latest version.
2. Replace the `user_data` folder with the one from your existing install. All your settings and models will carry over.
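
For illustration, the same update expressed as a small Python snippet; the folder names are hypothetical, and the copy simply carries your old `user_data` into the new install:

```python
import shutil
from pathlib import Path

old_install = Path("text-generation-webui-old")  # your existing install
new_install = Path("text-generation-webui-new")  # freshly unzipped version

# Discard the empty user_data shipped with the new build, then copy yours over.
shutil.rmtree(new_install / "user_data", ignore_errors=True)
shutil.copytree(old_install / "user_data", new_install / "user_data")
```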