Releases: oobabooga/text-generation-webui

v3.12

02 Sep 19:55
d3a7710

Changes

  • Characters can now think in chat-instruct mode! This was made possible by many simplifications and improvements to jinja2 template handling.
  • Add support for the Seed-OSS-36B-Instruct template.
  • Better handle the growth of the chat input textarea:
    (before/after screenshots)
  • Make the --model flag work with absolute paths for gguf models, like --model /tmp/gemma-3-270m-it-IQ4_NL.gguf
  • Make venv portable installs work with Python 3.13
  • Optimize LaTeX rendering during streaming for long replies
  • Give streaming instruct messages more vertical space
  • Preload the instruct and chat fonts for smoother startup
  • Improve right sidebar borders in light mode
  • Remove the --flash-attn flag (it's always on now in llama.cpp)
  • Suppress "Attempted to select a non-interactive or hidden tab" console warnings, reducing the UI CPU usage during streaming
  • Statically link MSVC runtime to remove the Visual C++ Redistributable dependency on Windows for the llama.cpp binaries
  • Make the llama.cpp terminal output with --verbose less verbose

Bug fixes

  • llama.cpp: Fix stderr deadlock while loading some models
  • llama.cpp: Fix obtaining the maximum sequence length for GPT-OSS
  • Fix the UI failing to launch if the Notebook prompt is too long
  • Fix LaTeX rendering for equations with asterisks
  • Fix italic and quote colors in headings

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.

v3.11

19 Aug 14:52
cb00db1

Changes

  • Add the Tensor Parallelism option to the ExLlamav3/ExLlamav3_HF loaders through the --enable-tp and --tp-backend options.
  • Set multimodal status during model loading instead of checking on every generation (#7199). Thanks, @altoiddealer.
  • Improve the multimodal API examples slightly.

Bug fixes

  • Make web search functional again
  • mtmd: Fix a bug when "include past attachments" is unchecked
  • Fix code blocks having an extra empty line in the UI

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.

v3.10 - Multimodal support!

12 Aug 21:18
6c2fdfd

See the Multimodal Tutorial

Changes

  • Add multimodal support to the UI and API (see the sketch after this list)
  • Add speculative decoding to the new ExLlamaV3 loader.
  • Use ExLlamav3 instead of ExLlamav3_HF by default for EXL3 models, since it supports multimodal and speculative decoding.
  • Support loading chat templates from chat_template.json files (EXL3/EXL2/Transformers models)
  • Default max_tokens to 512 in the API instead of 16
  • Better organize the right sidebar in the UI
  • llama.cpp: Pass --swa-full to llama-server when streaming-llm is checked to make it work for models with SWA.
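
A rough illustration of the new multimodal API (a minimal sketch that assumes the OpenAI-compatible endpoint on the default port 5000, started with --api, and the OpenAI-style image_url content format; see the Multimodal Tutorial for the authoritative usage):

```python
import base64
import requests

# Assumes the server was started with --api; port 5000 is the default.
URL = "http://127.0.0.1:5000/v1/chat/completions"

# Encode a local image as a data URI, OpenAI-style.
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    # max_tokens now defaults to 512, but setting it explicitly keeps
    # behavior predictable across versions.
    "max_tokens": 512,
}

response = requests.post(URL, json=payload, timeout=120)
print(response.json()["choices"][0]["message"]["content"])
```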

Bug fixes

  • Fix getting the ctx-size for newer EXL3/EXL2/Transformers models
  • Fix the exllamav2 loader ignoring add_bos_token
  • Fix the color of italic text in chat messages
  • Fix edit window and buttons in Messenger theme (#7100). Thanks @mykeehu.

Backend updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.

v3.9.1

07 Aug 03:33
88ba4b1

Changes


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.

v3.9

06 Aug 02:55
fefdb20

Experimental GPT-OSS support!

I have had some success with the GGUF models at:

https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/tree/main
https://huggingface.co/ggml-org/gpt-oss-120b-GGUF/tree/main

Those models may need to be re-downloaded in the coming days if bugs are found, so keep an eye on those pages.

Changes

  • Add a new Reasoning effort UI element in the chat tab, with low, medium, and high options for GPT-OSS
  • Support standalone .jinja chat templates, which makes it possible to load GPT-OSS through Transformers (see the sketch after this list)
  • Make web search functional with thinking models
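
For context, a standalone .jinja file holds a raw Jinja2 chat template. Below is a minimal sketch of how such a file plugs into Transformers' chat-template machinery (the model ID and file name are placeholders, and this is plain Transformers usage rather than the webui's internal code):

```python
from transformers import AutoTokenizer

# Placeholder model ID; any model with a tokenizer works for this sketch.
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

# Load the standalone template and attach it to the tokenizer.
with open("chat_template.jinja", "r", encoding="utf-8") as f:
    tokenizer.chat_template = f.read()

messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```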

Bug fixes

  • Fix an edge case in chat history loading that caused a crash (closes #7155)
  • Handle both int and str types in grammar char processing (fixes a rare crash when using grammar)

Backend updates


Portable builds

Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.

v3.8

19 Jul 20:54
714f745

Changes

  • Replace use_flash_attention_2/use_eager_attention with a unified attn_implementation option in the Transformers loader (see the sketch after this list)
  • Ignore add_bos_token in instruct prompts and let the jinja2 template decide
  • Add a "None" option for the speculative decoding model
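
For reference, a minimal sketch of the upstream Transformers argument that the unified option corresponds to (the model ID is a placeholder, and this is plain Transformers usage rather than the webui's loader code):

```python
from transformers import AutoModelForCausalLM

# Valid values include "eager", "sdpa", and "flash_attention_2"
# (the last requires the flash-attn package to be installed).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # placeholder model ID
    attn_implementation="flash_attention_2",
)
```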

Backend updates


Portable builds

Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.

v3.7.1

09 Jul 03:17
6338dc0

Changes

  • Chat tab improvements:
    • Move the 'Enable thinking' checkbox from the Parameters tab to the right sidebar
    • Keep the last chat message visible as the input area grows
    • Optimize chat scrolling again (I think that will be the last time—it's really responsive now)
    • Replace 'Generate' with 'Send' in the main button
  • Support installing user extensions in user_data/extensions/ for convenience (a minimal extension sketch follows this list)
  • Small UI optimizations and style improvements
  • Block model and session backend events in --multi-user mode (#7098). Thanks @Alidr79
  • One-click installer: Use miniforge instead of miniconda to avoid Anaconda licensing issues for organizations with 200+ people
  • Standardize margins and paddings across all chat styles (new in 3.7.1)
  • Update the keyboard shortcuts documentation (new in 3.7.1)
  • docs: Add Mirostat Explanation (#7128). Thanks @Cats1337. (new in 3.7.1)
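
A minimal sketch of what a user extension under user_data/extensions/ can look like, following the hook names from the project's extensions documentation (the extension name and behavior here are purely illustrative):

```python
# user_data/extensions/my_extension/script.py
# "my_extension" is a placeholder name; the hooks below come from the
# project's extensions documentation.

params = {
    "display_name": "My Extension",
    "is_tab": False,
}

def output_modifier(string, state, is_chat=False):
    """Modify the model's reply before it is displayed in the UI."""
    return string + "\n\n(post-processed by my_extension)"
```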

Bug fixes

  • Fix the DuckDuckGo search
  • Fix scrolling during streaming when thinking blocks are present
  • Fix chat history getting lost if the UI is inactive for a long time
  • Fix chat sidebars toggle buttons disappearing (#7106). Thanks @philipp-classen
  • Fix autoscroll after initial fonts loading
  • Handle either missing <think> start or </think> end tags (#7102). Thanks @zombiegreedo
  • Fix custom stopping strings being reset when switching models
  • Fix navigation icons temporarily hiding when switching message versions (new in 3.7.1)
  • Revert "Keep the last chat message visible as the input area grows", as it was very glitchy (new in 3.7.1)

Backend updates


Portable builds

Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.

v3.7

07 Jul 21:14
e1034fc

v3.6.1

19 Jun 22:49
17f9c18

Changes

  • Merge the Default and Notebook tabs into a single Notebook tab (#7078), with an option in the Session tab to switch between one and two columns.
  • Autosave text in the Notebook tab (both generated and manually typed), and add "New" and "Rename" buttons for management.
    • Saved prompts have been moved from user_data/prompts to user_data/logs/notebook; move any existing ones there.
  • Add a new Character tab for character settings.
  • Remember the last selected chat for each chat mode and character.
  • Truncate web search results to at most 8192 tokens to handle edge cases like pages with infinite scrolling.
  • Remove images and links from web search results to reduce noise and focus on the relevant text content.
  • Add an option to exclude attachments from previous messages in the chat prompt. It can be found in the Session tab.
  • Improve the wpp chat style.
  • Increase the size of the enlarged character profile picture that appears when clicking the profile picture.
  • Move 'Custom system message' to the Parameters > Generation tab.
  • Hide the navigation bar when 'Show controls' is toggled (via Ctrl+S or a click).
  • Always close/open the two sidebars at the same time when clicking their close buttons on desktop.
  • Only save active extensions and extensions settings on manual settings save.
  • More informative log message when the user input gets truncated.
  • Small style improvements to the chat tab.
  • Optimize scrolling in the chat tab.
  • Optimize syntax highlighting on long conversations.
  • Optimize the token count at the end of generation with llama.cpp.
  • Disable message action icons during streaming for better performance.
  • Expose the real model list via the /v1/models endpoint (#7088; see the example after this list). Thanks @NoxWorld2660
  • Improve the API examples in the documentation.
  • Show file sizes in the Model tab on "Get file list" (new in 3.6.1)
  • Force dark theme on the Gradio login page (new in 3.6.1)
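
A quick example of querying the endpoint (a minimal sketch assuming the OpenAI-compatible API on the default port 5000, started with --api):

```python
import requests

# List the models the server actually has available.
response = requests.get("http://127.0.0.1:5000/v1/models", timeout=30)
for model in response.json()["data"]:
    print(model["id"])
```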

Bug fixes

  • Ensure the estimated VRAM is updated when switching between different models (#7071). Thanks @miriameng.
  • Fix an edge case where the gpu-layers slider maximum is incorrectly limited.
  • Add error handling for non-llama.cpp models in portable mode.
  • Fix the character profile picture sometimes not appearing when switching from instruct to chat modes.
  • Fix jittering while typing in the Chat tab on Firefox.
  • Fix the /v1/models output format (new in 3.6.1)
  • Bump numpy to 2.2 to fix loading certain EXL3 models on Windows (new in 3.6.1)
  • Fix obtaining the maximum number of GPU layers for DeepSeek-R1-0528-GGUF (new in 3.6.1)

Backend updates


Portable builds

Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.

v3.6

19 Jun 01:48
92547be