Releases: oobabooga/text-generation-webui
v3.5
Changes
- Optimize chat streaming by only updating the last message during streaming and adding back the dynamic UI update speed. These changes make streaming smooth even at 100k tokens of context.
- Add a CUDA 12.8 installation option for RTX 50XX NVIDIA Blackwell support (ExLlamaV2/V3 and Transformers) (#7011). Thanks @okazaki10
- Make UI settings persistent. Any value you change, including sliders in the Parameters tab, chat mode, character, character description fields, etc., now gets automatically saved to `user_data/settings.yaml`. If you close the UI and launch it again, the values will be where you left them. The Model tab is left as an exception since it's managed by command-line flags and its own "Save settings" menu. (A small sketch for inspecting the saved file follows this list.)
- Make the dark theme darker and more aesthetic.
- Add support for .docx attachments.
- Add 🗑️ buttons for easily deleting individual past chats.
- Add new buttons: "Restore preset", "Neutralize samplers", "Restore character".
- Reorganize the Parameters tab with parameters that get saved to presets on the left and everything else on the right.
- Add Qwen3 presets (Thinking and No Thinking), and make `Qwen3 - Thinking` the new default preset. If you update a portable install manually by moving `user_data`, you will not have these files; download them from here if you are interested.
- Add the model name to each message's metadata, and show it in the UI when hovering the date/time for a message.
- Scroll up automatically to show the whole editing area when editing a message.
- Add an option to turn long pasted text into an attachment automatically. This is disabled by default and can be enabled in the Session tab.
- Extract the text of web searches with formatting instead of putting all text on a single line.
- Show llama.cpp prompt processing progress on a single line.
- Add informative tooltips when hovering the file upload icon and the web search checkbox.
- Several small UI optimizations.
- Several small UI style improvements.
- Use `user_data/cache/gradio` for Gradio temporary files instead of the system's temporary folder.
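The settings file mentioned above is plain YAML, so it can be inspected or backed up outside the UI. A minimal sketch, assuming PyYAML is available; the keys printed are whatever the UI happens to have saved, not a fixed schema:

```python
# Minimal sketch: inspect the automatically saved UI settings.
# Assumes PyYAML is installed; the keys are whatever the UI saved,
# not a guaranteed schema.
from pathlib import Path

import yaml

settings_path = Path("user_data/settings.yaml")

if settings_path.exists():
    settings = yaml.safe_load(settings_path.read_text(encoding="utf-8")) or {}
    for key, value in sorted(settings.items()):
        print(f"{key}: {value!r}")
else:
    print("No saved settings yet; change something in the UI first.")
```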
Bug fixes
- Filter out failed web search downloads from attachments.
- Remove quotes from LLM-generated web search queries.
- Fix the progress bar for downloading a model not appearing in the UI.
- Fix the text for a sent message reappearing in the input area when the page is reloaded.
- Fix selecting the next chat on the list when deleting a chat with an active search.
- Fix light/dark theme persistence across page reloads.
- Re-highlight code blocks when switching light/dark themes to fix styling issues.
- Stop llama.cpp model during graceful shutdown to avoid an error message (#7042). Thanks @leszekhanusz
- Check .attention.head_count if .attention.head_count_kv doesn't exist for VRAM calculation (#7048). Thanks @miriameng
- Fix failure when --nowebui is called without --api (#7055). Thanks @miriameng
- Fix "Continue" and "Start reply with" when using translation extensions (#6944). Thanks @mykeehu
- Load JS and CSS sources in UTF-8 (#7059). Thanks @LawnMauer
Backend updates
- Bump llama.cpp to ggml-org/llama.cpp@2bb0467
- Bump ExLlamaV3 to 0.0.3
- Bump ExLlamaV2 to 0.3.1
Portable builds
Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.

Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.4.1

Changes
- Add attachments support (text files, PDF documents) (#7005).
  - This is not RAG. The attachment gets fully added to the prompt! (A conceptual sketch follows this list.)
- Add a web search feature (#7023). The search query is generated by the LLM based on your input, and the search is performed using DuckDuckGo.
- Add date/time to chat messages (#7003)
- Add message version navigation (#6947). Thanks @Th-Underscore.
- This is equivalent to the "swipes" in SillyTavern. Press left/right to navigate versions, press right while at the latest reply version to generate a new version.
- Add footer buttons for editing messages (#7019). Thanks @Th-Underscore.
- Add a "Branch here" footer button to chat messages (#6967). Thanks @Madrawn
- Add a token counter to the chat tab (counts input + history, including attachments)
- Make the dark theme darker
- Improve the light theme
- Improve the style of thinking blocks
- Add back `max_updates_second` to resolve a UI performance issue when streaming very fast (~200 tokens/second)
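To illustrate the "not RAG" note on attachments above: conceptually, the attachment's text is placed into the prompt in full rather than chunked and retrieved. This is only a sketch of that idea; the function name and delimiters are made up and this is not the web UI's actual code:

```python
# Conceptual sketch only: "not RAG" means the attachment's full text is
# inlined into the prompt. The function name and delimiters below are
# illustrative, not the web UI's actual implementation.
from pathlib import Path


def build_prompt_with_attachment(user_message: str, attachment_path: str) -> str:
    attachment_text = Path(attachment_path).read_text(encoding="utf-8", errors="replace")
    return (
        f"Attachment ({Path(attachment_path).name}):\n"
        f"{attachment_text}\n\n"
        f"User: {user_message}"
    )


# Tiny self-contained demo.
Path("notes.txt").write_text("Example attachment contents.", encoding="utf-8")
print(build_prompt_with_attachment("Summarize this file.", "notes.txt"))
```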
Bug fixes
- Close response generator when stopping API generation (#7014). Thanks @djholtby
- Fix the chat area height when "Show controls" is unchecked
- Remove unnecessary js that was causing scrolling issues during streaming
- Fix loading `Llama-3_3-Nemotron-Super-49B-v1` and similar models
- Fix Dockerfile for AMD and Intel (#6995). Thanks @TheGameratorT
- Fix 'Start reply with' (new in v3.4.1)
- Fix exllamav3_hf models failing to unload (new in v3.4.1)
Backend updates
- Bump llama.cpp to ggml-org/llama.cpp@b7a1746
Portable builds
Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Choosing the right build:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.

Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.4
v3.3.2: Patch release
- More robust VRAM calculation
  - The updated formula better handles edge cases (`DeepSeek-R1` and `Mistral-22B-v0.2`), providing more accurate results.
- UI: Use total (not free) VRAM for layers calculation when a model is loaded.
- Fix KeyError: 'gpu_layers' when loading existing model settings (#6991). Thanks, @mamei16.
v3.3.1: Patch release
- Only add a blank space to streaming messages in instruct mode, keeping the chat/chat-instruct styles as before.
- Some fixes to the GPU layers slider:
  - Honor saved settings
  - Fix the maximum being set to the saved value
  - Add backward compatibility with saved `n_gpu_layers` values (now called `gpu_layers`)
v3.3

Changes
- Estimate the VRAM for GGUF models using a statistical model + autoset `gpu-layers` on NVIDIA GPUs (#6980). (A rough sketch of the idea follows this list.)
  - When you select a GGUF model in the UI, you will see an estimate for its VRAM usage, and the number of layers will be set based on the available (free, not total) VRAM on your system.
  - If you change `ctx-size` or `cache-type` in the UI, the number of layers will be recalculated and updated in real time.
  - If you load a model through the command line with e.g. `--model model.gguf --ctx-size 32768 --cache-type q4_0`, the number of GPU layers will also be automatically calculated, without the need to set `--gpu-layers`.
  - It works even with multipart GGUF models or systems with multiple GPUs.
- Greatly simplify the Model tab by splitting settings between "Main options" and "Other options", where "Other options" is in a closed accordion by default.
- Tools support for the OpenAI compatible API (#6827). Thanks, @jkrauss82. (An example request follows this list.)
- Dynamic Chat Message UI update speed (#6952). This is a major UI optimization in Chat mode that renders `max_updates_second` obsolete. Thanks, @mamei16, for the very clever idea.
- Optimize the Chat tab JavaScript, reducing its CPU usage (#6948).
- Add the `top_n_sigma` sampler to the llama.cpp loader.
- Streamline the UI in portable builds: hide things that do not work, such as training, only show the llama.cpp loader, and do not include extensions that do not work. The latter should reduce the build sizes.
- Invert user/assistant message colors in instruct mode to make assistant messages darker and more readable.
- Improve the light theme colors.
- Add a minimum height to the streaming reply to prevent constant scrolling during chat streaming, similar to how ChatGPT and Claude work.
- Show the list of files if the user tries to download an entire GGUF repository instead of a specific file.
- llama.cpp: Handle short arguments in `--extra-flags`, like `ot`.
- Save the chat history right after sending a message and periodically during streaming to prevent losing messages.
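A back-of-the-envelope sketch of the layer auto-setting idea described above. This is not the statistical model the web UI actually uses; every number below is a placeholder assumption for illustration only:

```python
# Back-of-the-envelope sketch of "how many layers fit in free VRAM".
# NOT the statistical model used by the web UI; the per-layer and KV-cache
# figures below are placeholder assumptions.

def estimate_gpu_layers(free_vram_mib: float,
                        n_layers: int,
                        layer_size_mib: float,
                        kv_cache_mib: float,
                        overhead_mib: float = 512.0) -> int:
    """Return how many transformer layers should fit on the GPU."""
    budget = free_vram_mib - kv_cache_mib - overhead_mib
    if budget <= 0:
        return 0
    return max(0, min(n_layers, int(budget // layer_size_mib)))


# Example: 12 GiB free, a 48-layer model with ~180 MiB per quantized layer,
# and ~1.5 GiB reserved for the KV cache at the chosen ctx-size/cache-type.
print(estimate_gpu_layers(free_vram_mib=12 * 1024,
                          n_layers=48,
                          layer_size_mib=180.0,
                          kv_cache_mib=1536.0))
```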
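For the tools item above, a minimal request sketch using the `openai` Python client. It assumes the OpenAI-compatible API is enabled and reachable at the usual default local address (adjust the URL if yours differs); the tool definition itself is a toy example:

```python
# Minimal sketch: send a tool definition to the OpenAI-compatible API.
# Assumes the API is enabled and listening locally; the port shown is the
# usual default and is an assumption. The tool is a toy example.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="whatever-is-loaded",  # the web UI serves the currently loaded model
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)
print(response.choices[0].message)
```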
Bug fixes
- API: Fix llama.cpp continuing to generate in the background after cancelling the request, improve disconnect detection, fix deadlock on simultaneous requests.
- Fix `typical_p` in the llama.cpp sampler priority.
- Fix manual random seeds in llama.cpp.
- Add a retry mechanism when using the `/internal/logits` API endpoint with the llama.cpp loader to fix random failures.
- Ensure environment isolation in portable builds to avoid conflicts.
- docker: Fix app UID typo in docker composes (#6957 and #6958). Thanks, @enovikov11.
- Docker fix for NVIDIA (#6964). Thanks, @phokur.
- SuperboogaV2: Minor update to avoid JSON serialization errors (#6945). Thanks, @alirezagsm.
- Fix model config loading in shared.py for Python 3.13 (#6961). Thanks, @Downtown-Case.
Backend updates
- llama.cpp: Update to ggml-org/llama.cpp@c6a2c9e.
- ExLlamaV3: Update to turboderp-org/exllamav3@a905cff.
Portable builds
Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Choosing the right build:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.

Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.2
Changes
- Add an option to enable/disable thinking for Qwen3 models (and all future models with this feature). You can find it as a checkbox under Parameters > `enable_thinking`.
  - By default, thinking is enabled.
  - This works directly with the Jinja2 template. (A small sketch follows this list.)
- Make `<think>` UI blocks closed by default.
- Set `max_updates_second` to 12 by default. This prevents a CPU bottleneck when reasoning models generate extremely long replies at ~50 tokens/second.
- Find a new API port automatically if the default one is taken.
- Make `--verbose` print the `llama-server` launch command to the console.
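Because the `enable_thinking` toggle above goes through the Jinja2 chat template, the same switch can be exercised outside the UI. A small sketch with the `transformers` tokenizer; the model name is just an example, and passing `enable_thinking` through `apply_chat_template` is assumed to be supported by that model's template:

```python
# Sketch: the enable_thinking switch is a variable consumed by the model's
# Jinja2 chat template. The model name and the enable_thinking kwarg are
# assumptions for illustration; adjust to the model you actually use.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [{"role": "user", "content": "Hello!"}]

with_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
without_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

# With thinking disabled, Qwen3-style templates typically emit an empty
# <think></think> block so the model skips the reasoning step.
print(with_thinking)
print(without_thinking)
```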
Bug fixes
- Fix ExLlamaV3_HF leaking memory, especially for long prompts/conversations.
- Fix the `streaming_llm` UI checkbox not being interactive.
- Fix the `max_updates_second` UI parameter not working.
- Fix getting the llama.cpp token probabilities for `Qwen3-30B-A3B` through the API.
- Fix CFG with ExLlamaV2_HF.
Backend updates
- llama.cpp: Update to ggml-org/llama.cpp@3e168be
- ExLlamaV3: Update to turboderp-org/exllamav3@4724b86.
Portable builds
Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Choosing the right build:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.

Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder with the one in your existing install. All your settings and models will be moved.
v3.1

Changes
- Add speculative decoding to the llama.cpp loader.
  - In tests with `google_gemma-3-27b-it-Q8_0.gguf` using `google_gemma-3-1b-it-Q4_K_M.gguf` as the draft model (both fully offloaded to GPU), the text generation speed went from 24.17 to 45.61 tokens/second (+88.7%).
  - Speed improvements vary by setup and prompt. Previous tests of mine showed increases of +64% and +34% in tokens/second for different combinations of models.
  - I highly recommend trying this feature.
- Add speculative decoding to the non-HF ExLlamaV2 loader (#6899).
- Prevent llama.cpp defaults from locking up consumer hardware (#6870). This change should provide a slight increase in text generation speed in most cases when using llama.cpp. Thanks, @Matthew-Jenkins.
- llama.cpp: Add a `--extra-flags` parameter for passing additional flags to `llama-server`, such as `override-tensor=exps=CPU`, which is useful for MoE models.
- llama.cpp: Add StreamingLLM (`--streaming-llm`). This prevents complete prompt reprocessing when the context length is filled, making it especially useful for role-playing scenarios.
  - This is called `--cache-reuse` in llama.cpp. You can learn more about it here: ggml-org/llama.cpp#9866
- llama.cpp: Add prompt processing progress messages.
- ExLlamaV3: Add KV cache quantization (#6903).
- Add Vulkan portable builds (see below). These should work on AMD and Intel Arc cards on both Windows and Linux.
- UI:
  - Add a collapsible thinking block to messages with `<think>` steps.
  - Make 'instruct' the default chat mode.
  - Add a greeting when the web UI launches in instruct mode with an empty chat history.
  - Make the model menu display only part 00001 of multipart GGUF files.
- Make `llama-cpp-binaries` wheels compatible with any Python >= 3.7 (useful for manually installing the requirements under `requirements/portable/`).
- Add a universal `--ctx-size` flag to specify context size across all loaders.
- Implement host header validation when using the UI / API on localhost (which is the default).
  - This is an important security improvement. It is recommended that you update your local install to the latest version.
  - Credits to security researcher Laurian Duma for discovering this issue and reaching out by email.
- Restructure the project to have all user data in `text-generation-webui/user_data`, including models, characters, presets, and saved settings.
  - This was done to make it possible to update portable installs in the future by just moving the `user_data` folder.
  - It has the additional benefit of making the repository more organized.
  - This is a breaking change. You will need to manually move your models from `models` to `user_data/models`, your presets from `presets` to `user_data/presets`, etc., after this update. (A hedged migration sketch follows this list.)
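For the breaking change above, a hedged migration sketch. It assumes you run it from the text-generation-webui root, covers only the folders named in this list, and is not an official migration script; back up first:

```python
# Hedged migration sketch for the user_data restructure. Run from the
# text-generation-webui root. Back up first; the folder list below covers
# the common cases named above and may not match every install.
import shutil
from pathlib import Path

root = Path(".")
user_data = root / "user_data"
user_data.mkdir(exist_ok=True)

for name in ["models", "characters", "presets"]:
    src = root / name
    dst = user_data / name
    if src.is_dir() and not dst.exists():
        shutil.move(str(src), str(dst))
        print(f"moved {src} -> {dst}")
```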
Bug fixes
- Fix an issue where portable installations ignored the CMD_FLAGS.txt file.
- extensions/superboogav2: existing embedding check bug fix (#6898). Thanks, @ZiyaCu.
- ExLlamaV2_HF: Add another `torch.cuda.synchronize()` call to prevent errors during text generation.
- Fix the Notebook tab not loading its default prompt.
Backend updates
- llama.cpp: Update to ggml-org/llama.cpp@295354e
- ExLlamaV3: Update to turboderp-org/exllamav3@de83084.
- ExLlamaV2: Update to version 0.2.9.
Portable builds
Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation. Just download the right version for your system, unzip, and run.
Choosing the right build:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
v3.0
Changes
- Portable zip builds for `text-generation-webui` + `llama.cpp`! You can now download a fully self-contained (~700 MB) version of the web UI with built-in `llama.cpp` support. No installation required.
  - Available for Windows, Linux, and macOS with builds for `cuda12.4`, `cuda11.7`, `cpu`, macOS `arm64`, and macOS `x86_64`.
  - No Miniconda, no `torch`, no downloads after unzipping.
  - Comes bundled with a portable Python from `astral-sh/python-build-standalone`.
  - Web UI opens automatically in the browser; the API starts by default on `localhost` without the need to use `--api`. (A quick way to check the local API from a script follows this list.)
  - All the compilation workflows are public, open-source, and executed on GitHub.
  - Fully private as always: no telemetry, no CDN resources, no remote requests.
- Make llama.cpp the default loader in the project.
- Add support for llama-cpp builds from https://github.com/ggml-org/llama.cpp (#6862). Thanks, @Matthew-Jenkins.
- Add back the `--model-menu` flag.
- Remove the `--gpu-memory` flag, and reuse the `--gpu-split` EXL2 flag for Transformers.
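Since portable builds start the API on localhost by default (see the list above), you can sanity-check it from a script with plain HTTP requests. A minimal sketch using `requests`; the port shown is an assumption based on the usual default and may need adjusting:

```python
# Minimal sketch: confirm the locally started OpenAI-compatible API responds.
# The port below is the usual default and is an assumption; adjust it if your
# install uses a different one.
import requests

base_url = "http://127.0.0.1:5000/v1"

models = requests.get(f"{base_url}/models", timeout=10).json()
print(models)

completion = requests.post(
    f"{base_url}/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Say hello in five words."}],
        "max_tokens": 32,
    },
    timeout=120,
).json()
print(completion["choices"][0]["message"]["content"])
```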
Backend updates
- llama.cpp: Bump to commit ggml-org/llama.cpp@2016f07
v2.8.1
🔧 Bug fixes
This release fixes several issues with the new llama.cpp loader, especially on Windows. Thanks everyone for the feedback.
- Fix the poor performance of the new llama.cpp loader on Windows. It was caused by using `localhost` for requests instead of `127.0.0.1`. It's a lot faster now.
- Fix the new llama.cpp loader failing to unload models.
- Fix using the API without streaming or without 'sampler_priority' when using the new llama.cpp loader.