
v3.5

Released by @oobabooga on 11 Jun 02:15

Changes

  • Optimize chat streaming by updating only the last message during streaming and adding back the dynamic UI update speed. These changes make streaming smooth even with 100k tokens of context.
  • Add a CUDA 12.8 installation option for RTX 50XX NVIDIA Blackwell support (ExLlamaV2/V3 and Transformers) (#7011). Thanks @okazaki10
  • Make UI settings persistent. Any value you change, including sliders in the Parameters tab, the chat mode, the character, character description fields, etc., now gets automatically saved to user_data/settings.yaml. If you close the UI and launch it again, the values will be where you left them. The Model tab is the one exception, since it is managed by command-line flags and its own "Save settings" menu. A sketch of this persistence idea appears after this list.
  • Make the dark theme darker and more aesthetic.
  • Add support for .docx attachments.
  • Add 🗑️ buttons for easily deleting individual past chats.
  • Add new buttons: "Restore preset", "Neutralize samplers", "Restore character".
  • Reorganize the Parameters tab with parameters that get saved to presets on the left and everything else on the right.
  • Add Qwen3 presets (Thinking and No Thinking), and make Qwen3 - Thinking the new default preset. If you update a portable install manually by moving user_data, you will not have these files; download them from here if you are interested.
  • Add the model name to each message's metadata, and show it in the UI when hovering the date/time for a message.
  • Scroll up automatically to show the whole editing area when editing a message.
  • Add an option to turn long pasted text into an attachment automatically. This is disabled by default and can be enabled in the Session tab.
  • Extract the text of web searches with formatting instead of putting all text on a single line.
  • Show llama.cpp prompt processing progress on a single line.
  • Add informative tooltips when hovering the file upload icon and the web search checkbox.
  • Several small UI optimizations.
  • Several small UI style improvements.
  • Use user_data/cache/gradio for Gradio temporary files instead of the system's temporary folder.
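
As a rough illustration of the settings persistence above, here is a minimal sketch in Python. Only the user_data/settings.yaml path comes from the notes; the function names and dict shape are hypothetical, not the project's actual code.

```python
# Minimal sketch of persistent UI settings (illustrative only; the real
# implementation may differ). Only the settings.yaml path is from the notes.
from pathlib import Path
import yaml

SETTINGS_PATH = Path("user_data/settings.yaml")

def load_settings() -> dict:
    """Return the saved UI state, or an empty dict on first launch."""
    if SETTINGS_PATH.exists():
        return yaml.safe_load(SETTINGS_PATH.read_text()) or {}
    return {}

def save_setting(key: str, value) -> None:
    """Persist a single changed UI value (slider, character field, etc.)."""
    settings = load_settings()
    settings[key] = value
    SETTINGS_PATH.parent.mkdir(parents=True, exist_ok=True)
    SETTINGS_PATH.write_text(yaml.safe_dump(settings, sort_keys=False))

# Example: moving a slider in the Parameters tab would trigger something
# like save_setting("temperature", 0.7), and the value survives a restart.
```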

Bug fixes

  • Filter out failed web search downloads from attachments.
  • Remove quotes from LLM-generated web search queries.
  • Fix the progress bar for downloading a model not appearing in the UI.
  • Fix the text for a sent message reappearing in the input area when the page is reloaded.
  • Fix selecting the next chat on the list when deleting a chat with an active search.
  • Fix light/dark theme persistence across page reloads.
  • Re-highlight code blocks when switching light/dark themes to fix styling issues.
  • Stop the llama.cpp model during graceful shutdown to avoid an error message (#7042). Thanks @leszekhanusz
  • Check .attention.head_count if .attention.head_count_kv doesn't exist for the VRAM calculation (#7048); see the sketch after this list. Thanks @miriameng
  • Fix a failure when --nowebui is used without --api (#7055). Thanks @miriameng
  • Fix "Continue" and "Start reply with" when using translation extensions (#6944). Thanks @mykeehu
  • Load JS and CSS sources in UTF-8 (#7059). Thanks @LawnMauer
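
To illustrate the head-count fallback fix above, a minimal sketch: the .attention.head_count and .attention.head_count_kv keys follow real GGUF metadata conventions, but the flat `metadata` dict and the helper name are assumptions, not the project's code.

```python
# Illustrative sketch of the head_count_kv fallback for the VRAM estimate.
# `metadata` is assumed to be a flat dict of GGUF keys, e.g.
# {"llama.attention.head_count": 32, ...}; the helper name is hypothetical.
def kv_head_count(metadata: dict, arch: str) -> int:
    """Prefer the KV head count; fall back to the full attention head count
    when the model's GGUF metadata does not define head_count_kv."""
    kv = metadata.get(f"{arch}.attention.head_count_kv")
    if kv is not None:
        return kv
    return metadata[f"{arch}.attention.head_count"]
```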

Backend updates


Portable builds

Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel CPU: Use macos-x86_64.

Updating a portable install:

  1. Download and unzip the latest version.
  2. Replace the new install's user_data folder with the one from your existing install. All your settings and models will carry over.
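
If you prefer to script the update, here is a minimal Python sketch of the two steps above. The directory and archive names are examples only, and real archives may contain a top-level folder, so adjust paths accordingly.

```python
# Minimal sketch of the update steps above: unzip the new build, then carry
# over your existing user_data folder. Paths are examples only.
import shutil
import zipfile
from pathlib import Path

old_install = Path("text-generation-webui-old")   # your current install
new_zip = Path("textgen-portable-latest.zip")     # the downloaded build
new_install = Path("text-generation-webui-new")

# 1. Download and unzip the latest version.
with zipfile.ZipFile(new_zip) as zf:
    zf.extractall(new_install)

# 2. Replace the fresh user_data folder with the one from your existing
#    install, carrying over all settings and models.
shutil.rmtree(new_install / "user_data", ignore_errors=True)
shutil.copytree(old_install / "user_data", new_install / "user_data")
```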