Name and Version
version: 9037 (bbeb89d)
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -m "qwen3.5-0.8b.gguf"
Problem description & steps to reproduce
Configure in Web UI -> Settings -> Developer -> Custom JSON:
{"stream": false}
With this setting, the Web UI sends a correct client request (including { ... , "stream": false, ... }), and llama-server produces a correct one-shot (non-streaming) reply. The Web UI does receive that reply: clicking "Delete Message" shows "This will delete 2 messages including: 1 user message and 1 assistant response." However, the Web UI never displays the reply and stays stuck in "Processing..." indefinitely.
=== Correct Web UI client request ===
POST /v1/chat/completions HTTP/1.1
Host: 127.0.0.1:8080
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:150.0) Gecko/20100101 Firefox/150.0
Accept: */*
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate
Referer: http://127.0.0.1:8080/
Content-Type: application/json
Content-Length: 159
Origin: http://127.0.0.1:8080
DNT: 1
Sec-GPC: 1
Connection: keep-alive
Cookie: sidebar:state=true
Priority: u=4
{"messages":[{"role":"user","content":"hi"}],"stream":false,"return_progress":true,"reasoning_format":"auto","backend_sampling":false,"timings_per_token":true}
=== Correct llama-server reply ===
HTTP/1.1 200 OK
Keep-Alive: timeout=5, max=100
Content-Type: application/json; charset=utf-8
Server: llama.cpp
Content-Length: 677
Access-Control-Allow-Origin: http://127.0.0.1:8080
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Hello! How can I assist you today? 🌟"}}],"created":1778068057,"model":"Qwen3.5-0.8B-IQ4_NL.gguf","system_fingerprint":"b9037-bbeb89d76","object":"chat.completion","usage":{"completion_tokens":13,"prompt_tokens":13,"total_tokens":26,"prompt_tokens_details":{"cached_tokens":0}},"id":"chatcmpl-ifSTP7DiRnSM7TYMQo1guC3flMRSd7Hq","timings":{"cache_n":0,"prompt_n":13,"prompt_ms":13.321,"prompt_per_token_ms":1.0246923076923076,"prompt_per_second":975.9027100067563,"predicted_n":13,"predicted_ms":73.908,"predicted_per_token_ms":5.6852307692307695,"predicted_per_second":175.89435514423337}}
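For comparison, here is a minimal sketch of how a client can extract the assistant text from both response shapes of /v1/chat/completions. It assumes the OpenAI-compatible formats: a one-shot chat.completion JSON body (as in the reply above) when "stream" is false, and server-sent "data:" events carrying chat.completion.chunk objects, terminated by "data: [DONE]", when "stream" is true. The function name extract_reply is hypothetical and is not taken from the llama.cpp Web UI source. The sketch only illustrates that a client honouring "stream": false must branch on the response type rather than always waiting for SSE events. It does not claim to identify the actual cause of this bug.

```python
import json


def extract_reply(content_type: str, body: str) -> str:
    """Return the assistant text from a /v1/chat/completions response body.

    Hypothetical helper: handles the one-shot JSON reply ("stream": false)
    and an SSE stream of chat.completion.chunk events ("stream": true).
    Assumes a single choice (n=1).
    """
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "application/json":
        # Non-streaming: the whole reply is in choices[0].message.content.
        data = json.loads(body)
        return data["choices"][0]["message"].get("content") or ""

    if media_type == "text/event-stream":
        # Streaming: concatenate choices[0].delta.content from each "data:" event.
        parts = []
        for line in body.splitlines():
            if not line.startswith("data:"):
                continue  # skip blank separators, comments, "event:" lines
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                break  # end-of-stream sentinel
            chunk = json.loads(payload)
            choices = chunk.get("choices") or []
            if choices:  # some chunks (e.g. usage-only) carry no choices
                parts.append(choices[0].get("delta", {}).get("content") or "")
        return "".join(parts)

    raise ValueError(f"unexpected Content-Type: {content_type!r}")


if __name__ == "__main__":
    # Trimmed version of the non-streaming reply captured above.
    one_shot = (
        '{"choices":[{"finish_reason":"stop","index":0,"message":'
        '{"role":"assistant","content":"Hello! How can I assist you today?"}}],'
        '"object":"chat.completion"}'
    )
    print(extract_reply("application/json; charset=utf-8", one_shot))
```

Run as a script, it prints the assistant message from the trimmed non-streaming body. The same call with a text/event-stream Content-Type would instead accumulate the streamed deltas.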
First Bad Commit
No response
Relevant log output
Logs