main: server is listening on http://0.0.0.0:4000
main: starting the main loop...
srv update_slots: all slots are idle
srv params_from_: Chat format: Qwen3 Coder
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> ?min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 0 | task 0 | processing task, is_child = 0
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 221184, n_keep = 0, task.n_tokens = 14484
slot update_slots: id 0 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 1024, batch.n_tokens = 1024, progress = 0.070699
slot update_slots: id 0 | task 0 | n_tokens = 1024, memory_seq_rm [1024, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 1024, progress = 0.141397
slot update_slots: id 0 | task 0 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 3072, batch.n_tokens = 1024, progress = 0.212096
slot update_slots: id 0 | task 0 | n_tokens = 3072, memory_seq_rm [3072, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 1024, progress = 0.282795
slot update_slots: id 0 | task 0 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 5120, batch.n_tokens = 1024, progress = 0.353494
slot update_slots: id 0 | task 0 | n_tokens = 5120, memory_seq_rm [5120, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 1024, progress = 0.424192
slot update_slots: id 0 | task 0 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 7168, batch.n_tokens = 1024, progress = 0.494891
slot update_slots: id 0 | task 0 | n_tokens = 7168, memory_seq_rm [7168, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 1024, progress = 0.565590
slot update_slots: id 0 | task 0 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 9216, batch.n_tokens = 1024, progress = 0.636288
slot update_slots: id 0 | task 0 | n_tokens = 9216, memory_seq_rm [9216, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 1024, progress = 0.706987
slot update_slots: id 0 | task 0 | n_tokens = 10240, memory_seq_rm [10240, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 11264, batch.n_tokens = 1024, progress = 0.777686
slot update_slots: id 0 | task 0 | n_tokens = 11264, memory_seq_rm [11264, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 1024, progress = 0.848384
slot update_slots: id 0 | task 0 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 13312, batch.n_tokens = 1024, progress = 0.919083
slot update_slots: id 0 | task 0 | n_tokens = 13312, memory_seq_rm [13312, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 1024, progress = 0.989782
slot update_slots: id 0 | task 0 | n_tokens = 14336, memory_seq_rm [14336, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 14420, batch.n_tokens = 84, progress = 0.995581
slot update_slots: id 0 | task 0 | n_tokens = 14420, memory_seq_rm [14420, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_tokens = 14484, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_tokens = 14484, batch.n_tokens = 64
slot init_sampler: id 0 | task 0 | init sampler, took 1.08 ms, tokens: text = 14484, total = 14484
slot update_slots: id 0 | task 0 | created context checkpoint 1 of 8 (pos_min = 14419, pos_max = 14419, size = 75.376 MiB)
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0 libggml-base.0.9.5.dylib 0x00000001011753bc ggml_print_backtrace + 276
1 libggml-base.0.9.5.dylib 0x0000000101183a70 _ZL23ggml_uncaught_exceptionv + 12
2 libc++abi.dylib 0x000000019e642c2c _ZSt11__terminatePFvvE + 16
3 libc++abi.dylib 0x000000019e646394 __cxa_get_exception_ptr + 0
4 libc++abi.dylib 0x000000019e64633c _ZN10__cxxabiv1L12failed_throwEPNS_15__cxa_exceptionE + 0
5 libllama.0.0.7960.dylib 0x0000000101286030 _Z26llama_grammar_accept_tokenR13llama_grammariRKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE + 1016
6 libllama.0.0.7960.dylib 0x0000000101285af0 _Z25llama_grammar_accept_implR13llama_grammari + 464
7 llama-server 0x000000010088b5a8 _Z21common_sampler_acceptP14common_samplerib + 76
8 llama-server 0x000000010078c860 _ZN19server_context_impl12update_slotsEv + 10112
9 llama-server 0x000000010075d688 _ZN12server_queue10start_loopEx + 848
10 llama-server 0x00000001006fa4d0 main + 9568
11 dyld 0x000000019e2c5d54 start + 7184
libc++abi: terminating due to uncaught exception of type std::runtime_error: Unexpected empty grammar stack after accepting piece: =list (40972)
Name and Version
Operating systems
Mac
GGML backends
Metal
Hardware
M4 Mac Mini 64GB
Models
Qwen_Qwen3-Coder-Next-GGUF_Qwen3-Coder-Next-Q4_K_M_Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf
Problem description & steps to reproduce
Crashes
I get the following crash frequently. It can happen at the very beginning of a session or midway through, so my understanding is that it is not related to the number of tokens consumed.
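For context on the failing code path: the log shows "Chat format: Qwen3 Coder", which makes the server use grammar-constrained parsing for tool calls, and the backtrace ends in llama_grammar_accept_token with "Unexpected empty grammar stack". A request that includes tool definitions should exercise that path. The body below is a hypothetical illustration only (the actual request is not captured in this report); the tool name and schema are made up, but the /v1/chat/completions endpoint and "tools" field are the server's standard OpenAI-compatible API.

```json
{
  "messages": [
    { "role": "user", "content": "List the files in the repo" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "list_files",
        "description": "List files in a directory (hypothetical example tool)",
        "parameters": {
          "type": "object",
          "properties": { "path": { "type": "string" } },
          "required": ["path"]
        }
      }
    }
  ],
  "stream": true
}
```

POSTing a body like this to http://0.0.0.0:4000/v1/chat/completions while the model emits a tool call should drive the grammar sampler that appears at the top of the crash stack.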
The command line
First Bad Commit
No response
Relevant log output
Logs