Misc. bug: SM tensor has a memory leak #23486

@krampenschiesser

Name and Version

$ ./llama-cli --version
version: 9279 (52be242ad)
built with GNU 13.3.0 for Linux x86_64

Built in Docker with:
-DGGML_CUDA=ON
-DGGML_CUDA_FA_ALL_QUANTS=ON
-DGGML_CUDA_NCCL=ON

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

libllama (core library)

Command line

./llama-server
--jinja
-t 16
-fa on
--no-mmap
-dio
--slot-save-path /slots
--metrics
--log-prefix
--log-timestamps
--cache-ram 6000
-m /models/final/qwen36-27b/mtp/Qwen3.6-27B-Q8_0.gguf
--min-p 0.0
--top-k 20
--top-p 0.95
--temp 0.6
--image-min-tokens 1024
--mmproj /models/final/qwen36-27b/mmproj-BF16.gguf
-c 1040000
-np 4
-ngl 999
-ctk f16
-ctv f16
-sm tensor

Problem description & steps to reproduce

Running on 10 NVIDIA RTX 5016 16GB GPUs in tensor split mode (-sm tensor) with 4 parallel slots, llama.cpp crashes after some time; see the stack trace below.
Host memory fills up. I can postpone the crash by increasing the reserved 1 GB to 8 GB in ggml-backend-meta.cpp and here, but it still crashes at the 80 GB mark (10 GPUs x 8 GB).
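
To check whether host memory grows steadily over the run rather than spiking right before the crash, the resident set size of the server process could be sampled while the workload is running. This is a minimal Python sketch, assuming a Linux host where /proc/<pid>/status is readable. The helper names, the sampling interval, and the sample count are hypothetical and not part of llama.cpp.

```python
import re
import time


def parse_vm_rss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS not found in status text")
    return int(match.group(1))


def sample_rss(pid: int, interval_s: float = 60.0, samples: int = 10) -> list[int]:
    """Read VmRSS for `pid` once per `interval_s` seconds, `samples` times."""
    readings = []
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as f:
            readings.append(parse_vm_rss_kib(f.read()))
        time.sleep(interval_s)
    return readings
```

Calling sample_rss with the PID of llama-server and comparing the first and last readings would show how much resident memory was gained over the sampling window. RSS is only a rough proxy, so a monotonic increase under constant load is suggestive rather than conclusive.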

First Bad Commit

No response

Relevant log output

Logs
20.21.856.308 I slot print_timing: id  0 | task 30056 | prompt processing, n_tokens =  22495, progress = 0.84, t =  26.75 s / 840.87 tokens per second
/tmp/llama.cpp/ggml/src/ggml.c:1766: GGML_ASSERT(obj_new) failed
20.22.614.845 W ggml_new_object: not enough space in the context's memory pool (needed 1073741936, available 1073741824)
./llama-server(+0x130e35b)[0x5c008490735b]
./llama-server(+0x130e8ac)[0x5c00849078ac]
./llama-server(+0x130ea8b)[0x5c0084907a8b]
./llama-server(+0x130f6e1)[0x5c00849086e1]
./llama-server(+0x132c02c)[0x5c008492502c]
./llama-server(+0x131fd16)[0x5c0084918d16]
./llama-server(+0x13250ff)[0x5c008491e0ff]
./llama-server(+0x523f17)[0x5c0083b1cf17]
./llama-server(+0x52a567)[0x5c0083b23567]
./llama-server(+0x52bd1f)[0x5c0083b24d1f]
./llama-server(+0x24c990)[0x5c0083845990]
./llama-server(+0x2dfb21)[0x5c00838d8b21]
./llama-server(+0x1bc845)[0x5c00837b5845]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7461755701ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x74617557028b]
./llama-server(+0x1b7585)[0x5c00837b0585]
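
The ggml_new_object warning above reports both sizes in bytes. As a sanity check on those numbers, the following Python sketch (the function name is hypothetical) parses the line and computes the gap between the two values. The "available" figure is exactly 1 GiB, which matches the reserved 1 GB mentioned in the problem description, and "needed" exceeds it by only 112 bytes.

```python
import re

# Warning line copied from the log output above.
WARNING = (
    "ggml_new_object: not enough space in the context's memory pool "
    "(needed 1073741936, available 1073741824)"
)


def pool_shortfall(line: str) -> tuple[int, int, int]:
    """Return (needed, available, needed - available) in bytes."""
    match = re.search(r"needed (\d+), available (\d+)", line)
    if match is None:
        raise ValueError("line is not a memory pool warning")
    needed, available = int(match.group(1)), int(match.group(2))
    return needed, available, needed - available


needed, available, shortfall = pool_shortfall(WARNING)
print(available == 1 << 30)  # True: the pool size is exactly 1 GiB
print(shortfall)             # 112
```

A shortfall this small relative to the pool size suggests the pool filled up gradually through many small allocations rather than through a single oversized request, which would be consistent with the crash only being postponed when the reservation is raised to 8 GB.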
