Misc. bug: SM tensor has a memory leak

### Name and Version

$ ./llama-cli --version
version: 9279 (52be242ad)
built with GNU 13.3.0 for Linux x86_64

Built in Docker with:
    -DGGML_CUDA=ON \
    -DGGML_CUDA_FA_ALL_QUANTS=ON \
    -DGGML_CUDA_NCCL=ON 

### Operating systems

Linux

### Which llama.cpp modules do you know to be affected?

libllama (core library)

### Command line

```shell
./llama-server
--jinja
-t 16
-fa on
--no-mmap
-dio
--slot-save-path /slots
--metrics
--log-prefix
--log-timestamps
--cache-ram 6000
-m /models/final/qwen36-27b/mtp/Qwen3.6-27B-Q8_0.gguf
--min-p 0.0
--top-k 20
--top-p 0.95
--temp 0.6
--image-min-tokens 1024
--mmproj /models/final/qwen36-27b/mmproj-BF16.gguf
-c 1040000
-np 4
-ngl 999
-ctk f16
-ctv f16
-sm tensor
```

### Problem description & steps to reproduce

Running `llama-server` on 10 NVIDIA RTX 5016 16GB GPUs in tensor split mode (`-sm tensor`) with 4 parallel slots crashes llama.cpp after it has been running for some time; see the stack trace below.
Host memory fills up over time. I can postpone the crash by increasing the reserved 1 GB to 8 GB in [ggml-backend-meta.cpp](https://github.com/ggml-org/llama.cpp/blob/5306f4b3b54a0e261e83b1d2961a97685e898871/ggml/src/ggml-backend-meta.cpp#L1459) and [here](https://github.com/ggml-org/llama.cpp/blob/5306f4b3b54a0e261e83b1d2961a97685e898871/ggml/src/ggml-backend-meta.cpp#L1438), but it still crashes at the 80 GB mark (10 GPUs x 8 GB).
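For anyone trying to reproduce the growth in host memory, a minimal sketch of a sampler that could be left running next to the server is shown here. The helper names `rss_kib` and `watch_rss` are hypothetical, the script reads `/proc` and so assumes Linux, and it only records the resident set size of the process; it does not identify which allocation is growing.

```shell
#!/bin/sh
# Print the resident set size (KiB) of a process, read from /proc (Linux only).
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print a "unix-timestamp rss-in-KiB" line every INTERVAL seconds
# (default 60) for as long as the process exists.
watch_rss() {
    pid=$1
    interval=${2:-60}
    while [ -r "/proc/$pid/status" ]; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kib "$pid")"
        sleep "$interval"
    done
}

# Example (assumes exactly one llama-server process is running):
#   watch_rss "$(pgrep -x llama-server)" 60 >> rss.log
```

A steadily rising second column in `rss.log` while the slots are busy would be consistent with the behaviour described above.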

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>


```console
20.21.856.308 I slot print_timing: id  0 | task 30056 | prompt processing, n_tokens =  22495, progress = 0.84, t =  26.75 s / 840.87 tokens per second
/tmp/llama.cpp/ggml/src/ggml.c:1766: GGML_ASSERT(obj_new) failed
20.22.614.845 W ggml_new_object: not enough space in the context's memory pool (needed 1073741936, available 1073741824)
./llama-server(+0x130e35b)[0x5c008490735b]
./llama-server(+0x130e8ac)[0x5c00849078ac]
./llama-server(+0x130ea8b)[0x5c0084907a8b]
./llama-server(+0x130f6e1)[0x5c00849086e1]
./llama-server(+0x132c02c)[0x5c008492502c]
./llama-server(+0x131fd16)[0x5c0084918d16]
./llama-server(+0x13250ff)[0x5c008491e0ff]
./llama-server(+0x523f17)[0x5c0083b1cf17]
./llama-server(+0x52a567)[0x5c0083b23567]
./llama-server(+0x52bd1f)[0x5c0083b24d1f]
./llama-server(+0x24c990)[0x5c0083845990]
./llama-server(+0x2dfb21)[0x5c00838d8b21]
./llama-server(+0x1bc845)[0x5c00837b5845]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7461755701ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x74617557028b]
./llama-server(+0x1b7585)[0x5c00837b0585]

```
</details>
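
The sizes involved can be checked with a few lines of shell arithmetic. This is only a sketch that reads the two figures from the `ggml_new_object` warning in the log and repeats the 10 x 8 GB calculation from the problem description; the helper names `pool_mib` and `overshoot` are hypothetical.

```shell
#!/bin/sh
# Figures copied from the ggml_new_object warning in the log.
needed=1073741936
available=1073741824

# Convert a byte count to MiB (integer division).
pool_mib() { echo $(( $1 / 1024 / 1024 )); }

# Bytes by which the request exceeds the pool.
overshoot() { echo $(( $1 - $2 )); }

echo "pool size: $(pool_mib "$available") MiB"
echo "overshoot: $(overshoot "$needed" "$available") bytes"

# From the problem description: 10 GPUs, reservation raised to 8 GB each.
echo "total at crash: $(( 10 * 8 )) GB"
```

The pool in the log is exactly 1024 MiB, matching the default 1 GB reservation, and the failing request exceeds it by 112 bytes, so the pool was completely full when the assertion fired.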

Misc. bug: SM tensor has a memory leak #23486

Name and Version

Operating systems

Which llama.cpp modules do you know to be affected?

Command line

Problem description & steps to reproduce

First Bad Commit

Relevant log output

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Misc. bug: SM tensor has a memory leak #23486

Description

Name and Version

Operating systems

Which llama.cpp modules do you know to be affected?

Command line

Problem description & steps to reproduce

First Bad Commit

Relevant log output

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions