Description
I am building a Python module that calls into a C++ binding for llama.cpp, so it is likely me doing something wrong and not a bug in llama.cpp itself. Any help figuring this out would be welcome!
I am using a recent version of llama.cpp (08/01/2025).
The stack trace is provided below. Relevant extracts from my code:
// Destructor for the module manager
Manager::~Manager() {
    models.clear();
    llama_backend_free();
}

// Destructor for the model wrapper
Model::~Model() {
    for (auto * ctx : this->contexts) {
        if (ctx) llama_free(ctx);
    }
    contexts.clear();
    if (this->model) llama_model_free(model);
}
Basically, the resulting sequence of calls should be:
llama_free(ctx)
llama_model_free(model)
llama_backend_free()
It looks to me like the failure occurs in the first step, llama_free(ctx), which, judging by the backtrace, is reached from a libc exit handler (on_exit) after the CUDA driver has already started shutting down.
Currently, it is a single model with a single context.
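
For reference, here is a minimal standalone sketch of the lifecycle I am trying to reproduce, assuming the current llama.h API; the model path and the default parameters are placeholders, and in my module the three frees happen in the destructors above rather than in main:

// Minimal sketch of the init/teardown order I expect (not my actual code).
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("/path/to/model.gguf", mparams);
    if (!model) {
        llama_backend_free();
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    // ... inference would happen here ...

    llama_free(ctx);         // 1. free the context
    llama_model_free(model); // 2. free the model
    llama_backend_free();    // 3. shut down the backend
    return 0;
}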
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
Problem description & steps to reproduce
On a system where a recent llama.cpp is installed (with CUDA enabled), it should be possible to run the following (you might want a venv and to set up LD_LIBRARY_PATH):
git clone https://github.com/tristanvdb/AutoCog AutoCog
cd AutoCog
pip install .
tests/autocog/llama/execute_sta_with_llama_cpp.py tests/samples/micro.sta '{}' /path/to/any/model.gguf
First Bad Commit
No response
Relevant log output
CUDA error: driver shutting down
current device: -1, in function ~ggml_backend_cuda_context at /tmp/llama_cpp/ggml/src/ggml-cuda/ggml-cuda.cu:538
cudaStreamDestroy(streams[i][j])
/tmp/llama_cpp/ggml/src/ggml-cuda/ggml-cuda.cu:82: CUDA error
/usr/local/lib64/libggml-base.so(+0x150c8)[0x7fe18abf40c8]
/usr/local/lib64/libggml-base.so(ggml_print_backtrace+0x1e6)[0x7fe18abf4496]
/usr/local/lib64/libggml-base.so(ggml_abort+0x11d)[0x7fe18abf461d]
/usr/local/lib64/libggml-cuda.so(+0xcbd02)[0x7fe18ad40d02]
/usr/local/lib64/libggml-cuda.so(_ZN25ggml_backend_cuda_contextD1Ev+0x22f)[0x7fe18ad438af]
/usr/local/lib64/libggml-cuda.so(+0xce97b)[0x7fe18ad4397b]
/usr/local/lib64/libllama.so(_ZN13llama_contextD1Ev+0x292)[0x7fe18c3baa42]
/usr/local/lib64/libllama.so(llama_free+0xe)[0x7fe18c3bcf1e]
/opt/lib64/python3.9/site-packages/autocog/llama.cpython-39-x86_64-linux-gnu.so(+0x19226)[0x7fe18c543226]
/opt/lib64/python3.9/site-packages/autocog/llama.cpython-39-x86_64-linux-gnu.so(+0x1a73c)[0x7fe18c54473c]
/usr/lib64/libc.so.6(+0x412dd)[0x7fe1cafbc2dd]
/usr/lib64/libc.so.6(on_exit+0x0)[0x7fe1cafbc430]
/usr/lib64/libc.so.6(+0x295d7)[0x7fe1cafa45d7]
/usr/lib64/libc.so.6(__libc_start_main+0x80)[0x7fe1cafa4680]
python3(_start+0x25)[0x561edf340095]
Aborted (core dumped)