Description
I am building a Python module that calls into a C++ binding for llama.cpp, so it is likely me doing something wrong and not a bug in llama.cpp itself. Any help figuring this out would be welcome!
I am using a recent version of llama.cpp (08/01/2025).
The stack trace is provided below. Relevant extracts from my code:
// Destructor for the module manager
Manager::~Manager() {
    models.clear();
    llama_backend_free();
}

// Destructor for the model wrapper
Model::~Model() {
    for (auto * ctx : this->contexts) {
        if (ctx) llama_free(ctx);
    }
    contexts.clear();
    if (this->model) llama_model_free(model);
}
Basically, the resulting sequence of calls should be:
llama_free(ctx)
llama_model_free(model)
llama_backend_free()
It looks to me like the failure occurs in the first step, llama_free(ctx), which, judging by the backtrace, is reached from a libc exit handler (on_exit) after the CUDA driver has already started shutting down.
Currently, it is a single model with a single context.
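
For reference, here is a minimal standalone sketch of the lifecycle I am trying to reproduce, assuming the current llama.h API; the model path and the default parameters are placeholders, and in my module the three frees happen in the destructors above rather than in main:

// Minimal sketch of the init/teardown order I expect (not my actual code).
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("/path/to/model.gguf", mparams);
    if (!model) {
        llama_backend_free();
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);

    // ... inference would happen here ...

    llama_free(ctx);         // 1. free the context
    llama_model_free(model); // 2. free the model
    llama_backend_free();    // 3. shut down the backend
    return 0;
}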
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
Problem description & steps to reproduce
On a system where a recent llama.cpp is installed (with CUDA enabled), it should be possible to run the following (you might want a venv and to set up LD_LIBRARY_PATH):
git clone https://github.com/tristanvdb/AutoCog AutoCog
cd AutoCog
pip install .
tests/autocog/llama/execute_sta_with_llama_cpp.py tests/samples/micro.sta '{}' /path/to/any/model.gguf
First Bad Commit
No response
Relevant log output
CUDA error: driver shutting down
current device: -1, in function ~ggml_backend_cuda_context at /tmp/llama_cpp/ggml/src/ggml-cuda/ggml-cuda.cu:538
cudaStreamDestroy(streams[i][j])
/tmp/llama_cpp/ggml/src/ggml-cuda/ggml-cuda.cu:82: CUDA error
/usr/local/lib64/libggml-base.so(+0x150c8)[0x7fe18abf40c8]
/usr/local/lib64/libggml-base.so(ggml_print_backtrace+0x1e6)[0x7fe18abf4496]
/usr/local/lib64/libggml-base.so(ggml_abort+0x11d)[0x7fe18abf461d]
/usr/local/lib64/libggml-cuda.so(+0xcbd02)[0x7fe18ad40d02]
/usr/local/lib64/libggml-cuda.so(_ZN25ggml_backend_cuda_contextD1Ev+0x22f)[0x7fe18ad438af]
/usr/local/lib64/libggml-cuda.so(+0xce97b)[0x7fe18ad4397b]
/usr/local/lib64/libllama.so(_ZN13llama_contextD1Ev+0x292)[0x7fe18c3baa42]
/usr/local/lib64/libllama.so(llama_free+0xe)[0x7fe18c3bcf1e]
/opt/lib64/python3.9/site-packages/autocog/llama.cpython-39-x86_64-linux-gnu.so(+0x19226)[0x7fe18c543226]
/opt/lib64/python3.9/site-packages/autocog/llama.cpython-39-x86_64-linux-gnu.so(+0x1a73c)[0x7fe18c54473c]
/usr/lib64/libc.so.6(+0x412dd)[0x7fe1cafbc2dd]
/usr/lib64/libc.so.6(on_exit+0x0)[0x7fe1cafbc430]
/usr/lib64/libc.so.6(+0x295d7)[0x7fe1cafa45d7]
/usr/lib64/libc.so.6(__libc_start_main+0x80)[0x7fe1cafa4680]
python3(_start+0x25)[0x561edf340095]
Aborted (core dumped)