[Bug]: Tools for querying vector DB not called when running gpt-oss:20b on llama.cpp server #2398

@ndrewpj

Description

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

I used Ollama with gpt-oss:20b for a while but have had to switch from Ollama to the llama.cpp server.

With Ollama the tool calls work as expected: I get the needed response in LightRAG chat. When I use llama-server, the tool calls do not seem to be invoked at all.

I am unsure where the issue lies, since llama.cpp has a guide (ggml-org/llama.cpp#15396) on how to properly run gpt-oss models so that tools work. Some say the problem is partly in the Jinja/chat-template format, which would put it outside LightRAG's scope; others say that the clients issuing tool calls to the LLM must also follow specific request formats:
ggml-org/llama.cpp#15341

In the end, I need to use LightRAG with this model AND llama.cpp; Ollama is no longer an option. Please help: what else can I try?

My related bug report in llama.cpp: ggml-org/llama.cpp#17410
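One way to narrow this down is to send a tool-call request directly to llama-server's OpenAI-compatible endpoint, bypassing LightRAG entirely. A minimal sketch, assuming llama-server is reachable at http://localhost:8080/v1; the tool name `query_vector_db` and the endpoint address are illustrative assumptions, not LightRAG's actual tool schema:

```python
# Send one chat completion with a tool definition straight to llama-server.
# If "tool_calls" appears in the response, the server-side chat template
# handles tools correctly and the problem is likely in the client's request
# format; if not, the issue is on the llama.cpp side.
import json
import urllib.request

# Hypothetical tool schema, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "query_vector_db",
        "description": "Query the vector database for relevant chunks.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

payload = {
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Find documents about llama.cpp"}],
    "tools": tools,
    "tool_choice": "auto",
}

def send(base_url: str = "http://localhost:8080/v1") -> dict:
    """POST the payload to the OpenAI-compatible chat completions endpoint."""
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Inspect send()["choices"][0]["message"] for a "tool_calls" entry.
```

Comparing this direct request against the traffic LightRAG sends (e.g. via llama-server's verbose logging) would show whether the two differ in the `tools`/`tool_choice` fields.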

Steps to reproduce

  1. Run llama-server with the gpt-oss:20b model.
  2. Configure the LightRAG .env file to use llama-server:
    LLM type: openai
    Base URL: IP:port/v1 pointing at llama-server
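A sketch of the .env settings described in step 2, assuming LightRAG's OpenAI-compatible binding; the exact variable names may differ between LightRAG versions, so treat these as placeholders:

```shell
# Hypothetical LightRAG .env fragment pointing the OpenAI binding at llama-server
LLM_BINDING=openai
LLM_MODEL=gpt-oss:20b
LLM_BINDING_HOST=http://<llama-server-ip>:<port>/v1
LLM_BINDING_API_KEY=none   # llama-server typically requires no API key
```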

Expected Behavior

Chat response with data from the RAG

LightRAG Config Used

Paste your config here

Logs and screenshots

No response

Additional Information

  • LightRAG Version: v1.4.9.8/0251
  • Operating System: Ubuntu 24.04.3
  • Python Version: 3.21
  • Related Issues:
