Conversation
|
After this commit, it seems like RNN-based models like RWKV don't work anymore; the assert at llama.cpp/src/llama-kv-cache.cpp Line 208 in 626f822 fires. cc: @MollySophia |
|
@LostRuins If you can check whether the issue is resolved with the upcoming #12799, I would appreciate feedback. Thanks. |
|
Hi @ggerganov, unfortunately #12799 does not seem to solve the issue. Trying with RWKV7-Goose-World3-2.9B-HF-q3_k_s, I still get this assert:
|
Do you have a repro with some of the tools in ./bin? I tried:

```shell
./bin/llama-cli -hf Mungert/RWKV7-Goose-World3-2.9B-HF-GGUF:Q3_K_S -p "I believe the meaning of life is" -no-cnv -n 32
```

```
I believe the meaning of life is that we are here to be here.
[end of text]
```

And it works. But this also works on |
|
Nvm, I reproduced with |
|
Should be fixed in the latest commit in #12799 |
|
No more asserts, can confirm it seems to work and generate fine now. |
* llama : refactor kv cache guard ggml-ci
* cont : fix comment [no ci]
* llama : fix kv_cache restore logic ggml-ci
* context : simplify kv cache updates ggml-ci
* cont : better name [no ci]
* llama : fix llama_decode return code when could not find KV slot ggml-ci
* context : change log err -> warn [no ci]
* kv-cache : add comment + warning
Simplify the KV cache guard mechanism. Prepare for separate recurrent cache implementation.
Also, `llama_decode` now correctly returns 1 when the batch cannot fit in the KV cache, and the KV cache state is correctly restored upon failure to process the batch.