Conversation
|
After this commit, it seems like RNN-based models like RWKV don't work anymore; the assert at llama.cpp/src/llama-kv-cache.cpp Line 208 in 626f822 fires. cc: @MollySophia |
|
@LostRuins If you can check whether the issue is resolved with the upcoming #12799, I would appreciate feedback. Thanks. |
|
Hi @ggerganov, unfortunately #12799 does not seem to solve the issue. Trying with RWKV7-Goose-World3-2.9B-HF-q3_k_s, I still get this assert:
|
Do you have a repro with some of the tools in ./bin? I tried:

```shell
./bin/llama-cli -hf Mungert/RWKV7-Goose-World3-2.9B-HF-GGUF:Q3_K_S -p "I believe the meaning of life is" -no-cnv -n 32
```

```
I believe the meaning of life is that we are here to be here.
[end of text]
```

And it works. But this also works on |
|
Nvm, I reproduced with |
|
Should be fixed in the latest commit in #12799 |
|
No more asserts, can confirm it seems to work and generate fine now. |
* llama : refactor kv cache guard ggml-ci
* cont : fix comment [no ci]
* llama : fix kv_cache restore logic ggml-ci
* context : simplify kv cache updates ggml-ci
* cont : better name [no ci]
* llama : fix llama_decode return code when could not find KV slot ggml-ci
* context : change log err -> warn [no ci]
* kv-cache : add comment + warning
Simplify the KV cache guard mechanism. Prepare for separate recurrent cache implementation.
Also, `llama_decode` now correctly returns 1 when the batch cannot fit in the KV cache, and the KV cache state is correctly restored upon failure to process the batch.