llama : refactor kv cache guard #12695
Conversation
After this commit, it seems like RNN-based models like RWKV don't work anymore and hit an assert at llama.cpp/src/llama-kv-cache.cpp line 208 (commit 626f822).
cc: @MollySophia
@LostRuins In case you can check whether the issue is resolved with the upcoming #12799, I would appreciate feedback. Thanks.
Hi @ggerganov, unfortunately #12799 does not seem to solve the issue. Trying on RWKV7-Goose-World3-2.9B-HF-q3_k_s, I still get this assert:
Do you have a repro with some of the tools in llama.cpp? I tried:

./bin/llama-cli -hf Mungert/RWKV7-Goose-World3-2.9B-HF-GGUF:Q3_K_S -p "I believe the meaning of life is" -no-cnv -n 32

I believe the meaning of life is that we are here to be here.
[end of text]

And it works. But this also works on …
Nvm, I reproduced with …
Should be fixed by the latest commit in #12799.
No more asserts; I can confirm it seems to work and generate fine now.
Simplify the KV cache guard mechanism and prepare for a separate recurrent cache implementation.

Also, llama_decode now correctly returns 1 when the batch cannot fit in the KV cache, and the KV cache state is correctly restored upon failure to process the batch.
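
Below is a minimal sketch (not from the PR) of how a caller might react to the llama_decode return codes described above, assuming the llama.h API where 0 means success, 1 means no KV slot was found for the batch, and negative values are fatal errors. The helper name decode_chunk and the retry policy are purely illustrative.

```cpp
#include "llama.h"

#include <cstdio>
#include <vector>

// Decode a chunk of tokens and report whether a retry makes sense.
// `ctx` and `tokens` are assumed to be set up by the caller.
static bool decode_chunk(llama_context * ctx, std::vector<llama_token> & tokens) {
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());

    const int32_t ret = llama_decode(ctx, batch);
    if (ret == 0) {
        return true; // success
    }
    if (ret == 1) {
        // The batch could not fit in the KV cache. After this PR the cache
        // state is restored on failure, so it is safe to retry, e.g. with a
        // smaller batch or after making room in the cache.
        fprintf(stderr, "decode_chunk: no KV slot for a batch of %zu tokens, retry with a smaller batch\n",
                tokens.size());
        return false;
    }
    // ret < 0: fatal error - do not retry
    fprintf(stderr, "decode_chunk: llama_decode failed with %d\n", ret);
    return false;
}
```

The key property the PR description promises is the second branch: because the KV cache state is rolled back on failure, a caller can treat a return value of 1 as a recoverable condition rather than a corrupted context.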