llama : refactor kv cache guard #12695


Merged
merged 8 commits into master from gg/llama-kv-cache-v4 on Apr 2, 2025

Conversation


@ggerganov (Member) commented Apr 1, 2025

Simplify the KV cache guard mechanism. Prepare for a separate recurrent cache implementation.

Also, llama_decode now correctly returns 1 when the batch cannot fit in the KV cache, and the KV cache state is correctly restored upon failure to process the batch.
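
A minimal sketch (not part of this PR) of how a caller can react to the llama_decode return codes described above: 0 means success, 1 means the batch could not be fit into the KV cache (the cache state is restored, so retrying with a smaller batch is safe), and negative values are hard errors. llama_decode and llama_batch_get_one are from the public llama.h API; the helper decode_with_retry and the chunk-halving strategy are invented for illustration.

```cpp
#include "llama.h"

#include <algorithm>
#include <cstdio>
#include <vector>

// Decode `tokens`, splitting into smaller chunks when the batch does not
// fit in the KV cache (llama_decode returns 1 in that case).
static bool decode_with_retry(llama_context * ctx, std::vector<llama_token> & tokens) {
    size_t n_batch = tokens.size();

    for (size_t i = 0; i < tokens.size(); ) {
        const size_t n = std::min(n_batch, tokens.size() - i);

        const int32_t ret = llama_decode(ctx, llama_batch_get_one(tokens.data() + i, (int32_t) n));

        if (ret == 0) {
            i += n; // chunk processed, KV cache advanced
        } else if (ret == 1) {
            // batch does not fit in the KV cache; the cache state was restored,
            // so simply retry with a smaller chunk
            if (n_batch == 1) {
                fprintf(stderr, "KV cache is full\n");
                return false;
            }
            n_batch /= 2;
        } else {
            fprintf(stderr, "llama_decode failed: %d\n", ret);
            return false;
        }
    }

    return true;
}
```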

@ggerganov merged commit a10b36c into master on Apr 2, 2025
1 check passed
@ggerganov deleted the gg/llama-kv-cache-v4 branch on April 2, 2025 at 11:33
@LostRuins (Collaborator) commented:

After this commit, it seems like RNN-based models like RWKV don't work anymore, and assert at llama-kv-cache.cpp:594: GGML_ASSERT(empty_cell.is_empty()) failed. Reverting the early `return true;` line seems to allow RWKV to work again.

cc: @MollySophia

@ggerganov (Member, Author) commented:

@LostRuins If you can check whether the issue is resolved with the upcoming #12799, I would appreciate the feedback. Thanks.

@LostRuins (Collaborator) commented Apr 25, 2025

Hi @ggerganov, unfortunately #12799 does not seem to solve the issue. Trying on RWKV7-Goose-World3-2.9B-HF-q3_k_s, I still get this assert:

src/llama-kv-cache.cpp:1803: GGML_ASSERT(empty_cell.is_empty()) failed

@ggerganov (Member, Author) commented:

Do you have a repro with some of the tools in llama.cpp? I tried:

./bin/llama-cli -hf Mungert/RWKV7-Goose-World3-2.9B-HF-GGUF:Q3_K_S -p "I believe the meaning of life is" -no-cnv -n 32

I believe the meaning of life is that we are here to be here.

 [end of text]

And it works. But this also works on master.

@ggerganov (Member, Author) commented:

Never mind, I reproduced it with llama-server (I didn't notice that the assert is inside seq_rm()).
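
For readers following along: llama-server trims a slot's cache through the sequence-removal API before reusing a prompt prefix, which is the path where the assert fired. Below is a rough sketch of that call pattern, assuming the llama_kv_self_seq_rm name from current llama.h; the helper trim_cache is invented for illustration and is not actual llama-server code.

```cpp
#include "llama.h"

// Keep the first n_keep tokens of a sequence and drop the rest, e.g. when a
// new prompt shares only a prefix with the cached one.
static void trim_cache(llama_context * ctx, llama_seq_id seq_id, llama_pos n_keep) {
    // remove positions [n_keep, end) of the sequence
    if (!llama_kv_self_seq_rm(ctx, seq_id, n_keep, -1)) {
        // recurrent models (e.g. RWKV) cannot drop a partial range of a
        // sequence - fall back to clearing the whole sequence, after which
        // the prompt has to be re-decoded from scratch
        llama_kv_self_seq_rm(ctx, seq_id, -1, -1);
    }
}
```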

@ggerganov (Member, Author) commented:

This should be fixed by the latest commit in #12799.

@LostRuins (Collaborator) commented Apr 25, 2025

No more asserts; I can confirm it seems to work and generate fine now.
