Flaky server responses with llama 3 #6785
Llama 3 is not yet supported; please wait for:
It works perfectly fine when it's responding slowly. I do not use a chat template; I use my own client that calls
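(The commenter's client code is not included in the thread. Purely as a rough sketch, the request below targets the llama.cpp server's /completion endpoint with its documented JSON fields; the prompt, port, and sampling parameters are placeholders, not taken from the comment.)

import json
import urllib.request

# Minimal request to a locally running llama-server instance.
# Endpoint and field names follow the llama.cpp server API; the
# prompt and sampling parameters here are only placeholders.
payload = {
    "prompt": "Explain what a GGUF file is in one sentence.",
    "n_predict": 128,
    "temperature": 0.0,
}

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])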
Feel free to reopen once those two PRs are merged.
I don't have permission to reopen issues in this repo. Also, I looked through those PRs; they have nothing to do with this problem.
Llama 3 is not supported; do you understand that the GGUF you are using is probably just wrong?
If your issue persists once you have re-converted the HF model and are running the latest server code with those PRs merged, please ping me and I will reopen.
I noticed that some of the responses I get from the llama.cpp server (latest master) are unnaturally fast for a 70B model, and it happens randomly. When this happens, the response quality is worse. The model I'm using is https://huggingface.co/NousResearch/Meta-Llama-3-70B-Instruct-GGUF/blob/main/Meta-Llama-3-70B-Instruct-Q5_K_M.gguf with the command line
llama-server -m Meta-Llama-3-70B-Instruct-Q5_K_M.gguf -c 0 -t 24 -ngl 24
The model is only partially offloaded to the GPU (with ROCm on Linux), so maybe llama.cpp somehow doesn't use all layers when it responds quickly.
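(Not part of the original report, but one way to make this flakiness easier to see is to send the same deterministic prompt several times and compare timings and output lengths; a reply that comes back much faster than the others would match the behaviour described above. The endpoint, port, and fields below are assumed from the llama.cpp server defaults; the prompt and seed are placeholders.)

import json
import time
import urllib.request

URL = "http://127.0.0.1:8080/completion"  # default llama-server address (assumed)
PROMPT = "List three facts about llamas."  # placeholder prompt

def ask() -> tuple[float, str]:
    """Send one deterministic request and return (seconds, generated text)."""
    payload = {"prompt": PROMPT, "n_predict": 64, "temperature": 0.0, "seed": 42}
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.time()
    with urllib.request.urlopen(req) as resp:
        text = json.loads(resp.read())["content"]
    return time.time() - start, text

for i in range(5):
    seconds, text = ask()
    # With temperature 0 and a fixed seed the replies should be near-identical;
    # a run that finishes much faster than the others matches the flaky
    # behaviour described in this issue.
    print(f"run {i}: {seconds:6.1f}s, {len(text)} chars")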