Flaky server responses with llama 3 #6785


Closed · kurnevsky opened this issue Apr 20, 2024 · 6 comments

Comments

@kurnevsky (Contributor)

I noticed that some of the responses I get from the llama.cpp server (latest master) are unnaturally fast for a 70B model, and it happens randomly. When this happens, the response quality is worse. The model I'm using is https://huggingface.co/NousResearch/Meta-Llama-3-70B-Instruct-GGUF/blob/main/Meta-Llama-3-70B-Instruct-Q5_K_M.gguf, launched with the command line llama-server -m Meta-Llama-3-70B-Instruct-Q5_K_M.gguf -c 0 -t 24 -ngl 24. The model is only partially offloaded to the GPU (with ROCm on Linux), so maybe llama.cpp somehow doesn't use all the layers when it responds quickly.
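For context, here is the same invocation with the flags annotated (flag meanings taken from llama-server's --help output; defaults may differ between builds):

```sh
# Reported invocation, annotated:
#   -m    path to the GGUF model to load
#   -c 0  context size; 0 means "use the value stored in the model"
#   -t    number of CPU threads
#   -ngl  number of layers to offload to the GPU (24 here, a partial offload)
llama-server -m Meta-Llama-3-70B-Instruct-Q5_K_M.gguf -c 0 -t 24 -ngl 24
```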

@phymbert (Collaborator)

Llama 3 is not yet supported; please wait for:

@kurnevsky (Contributor, Author) commented Apr 20, 2024

It works perfectly fine when it responds slowly. I don't use a chat template; I use my own client that calls the /completion endpoint.
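For reference, a minimal call to that endpoint looks roughly like this (assuming the server's default host and port, 127.0.0.1:8080; the prompt and n_predict values are just placeholders):

```sh
# Minimal /completion request against a locally running llama-server.
# Host, port, prompt, and n_predict are placeholder values.
curl -s http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Q: What is the capital of France? A:", "n_predict": 64}'
```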

@phymbert (Collaborator)

Feel free to reopen once those two PRs are merged.

@kurnevsky (Contributor, Author)

I don't have permission to reopen issues in this repo. Also, I looked through those PRs, and they have nothing to do with this problem.

@phymbert (Collaborator)

Llama 3 is not supported yet; do you understand that the GGUF you are using is probably just wrong?

@phymbert (Collaborator) commented Apr 20, 2024

If your issue persists once you have re-converted the HF model and are running the latest server code with those PRs merged, please ping me and I will reopen.
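For anyone following along, re-converting would look roughly like this with the scripts in the llama.cpp repo (a sketch only; the model directory path is a placeholder, and script/binary names may differ between versions):

```sh
# Sketch: convert the HF checkpoint to an f16 GGUF, then quantize to Q5_K_M.
# /path/to/Meta-Llama-3-70B-Instruct is a placeholder for the local HF model dir.
python convert-hf-to-gguf.py /path/to/Meta-Llama-3-70B-Instruct \
  --outfile Meta-Llama-3-70B-Instruct-f16.gguf --outtype f16
./quantize Meta-Llama-3-70B-Instruct-f16.gguf \
  Meta-Llama-3-70B-Instruct-Q5_K_M.gguf Q5_K_M
```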
