
Llama.cpp server destroys <|eot_id|> token even midway through prompt! #6793


Closed

araleza opened this issue Apr 20, 2024 · 1 comment

araleza commented Apr 20, 2024

In ./server, it is not possible to use Continuation mode correctly with Llama 3 70B, because the correct prompt template cannot be entered: the <|eot_id|> token is tokenized into zero tokens, even when it occurs midway through the prompt:

[Screenshot: the server web UI showing the typed prompt and the cached/predicted token counts]

(In the above image, I hit Start and looked at the number of tokens cached minus the number of tokens predicted: 402 - 400 = 2. That difference is the number of tokens in the prompt I typed. The result shown is 2, where it should be 3, because the <|eot_id|> token contributes nothing. I deleted the generated tokens before taking this screenshot, to show what I originally typed.)
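For anyone who wants to repeat this count check outside the web UI, here is a minimal sketch (not part of the original report) that queries the server's /tokenize endpoint. It assumes the server is listening on localhost:8080; whether this endpoint parses special tokens may differ from the completion path, so treat it only as a way to inspect token counts.

```python
# Sketch: inspect how the running ./server tokenizes a prompt containing
# <|eot_id|>. Assumes the server is reachable at localhost:8080.
import json
import urllib.request

def tokenize(text: str, url: str = "http://localhost:8080/tokenize") -> list[int]:
    """POST text to the server's /tokenize endpoint and return the token ids."""
    body = json.dumps({"content": text}).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["tokens"]

# Compare how many tokens the prompt uses with and without <|eot_id|> in the
# middle; if the special token is being destroyed, the counts barely differ.
print(len(tokenize("Hello")))
print(len(tokenize("Hello<|eot_id|>Hello")))
```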

This token appears multiple times in the prompt template, which looks like this:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

[system prompt goes here]<|eot_id|><|start_header_id|>user<|end_header_id|>

[user prompt goes here]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

[ai response will go here]

Not adhering to the prompt template usually degrades the quality of the LLM's output.
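For illustration only (this code is not part of the report), the template above would be assembled into a single prompt string roughly like this; if <|eot_id|> is dropped during tokenization, the model never sees these turn boundaries:

```python
# Sketch: assemble the Llama 3 chat template shown above into one prompt
# string for the server's Continuation/completion mode.
def build_llama3_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_llama3_prompt("You are a helpful assistant.", "Hello!"))
```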

phymbert (Collaborator) commented

Please wait for:

If your issue persists once you have re-converted the HF model and are running the latest server code with those PRs merged, please ping me and I will reopen.
