"Stop generating" button only stops the printing, llama.cpp continues #890

reversebias opened this issue Mar 1, 2024 · 0 comments
@reversebias (Contributor)

Not sure if this is intended behaviour, but when hitting the "Stop generating" button, the displayed output does stop streaming, yet the llama.cpp server backend keeps generating until it hits a stop token or the token limit.

With a standard llama.cpp server, other frontends pointed at the same server stop generating as expected:
./server -m ~/models/mixtral-8x7b-instruct-v0.1.Q6_K.gguf -c 32768 -ngl 128 -ts 39,61,0 -sm row --host 0.0.0.0
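
For comparison, aborting the HTTP request mid-stream is enough to make the server halt. A minimal TypeScript sketch (Node 18+), assuming the stock llama.cpp `/completion` endpoint on the host/port above; the prompt and two-second timeout are illustrative:

```ts
// Demonstrates that dropping the connection stops generation server-side.
// Endpoint and payload follow llama.cpp's /completion API.
const controller = new AbortController();
setTimeout(() => controller.abort(), 2000); // simulate pressing "Stop generating"

try {
  const res = await fetch("http://127.0.0.1:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt: "Write a very long story.",
      n_predict: 512,
      stream: true,
    }),
    signal: controller.signal,
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value));
  }
} catch {
  // AbortError lands here; the closed connection is what lets the
  // server cancel the in-flight generation.
  console.log("\n[aborted client-side]");
}
```

Watching the server process while running this, token generation stops as soon as the abort fires, which matches the behaviour of the other frontends mentioned above.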

.env.local

HF_TOKEN=None
USE_LOCAL_WEBSEARCH=true
MODELS=`[
  {
      "name": "mixtral-instrtuct",
      "preprompt" : "",
      "chatPromptTemplate": "<s> {{#each messages}}{{#ifUser}}[INST]{{#if @first}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}} {{content}} [/INST]{{/ifUser}}{{#ifAssistant}} {{content}}</s> {{/ifAssistant}}{{/each}}",
      "parameters": {
        "temperature" : 0.2,
        "top_p" : 0.95,
        "repetition_penalty" : 1.2,
        "top_k" : 50,
        "truncate" : 24576,
        "max_new_tokens" : 512,
        "stop" : ["</s>"]
      },
      "endpoints": [
        {
         "url": "http://127.0.0.1:8080",
         "type": "llamacpp"
        }
      ]
  }
]`
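
The fix presumably needs the stop button to abort the underlying request rather than only stop reading it. A hypothetical sketch of that wiring in TypeScript (the function and return shape are illustrative, not chat-ui's actual internals):

```ts
// Hypothetical wiring, not chat-ui's actual code: "stop" must abort the
// fetch itself, not merely stop consuming/rendering the stream.
function startGeneration(body: Record<string, unknown>) {
  const controller = new AbortController();

  const response = fetch("http://127.0.0.1:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...body, stream: true }),
    // Without this signal, stopping only the UI-side reader leaves the
    // connection open and the backend keeps generating tokens.
    signal: controller.signal,
  });

  return {
    response,
    stop: () => controller.abort(), // closes the connection; llama.cpp can then stop
  };
}
```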