Not sure if this is intended behaviour, but when hitting the "Stop generating" button, the displayed output does stop streaming, yet the llama.cpp server backend keeps generating until it reaches a stop token or the token limit.
I'm running a standard llama.cpp server, and other frontends using the same server stop generating as expected:
```
./server -m ~/models/mixtral-8x7b-instruct-v0.1.Q6_K.gguf -c 32768 -ngl 128 -ts 39,61,0 -sm row --host 0.0.0.0
```
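For reference, those other frontends appear to stop generation simply by aborting the streaming HTTP request; once the connection to `/completion` is dropped, the server stops producing tokens. Below is a minimal TypeScript sketch of that pattern (not chat-ui's actual code; the URL, payload, and function names are illustrative):

```typescript
// Minimal sketch, assuming the llama.cpp server stops generating for a slot
// once the client closes the streaming connection. URL and prompt are examples.
const controller = new AbortController();

async function streamCompletion(prompt: string): Promise<void> {
  const response = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, stream: true }),
    signal: controller.signal, // aborting this closes the connection
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // Each chunk carries the streamed "data: {...}" token payloads.
      console.log(decoder.decode(value, { stream: true }));
    }
  } catch (err) {
    // Hitting "Stop generating" surfaces here as an AbortError; rethrow anything else.
    if ((err as Error).name !== "AbortError") throw err;
  }
}

// The UI's "Stop generating" button should trigger this, so the backend
// stops instead of running on to the stop token or token limit.
function stopGenerating(): void {
  controller.abort();
}
```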
.env.local