Commit 5555c0c

docs: update server streaming mode documentation (#9519)

Provide more documentation for streaming mode.

1 parent 973f328

1 file changed (+8 −8 lines)

examples/server/README.md

````diff
@@ -303,23 +303,23 @@ mkdir llama-client
 cd llama-client
 ```
 
-Create a index.js file and put this inside:
+Create an index.js file and put this inside:
 
 ```javascript
-const prompt = `Building a website can be done in 10 simple steps:`;
+const prompt = "Building a website can be done in 10 simple steps:"
 
-async function Test() {
+async function test() {
     let response = await fetch("http://127.0.0.1:8080/completion", {
-        method: 'POST',
+        method: "POST",
         body: JSON.stringify({
             prompt,
-            n_predict: 512,
+            n_predict: 64,
         })
     })
     console.log((await response.json()).content)
 }
 
-Test()
+test()
 ```
 
 And run it:
@@ -381,7 +381,7 @@ Multiple prompts are also supported. In this case, the completion result will be
 `n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded. The number excludes the BOS token.
 By default, this value is set to `0`, meaning no tokens are kept. Use `-1` to retain all tokens from the prompt.
 
-`stream`: It allows receiving each predicted token in real-time instead of waiting for the completion to finish. To enable this, set to `true`.
+`stream`: Allows receiving each predicted token in real-time instead of waiting for the completion to finish (uses a different response format). To enable this, set to `true`.
 
 `stop`: Specify a JSON array of stopping strings.
 These words will not be included in the completion, so make sure to add them to the prompt for the next iteration. Default: `[]`
@@ -446,7 +446,7 @@ These words will not be included in the completion, so make sure to add them to
 
 **Response format**
 
-- Note: When using streaming mode (`stream`), only `content` and `stop` will be returned until end of completion.
+- Note: In streaming mode (`stream`), only `content` and `stop` will be returned until end of completion. Responses are sent using the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html) standard. Note: the browser's `EventSource` interface cannot be used due to its lack of `POST` request support.
 
 - `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has the following structure:
````