Commit 52b4f4a

[Docs] Update structured output doc for V1 (#17135)

Signed-off-by: Russell Bryant <[email protected]>

Parent: e782e0a

File tree

1 file changed: +28 −13 lines changed


docs/source/features/structured_outputs.md

Lines changed: 28 additions & 13 deletions
@@ -2,8 +2,11 @@
 
 # Structured Outputs
 
-vLLM supports the generation of structured outputs using [outlines](https://github.com/dottxt-ai/outlines), [lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer), or [xgrammar](https://github.com/mlc-ai/xgrammar) as backends for the guided decoding.
-This document shows you some examples of the different options that are available to generate structured outputs.
+vLLM supports the generation of structured outputs using
+[xgrammar](https://github.com/mlc-ai/xgrammar) or
+[guidance](https://github.com/guidance-ai/llguidance) as backends.
+This document shows you some examples of the different options that are
+available to generate structured outputs.
 
 ## Online Serving (OpenAI API)
 
@@ -15,10 +18,17 @@ The following parameters are supported, which must be added as extra parameters:
 - `guided_regex`: the output will follow the regex pattern.
 - `guided_json`: the output will follow the JSON schema.
 - `guided_grammar`: the output will follow the context free grammar.
-- `guided_whitespace_pattern`: used to override the default whitespace pattern for guided json decoding.
-- `guided_decoding_backend`: used to select the guided decoding backend to use. Additional backend-specific options can be supplied in a comma separated list following a colon after the backend name. For example `"xgrammar:no-fallback"` will not allow vLLM to fallback to a different backend on error.
+- `structural_tag`: Follow a JSON schema within a set of specified tags within the generated text.
 
-You can see the complete list of supported parameters on the [OpenAI-Compatible Server](#openai-compatible-server)page.
+You can see the complete list of supported parameters on the [OpenAI-Compatible Server](#openai-compatible-server) page.
+
+Structured outputs are supported by default in the OpenAI-Compatible Server. You
+may choose to specify the backend to use by setting the
+`--guided-decoding-backend` flag to `vllm serve`. The default backend is `auto`,
+which will try to choose an appropriate backend based on the details of the
+request. You may also choose a specific backend, along with
+some options. A full set of options is available in the `vllm serve --help`
+text.
 
 Now let's see an example for each of the cases, starting with the `guided_choice`, as it's the easiest one:
 
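To make the extra-parameter plumbing concrete, here is a small stdlib-only sketch of a request body carrying `guided_choice`. The model name and prompt are placeholders, not taken from this commit; with the `openai` Python client the same extra fields would be passed through `extra_body` rather than at the top level.

```python
import json

# Hypothetical body for POST /v1/chat/completions against a local vLLM
# server; `guided_choice` is a vLLM extra parameter, not part of the
# upstream OpenAI request schema.
payload = {
    "model": "Qwen/Qwen2.5-3B-Instruct",  # placeholder model name
    "messages": [
        {"role": "user",
         "content": "Classify this sentiment: vLLM is wonderful!"},
    ],
    # The generated text is constrained to exactly one of these strings.
    "guided_choice": ["positive", "negative"],
}

body = json.dumps(payload)
print(body)
```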
@@ -96,12 +106,15 @@ print(completion.choices[0].message.content)
 ```
 
 :::{tip}
-While not strictly necessary, normally it's better to indicate in the prompt that a JSON needs to be generated and which fields and how should the LLM fill them.
-This can improve the results notably in most cases.
+While not strictly necessary, normally it's better to indicate in the prompt the
+JSON schema and how the fields should be populated. This can improve the
+results notably in most cases.
 :::
 
-Finally we have the `guided_grammar`, which probably is the most difficult one to use but it's really powerful, as it allows us to define complete languages like SQL queries.
-It works by using a context free EBNF grammar, which for example we can use to define a specific format of simplified SQL queries, like in the example below:
+Finally we have the `guided_grammar` option, which is probably the most
+difficult to use, but it's really powerful. It allows us to define complete
+languages like SQL queries. It works by using a context free EBNF grammar.
+As an example, we can use it to define a specific format of simplified SQL queries:
 
 ```python
 simplified_sql_grammar = """
@@ -226,6 +239,8 @@ Step #2: explanation="Next, let's isolate 'x' by dividing both sides of the equa
 Answer: x = -29/8
 ```
 
+An example of using `structural_tag` can be found here: <gh-file:examples/online_serving/openai_chat_completion_structured_outputs_structural_tag.py>
+
 ## Offline Inference
 
 Offline inference allows for the same types of guided decoding.
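The same kinds of constraints carry over offline; for the JSON case in particular, the constraint is an ordinary JSON schema. A stdlib-only sketch of an illustrative schema (the kind of dict JSON-constrained decoding takes) plus a hand check that a sample completion conforms; the schema and the sample completion are invented for illustration, not taken from the commit.

```python
import json

# Illustrative JSON schema; a dict like this is what JSON-constrained
# decoding takes as input (it can also be produced from a pydantic
# model via .model_json_schema()).
car_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "model": {"type": "string"},
        "car_type": {"type": "string", "enum": ["sedan", "suv", "coupe"]},
    },
    "required": ["brand", "model", "car_type"],
}

# A completion generated under this constraint is guaranteed to parse
# as JSON; here we use a hand-written stand-in for a model output.
sample_completion = '{"brand": "Toyota", "model": "Corolla", "car_type": "sedan"}'
car = json.loads(sample_completion)

# Spot-check the required keys and the enum constraint.
assert all(key in car for key in car_schema["required"])
assert car["car_type"] in car_schema["properties"]["car_type"]["enum"]
print(car)
```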
@@ -236,11 +251,11 @@ The main available options inside `GuidedDecodingParams` are:
 - `regex`
 - `choice`
 - `grammar`
-- `backend`
-- `whitespace_pattern`
+- `structural_tag`
 
-These parameters can be used in the same way as the parameters from the Online Serving examples above.
-One example for the usage of the `choices` parameter is shown below:
+These parameters can be used in the same way as the parameters from the Online
+Serving examples above. One example for the usage of the `choice` parameter is
+shown below:
 
 ```python
 from vllm import LLM, SamplingParams
