docs/source/features/structured_outputs.md

# Structured Outputs

vLLM supports the generation of structured outputs using
[xgrammar](https://github.com/mlc-ai/xgrammar) or
[guidance](https://github.com/guidance-ai/llguidance) as backends.
This document shows you some examples of the different options that are
available to generate structured outputs.

## Online Serving (OpenAI API)

The following parameters are supported, which must be added as extra parameters:

- `guided_regex`: the output will follow the regex pattern.
- `guided_json`: the output will follow the JSON schema.
- `guided_grammar`: the output will follow the context-free grammar.
- `structural_tag`: the output will follow a JSON schema within a set of specified tags within the generated text.
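
To make the parameters above concrete, the sketch below builds two request payloads and checks locally what each constraint guarantees about the output. The regex, schema, and model name are illustrative assumptions, not values from vLLM:

```python
import json
import re

# Hypothetical email-style regex for guided_regex (illustrative only).
email_pattern = r"\w+@\w+\.com\n"
regex_body = {
    "model": "Qwen/Qwen2.5-3B-Instruct",  # model name is an assumption
    "prompt": "Give me an example email address: ",
    "guided_regex": email_pattern,
}

# A sample completion that a regex-constrained decode could produce:
# it must fully match the pattern.
sample = "alan@enigma.com\n"
assert re.fullmatch(email_pattern, sample)

# Illustrative JSON schema for guided_json.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
json_body = {
    "model": "Qwen/Qwen2.5-3B-Instruct",
    "prompt": "Describe a person as JSON: ",
    "guided_json": schema,
}

# Any guided_json output parses as JSON and contains the required keys.
sample_json = json.loads('{"name": "Ada", "age": 36}')
print(sorted(sample_json))  # → ['age', 'name']
```

With a running server, these dictionaries would be sent as the request body (or via `extra_body` in the OpenAI Python client).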

You can see the complete list of supported parameters on the [OpenAI-Compatible Server](#openai-compatible-server) page.

Structured outputs are supported by default in the OpenAI-Compatible Server. You
may choose to specify the backend to use by setting the
`--guided-decoding-backend` flag to `vllm serve`. The default backend is `auto`,
which will try to choose an appropriate backend based on the details of the
request. You may also choose a specific backend, along with
some options. A full set of options is available in the `vllm serve --help`
text.

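
For example, to pin the backend at startup (the model name is an assumption; any supported model works):

```shell
vllm serve Qwen/Qwen2.5-3B-Instruct --guided-decoding-backend xgrammar
```
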
Now let's see an example for each of the cases, starting with `guided_choice`, as it's the easiest one:
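
A minimal sketch of such a request (the model name is an assumption; with the OpenAI Python client, the `guided_choice` field is passed via `extra_body` to `client.chat.completions.create(...)`):

```python
import json

# Request body for a chat completion constrained to a fixed set of choices.
request = {
    "model": "Qwen/Qwen2.5-3B-Instruct",  # model name is an assumption
    "messages": [
        {"role": "user",
         "content": "Classify this sentiment: vLLM is wonderful!"},
    ],
    "guided_choice": ["positive", "negative"],
}

# The constraint guarantees the reply is exactly one of the listed choices.
wire = json.dumps(request)
print(json.loads(wire)["guided_choice"])  # → ['positive', 'negative']
```
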
While not strictly necessary, normally it's better to indicate in the prompt the
JSON schema and how the fields should be populated. This can improve the
results notably in most cases.
:::
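
A sketch of that advice (the schema and field names are illustrative): embed the schema text in the prompt, and pass the same schema as the `guided_json` extra parameter so decoding is constrained to valid instances:

```python
import json

# Illustrative schema; in practice this could come from a Pydantic model.
car_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "model": {"type": "string"},
        "car_type": {"type": "string", "enum": ["SUV", "Sedan", "Truck"]},
    },
    "required": ["brand", "model", "car_type"],
}

# Tell the model up front what shape the JSON must take.
prompt = (
    "Generate a JSON object describing a car. "
    "Fill every field and follow this schema exactly:\n"
    + json.dumps(car_schema, indent=2)
)

# The same schema also constrains decoding.
extra_body = {"guided_json": car_schema}
print("car_type" in prompt)  # → True
```
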

Finally we have the `guided_grammar` option, which is probably the most
difficult to use, but it's really powerful. It allows us to define complete
languages like SQL queries. It works by using a context-free EBNF grammar.
As an example, we can use it to define a specific format of simplified SQL queries:

```python
simplified_sql_grammar = """
```

```
Step #2: explanation="Next, let's isolate 'x' by dividing both sides of the equation..."
Answer: x = -29/8
```

An example of using `structural_tag` can be found here: <gh-file:examples/online_serving/openai_chat_completion_structured_outputs_structural_tag.py>
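
For orientation, here is a sketch of what such a request payload can look like. The field names (`structures`, `begin`, `end`, `triggers`) are assumptions based on the format used by the linked example, so treat the exact shape as illustrative:

```python
import json

# Hypothetical tool-call constraint: whenever the model emits the
# "<function=get_weather>" tag, the content up to "</function>" must
# follow the given JSON schema. Field names are assumptions.
structural_tag = {
    "type": "structural_tag",
    "structures": [
        {
            "begin": "<function=get_weather>",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
            "end": "</function>",
        }
    ],
    "triggers": ["<function="],
}

print(sorted(structural_tag))  # → ['structures', 'triggers', 'type']
```
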

## Offline Inference

Offline inference allows for the same types of guided decoding.

The main available options inside `GuidedDecodingParams` are:

- `regex`
- `choice`
- `grammar`
- `structural_tag`

These parameters can be used in the same way as the parameters from the Online
Serving examples above. One example for the usage of the `choice` parameter is
shown below: