README.md 1.43 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# Structured Outputs

This script demonstrates various structured output capabilities of vLLM's OpenAI-compatible server.
It can run individual constraint type or all of them.
It supports both streaming responses and concurrent non-streaming requests.

To use this example, you must start an vLLM server with any model of your choice.

```bash
vllm serve Qwen/Qwen2.5-3B-Instruct
```

To serve a reasoning model, you can use the following command:

```bash
Reid's avatar
Reid committed
16
17
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --reasoning-parser deepseek_r1
18
19
20
21
22
```

If you want to run this script standalone with `uv`, you can use the following:

```bash
Reid's avatar
Reid committed
23
24
uvx --from git+https://github.com/vllm-project/vllm#subdirectory=examples/online_serving/structured_outputs \
    structured-output
25
26
```

27
See [feature docs](https://docs.vllm.ai/en/latest/features/structured_outputs.html) for more information.
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48

!!! tip
    If vLLM is running remotely, then set `OPENAI_BASE_URL=<remote_url>` before running the script.

## Usage

Run all constraints, non-streaming:

```bash
uv run structured_outputs.py
```

Run all constraints, streaming:

```bash
uv run structured_outputs.py --stream
```

Run certain constraints, for example `structural_tag` and `regex`, streaming:

```bash
Reid's avatar
Reid committed
49
50
51
uv run structured_outputs.py \
    --constraint structural_tag regex \
    --stream
52
53
54
55
56
57
58
```

Run all constraints, with reasoning models and streaming:

```bash
uv run structured_outputs.py --reasoning --stream
```