README.md 1.4 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Structured Outputs

This script demonstrates various structured output capabilities of vLLM's OpenAI-compatible server.
It can run individual constraint type or all of them.
It supports both streaming responses and concurrent non-streaming requests.

To use this example, you must start an vLLM server with any model of your choice.

```bash
vllm serve Qwen/Qwen2.5-3B-Instruct
```

To serve a reasoning model, you can use the following command:

```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r1
```

If you want to run this script standalone with `uv`, you can use the following:

```bash
uvx --from git+https://github.com/vllm-project/vllm#subdirectory=examples/online_serving/structured_outputs structured-output
```

25
See [feature docs](https://docs.vllm.ai/en/latest/features/structured_outputs.html) for more information.
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54

!!! tip
    If vLLM is running remotely, then set `OPENAI_BASE_URL=<remote_url>` before running the script.

## Usage

Run all constraints, non-streaming:

```bash
uv run structured_outputs.py
```

Run all constraints, streaming:

```bash
uv run structured_outputs.py --stream
```

Run certain constraints, for example `structural_tag` and `regex`, streaming:

```bash
uv run structured_outputs.py --constraint structural_tag regex --stream
```

Run all constraints, with reasoning models and streaming:

```bash
uv run structured_outputs.py --reasoning --stream
```