---
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Chat Processor Options
subtitle: Choose the right preprocessing pipeline for tool calling, reasoning, and tokenization
---

Dynamo splits work between a **frontend** process (HTTP server, tokenization,
routing, parsing) and one or more **worker** processes (the engine running the
model). Several CLI flags control which code path handles chat template
rendering, tool-call parsing, and reasoning-content separation. This page
explains the available configurations, when to use each, and how they interact
with KV cache routing.

For the list of individual parser names, see
[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md).

## Configurations

There are five supported configurations. Each is set at startup -- Dynamo does
not switch between them per request.

| | Frontend flags | Worker flags | KV routing | Notes |
|---|---|---|---|---|
| **A** Dynamo-native (default) | `--dyn-chat-processor dynamo` | `--dyn-tool-call-parser <name>` `--dyn-reasoning-parser <name>` | Yes | Rust preprocessor. Lowest latency. |
| **B** vLLM chat processor | `--dyn-chat-processor vllm` `--tool-call-parser <name>` `--reasoning-parser <name>` | *(none)* | Yes | Delegates to vLLM's Python preprocessor. |
| **C** SGLang chat processor | `--dyn-chat-processor sglang` `--tool-call-parser <name>` `--reasoning-parser <name>` | *(none)* | Yes | Delegates to SGLang's Python preprocessor. See [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md). |
| **D** vLLM tokenizer delegation | `--router-mode round-robin` | `--use-vllm-tokenizer` | No | Engine-side tokenization. Day-0 model fallback. |
| **E** SGLang tokenizer delegation | `--router-mode round-robin` | `--use-sglang-tokenizer` | No | **Deprecated** -- use option C instead. |

> [!NOTE]
> Although `dynamo` is the default for `--dyn-chat-processor`, specifying it
> explicitly in launch scripts makes the choice visible in logs and support
> diagnostics.

## Flag reference

### `--dyn-chat-processor {dynamo | vllm | sglang}`

Frontend flag (default `dynamo`). Selects the chat processor that renders
templates, tokenizes, and dispatches parsing.

- `dynamo` -- Rust preprocessor. Parser names come from Dynamo's registry
  (see [Tool Calling](tool-calling.md) and [Reasoning](reasoning.md)).
- `vllm` -- vLLM's Python preprocessor. Parser names come from vLLM's
  registry, which may differ from Dynamo's.
- `sglang` -- SGLang's Python preprocessor. Parser names come from SGLang's
  registry. See [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md).

### `--dyn-tool-call-parser <name>` / `--dyn-reasoning-parser <name>`

Worker flags. Names from Dynamo's parser registry. Only effective under
`--dyn-chat-processor dynamo` (option A); silently ignored under other chat
processors.

The flags are declared on the worker CLI, but the parser runs on the frontend --
the name propagates via model metadata. For supported names, see
[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md).

### `--tool-call-parser <name>` / `--reasoning-parser <name>`

Frontend flags (no `--dyn-` prefix). Names from the upstream engine's registry.
Only accepted when paired with the matching chat processor:

- Under `--dyn-chat-processor vllm`: accepted. Use vLLM parser names.
- Under `--dyn-chat-processor sglang`: accepted. Use SGLang parser names.
- Under `--dyn-chat-processor dynamo`: **rejected at startup** with
  `Unknown arguments specified: ...`. Use the `--dyn-*` worker flags instead.

Upstream parser names are pinned to the engine version shipped in the Dynamo
container. They may differ from Dynamo's names for the same model (e.g.,
SGLang uses `deepseekv3` where Dynamo uses `deepseek_v3`).

### `--use-vllm-tokenizer` / `--use-sglang-tokenizer`

Worker flags (boolean). Hand tokenization to the engine instead of the
frontend. The flag must match the engine on the worker.

`--use-sglang-tokenizer` is deprecated. New SGLang deployments should use
`--dyn-chat-processor sglang` (option C) instead. See
[Migration from --use-sglang-tokenizer](../backends/sglang/sglang-chat-processor.md#migration-from---use-sglang-tokenizer).

## Which option should I pick?

1. **Does Dynamo have a parser for your model?** Check the per-model tables in
   [Tool Calling](tool-calling.md) and [Reasoning](reasoning.md). If yes, use
   **option A**. This is the default path: Rust parsing on the frontend,
   KV-routable, lowest latency.

2. **Does the upstream engine have a parser but Dynamo doesn't?** Use
   **option B** (vLLM) or **option C** (SGLang). Still KV-routable.

3. **Is the tokenizer itself the problem** (day-0 model, custom special tokens,
   rope variants)? Use **option D**. KV routing is off; pair with
   `--router-mode round-robin`.

4. **SGLang + day-0 model?** Use **option C** with the appropriate upstream
   parser name. Do not use option E (deprecated).
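
The questions above can be condensed into a small shell helper. This is a sketch rather than anything shipped with Dynamo: `pick_option` and its yes/no arguments are hypothetical names introduced for illustration, and the function asks the questions in the order listed.

```bash
# Hypothetical helper (not a Dynamo command): maps the questions above to an
# option letter.
# Usage: pick_option <vllm|sglang> <dynamo_has_parser> <upstream_has_parser> <tokenizer_is_problem>
# The last three arguments are "yes" or "no".
pick_option() {
  local engine="$1" dynamo_parser="$2" upstream_parser="$3" tokenizer_problem="$4"

  if [ "$dynamo_parser" = "yes" ]; then
    # Question 1: Dynamo has a parser, so stay on the default Rust path.
    echo "A"
  elif [ "$upstream_parser" = "yes" ]; then
    # Question 2: only the upstream engine has a parser.
    case "$engine" in
      vllm)   echo "B" ;;
      sglang) echo "C" ;;
      *)      echo "unknown engine: $engine" >&2; return 1 ;;
    esac
  elif [ "$tokenizer_problem" = "yes" ]; then
    # Questions 3 and 4: the tokenizer is the problem. vLLM falls back to
    # tokenizer delegation; SGLang uses option C, because option E is deprecated.
    case "$engine" in
      vllm)   echo "D" ;;
      sglang) echo "C" ;;
      *)      echo "unknown engine: $engine" >&2; return 1 ;;
    esac
  else
    echo "none of the questions above apply" >&2
    return 1
  fi
}

pick_option vllm no yes no   # prints B
```

A result of `D` also means the frontend should be launched with a non-KV router mode such as `--router-mode round-robin`.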

## Invalid and silently broken combinations

### Rejected at startup

- **`--dyn-chat-processor dynamo` with `--tool-call-parser <name>`** (or
  `--reasoning-parser`). The un-prefixed flags are not recognized under the
  Dynamo chat processor. Use `--dyn-tool-call-parser` on the worker instead.

- **`--tool-call-parser` and `--dyn-tool-call-parser` together** on the same
  SGLang worker. SGLang rejects this: `Cannot use both --tool-call-parser and
  --dyn-tool-call-parser`. Pick one namespace.

- **`--use-vllm-tokenizer` on an SGLang worker** (and vice versa). The flag
  must match the engine.

### Silently broken (no startup error, wrong results)

- **Tokenizer delegation + `--router-mode kv`** -- Options D/E with `kv`
  routing produce prefix-hash mismatches and silent cache misses.

- **`--dyn-tool-call-parser` + `--use-vllm-tokenizer`** on the same vLLM
  worker. The worker bypasses Dynamo's preprocessor while the frontend-side
  parser is still wired up, producing mismatched token streams. No
  mutual-exclusivity check exists today.
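
Because the silently broken combinations produce no startup error, a launch script can check for them itself before starting anything. The sketch below is one way to do that. `check_launch` and `_has_flag` are hypothetical helpers, not part of Dynamo; they only inspect flag strings, assume space-separated flags (no `--flag=value` form), and only detect `kv` routing when `--router-mode kv` is passed explicitly.

```bash
# Hypothetical preflight check, not part of Dynamo. It launches nothing.

# True when "$2" occurs as whole words inside the flag string "$1".
_has_flag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage: check_launch <vllm|sglang> "<frontend flags>" "<worker flags>"
# Prints one line per problem; returns non-zero if any problem was found.
check_launch() {
  local engine="$1" frontend="$2" worker="$3" problems=0

  # Rejected: un-prefixed parser flags without a vllm/sglang chat processor.
  if ! _has_flag "$frontend" "--dyn-chat-processor vllm" &&
     ! _has_flag "$frontend" "--dyn-chat-processor sglang"; then
    if _has_flag "$frontend" "--tool-call-parser" ||
       _has_flag "$frontend" "--reasoning-parser"; then
      echo "rejected at startup: un-prefixed parser flag under the dynamo chat processor"
      problems=$((problems + 1))
    fi
  fi

  # Rejected: both parser namespaces on one SGLang worker.
  if [ "$engine" = "sglang" ] &&
     _has_flag "$worker" "--tool-call-parser" &&
     _has_flag "$worker" "--dyn-tool-call-parser"; then
    echo "rejected at startup: --tool-call-parser and --dyn-tool-call-parser together"
    problems=$((problems + 1))
  fi

  # Rejected: tokenizer flag that does not match the engine.
  if { [ "$engine" = "sglang" ] && _has_flag "$worker" "--use-vllm-tokenizer"; } ||
     { [ "$engine" = "vllm" ] && _has_flag "$worker" "--use-sglang-tokenizer"; }; then
    echo "rejected at startup: tokenizer flag does not match the $engine engine"
    problems=$((problems + 1))
  fi

  # Silently broken: tokenizer delegation with explicit kv routing.
  if _has_flag "$frontend" "--router-mode kv"; then
    if _has_flag "$worker" "--use-vllm-tokenizer" ||
       _has_flag "$worker" "--use-sglang-tokenizer"; then
      echo "silently broken: tokenizer delegation with --router-mode kv"
      problems=$((problems + 1))
    fi
  fi

  # Silently broken: Dynamo parser flag plus vLLM tokenizer delegation.
  if _has_flag "$worker" "--dyn-tool-call-parser" &&
     _has_flag "$worker" "--use-vllm-tokenizer"; then
    echo "silently broken: --dyn-tool-call-parser with --use-vllm-tokenizer"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}

check_launch vllm "--router-mode kv" "--use-vllm-tokenizer" ||
  echo "fix the launch flags before starting Dynamo"
```

The messages are written for this sketch and are not Dynamo's own error text.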

## Routing compatibility

`--router-mode kv` needs frontend tokenization to compute prefix-hash routing
keys. Options A, B, and C keep the tokenizer on the frontend and are
KV-routable. Options D and E move tokenization to the worker and are **not**
KV-routable -- pair them with `round-robin` or `random`.

| Option | `kv` routing | `round-robin` / `random` |
|--------|:---:|:---:|
| A (Dynamo-native) | Yes | Yes |
| B (vLLM processor) | Yes | Yes |
| C (SGLang processor) | Yes | Yes |
| D (vLLM tokenizer delegation) | **No** | Yes |
| E (SGLang tokenizer delegation) | **No** | Yes |

## Why each flag exists

- **Frontend tokenization** is required for KV cache routing. The frontend
  needs token IDs to compute prefix-hash routing keys before the request
  reaches a worker. On the Rust-native path (option A), parsing runs on the
  frontend alongside tokenization for this reason, even though the parser
  flags are declared on the worker.

- **Backend tokenization** is a fallback for when frontend tokenization can't
  or shouldn't run: unsupported model, day-0 support, tokenizer edge cases
  (custom special tokens, rope variants). The engine owns the tokenizer in
  this mode, so KV routing drops out.

- **Chat-processor swap** (options B/C) is the middle ground: tokenization
  stays on the frontend (KV-routable), but parsing delegates to the upstream
  engine's Python implementation. This covers models where Dynamo's Rust
  parser hasn't been written yet.

## Parser names by model

For the full list of supported parser names, which models they cover, and
upstream name divergences (relevant for options B and C):

- [Tool Calling](tool-calling.md) -- supported tool call parsers with model
  mappings and upstream name differences
- [Reasoning](reasoning.md) -- supported reasoning parsers with model mappings
  and force-reasoning behavior

## Canonical launch examples

```bash
# A -- Dynamo-native (default; Dynamo parser names on the worker).
python -m dynamo.vllm \
  --dyn-tool-call-parser kimi_k2 \
  --dyn-reasoning-parser kimi_k25
python -m dynamo.frontend --dyn-chat-processor dynamo

# B -- vLLM chat-processor (upstream parser names on the frontend).
python -m dynamo.vllm ...
python -m dynamo.frontend \
  --dyn-chat-processor vllm \
  --tool-call-parser hermes \
  --reasoning-parser deepseek_r1

# C -- SGLang chat-processor.
python -m dynamo.sglang ...
python -m dynamo.frontend \
  --dyn-chat-processor sglang \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k25

# D -- vLLM tokenizer delegation (no KV routing).
python -m dynamo.vllm --use-vllm-tokenizer ...
python -m dynamo.frontend --router-mode round-robin
```

## See Also

- [Tool Calling](tool-calling.md) -- Supported tool call parser names, request examples
- [Reasoning](reasoning.md) -- Supported reasoning parser names, common pairings
- [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md) -- Option C details
- [Frontend Configuration Reference](../components/frontend/configuration.md) -- Full CLI flag reference