Unverified commit bae41d44, authored by Keiven C, committed by GitHub

docs(agents): improve tool/reasoning parser docs (#8497)


Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent f53fa64c
---
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Chat Processor Options
subtitle: Choose the right preprocessing pipeline for tool calling, reasoning, and tokenization
---
Dynamo splits work between a **frontend** process (HTTP server, tokenization,
routing, parsing) and one or more **worker** processes (the engine running the
model). Several CLI flags control which code path handles chat template
rendering, tool-call parsing, and reasoning-content separation. This page
explains the available configurations, when to use each, and how they interact
with KV cache routing.
For the list of individual parser names, see
[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md).
## Configurations
There are five supported configurations. Each is set at startup -- Dynamo does
not switch between them per request.
| | Frontend flags | Worker flags | KV routing | Notes |
|---|---|---|---|---|
| **A** Dynamo-native (default) | `--dyn-chat-processor dynamo` | `--dyn-tool-call-parser <name>` `--dyn-reasoning-parser <name>` | Yes | Rust preprocessor. Lowest latency. |
| **B** vLLM chat processor | `--dyn-chat-processor vllm` `--tool-call-parser <name>` `--reasoning-parser <name>` | *(none)* | Yes | Delegates to vLLM's Python preprocessor. |
| **C** SGLang chat processor | `--dyn-chat-processor sglang` `--tool-call-parser <name>` `--reasoning-parser <name>` | *(none)* | Yes | Delegates to SGLang's Python preprocessor. See [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md). |
| **D** vLLM tokenizer delegation | `--router-mode round-robin` | `--use-vllm-tokenizer` | No | Engine-side tokenization. Day-0 model fallback. |
| **E** SGLang tokenizer delegation | `--router-mode round-robin` | `--use-sglang-tokenizer` | No | **Deprecated** -- use option C instead. |
> [!NOTE]
> Although `dynamo` is the default for `--dyn-chat-processor`, specifying it
> explicitly in launch scripts makes the choice visible in logs and support
> diagnostics.
## Flag reference
### `--dyn-chat-processor {dynamo | vllm | sglang}`
Frontend flag (default `dynamo`). Selects the chat processor that renders
templates, tokenizes, and dispatches parsing.
- `dynamo` -- Rust preprocessor. Parser names come from Dynamo's registry
(see [Tool Calling](tool-calling.md) and [Reasoning](reasoning.md)).
- `vllm` -- vLLM's Python preprocessor. Parser names come from vLLM's
registry, which may differ from Dynamo's.
- `sglang` -- SGLang's Python preprocessor. Parser names come from SGLang's
registry. See [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md).
### `--dyn-tool-call-parser <name>` / `--dyn-reasoning-parser <name>`
Worker flags. Names from Dynamo's parser registry. Only effective under
`--dyn-chat-processor dynamo` (option A); silently ignored under other chat
processors.
The flags are declared on the worker CLI, but the parser runs on the frontend --
the name propagates via model metadata. For supported names, see
[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md).
### `--tool-call-parser <name>` / `--reasoning-parser <name>`
Frontend flags (no `--dyn-` prefix). Names from the upstream engine's registry.
Only accepted when paired with the matching chat processor:
- Under `--dyn-chat-processor vllm`: accepted. Use vLLM parser names.
- Under `--dyn-chat-processor sglang`: accepted. Use SGLang parser names.
- Under `--dyn-chat-processor dynamo`: **rejected at startup** with
`Unknown arguments specified: ...`. Use the `--dyn-*` worker flags instead.
Upstream parser names are pinned to the engine version shipped in the Dynamo
container. They may differ from Dynamo's names for the same model (e.g.,
SGLang uses `deepseekv3` where Dynamo uses `deepseek_v3`).
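
When generating launch scripts for options B and C, the name divergences can be kept in a small lookup. The following is a minimal Python sketch, assuming only the tool-call divergences recorded in the Tool Calling parser table (`deepseekv3`, `minimax`, `phi4_mini_json`). `UPSTREAM_TOOL_PARSER_NAMES` and `upstream_tool_parser` are hypothetical names, not part of Dynamo, and the entries should be checked against the engine version in your container.

```python
# Hypothetical lookup: (Dynamo parser name, engine) -> upstream parser name.
# Only divergences listed in the Tool Calling table are recorded here.
UPSTREAM_TOOL_PARSER_NAMES = {
    ("deepseek_v3", "sglang"): "deepseekv3",
    ("minimax_m2", "vllm"): "minimax",
    ("phi4", "vllm"): "phi4_mini_json",
}


def upstream_tool_parser(dynamo_name: str, engine: str) -> str:
    """Return the upstream name to pass to --tool-call-parser.

    Falls back to the Dynamo name when no divergence is recorded. That
    fallback is an assumption: parsers marked Dynamo-only have no upstream
    equivalent, and this sketch does not detect them.
    """
    if engine not in ("vllm", "sglang"):
        raise ValueError(f"unsupported engine: {engine!r}")
    return UPSTREAM_TOOL_PARSER_NAMES.get((dynamo_name, engine), dynamo_name)


print(upstream_tool_parser("deepseek_v3", "sglang"))
print(upstream_tool_parser("kimi_k2", "vllm"))
```

Keeping the mapping keyed by engine avoids passing a Dynamo name to an upstream registry that spells it differently.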
### `--use-vllm-tokenizer` / `--use-sglang-tokenizer`
Worker flags (boolean). Hand tokenization to the engine instead of the
frontend. The flag must match the engine on the worker.
`--use-sglang-tokenizer` is deprecated. New SGLang deployments should use
`--dyn-chat-processor sglang` (option C) instead. See
[Migration from --use-sglang-tokenizer](../backends/sglang/sglang-chat-processor.md#migration-from---use-sglang-tokenizer).
## Which option should I pick?
1. **Does Dynamo have a parser for your model?** Check the per-model tables in
[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md). If yes, use
**option A**. This is the default path: Rust parsing on the frontend,
KV-routable, lowest latency.
2. **Does the upstream engine have a parser but Dynamo doesn't?** Use
**option B** (vLLM) or **option C** (SGLang). Still KV-routable.
3. **Is the tokenizer itself the problem** (day-0 model, custom special tokens,
rope variants) on a vLLM worker? Use **option D**. KV routing is off, so pair it
with `--router-mode round-robin`.
4. **SGLang + day-0 model?** Use **option C** with the appropriate upstream
parser name. Do not use option E (deprecated).
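
The four questions above can be read as an ordered decision procedure. A minimal Python sketch follows, assuming the answers are already known as booleans. `pick_option` is a hypothetical helper, not a Dynamo API, and it returns `None` for cases this page does not cover.

```python
from typing import Optional


def pick_option(engine: str, dynamo_has_parser: bool,
                upstream_has_parser: bool,
                tokenizer_is_problem: bool) -> Optional[str]:
    """Walk the checklist in order and return the option letter."""
    if engine not in ("vllm", "sglang"):
        raise ValueError(f"unsupported engine: {engine!r}")
    # 1. Dynamo has a parser: default Rust path.
    if dynamo_has_parser:
        return "A"
    # 2. Only the upstream engine has a parser: chat-processor swap.
    if upstream_has_parser:
        return "B" if engine == "vllm" else "C"
    # 3/4. Tokenizer is the problem: vLLM delegates (D), SGLang uses C.
    if tokenizer_is_problem:
        return "D" if engine == "vllm" else "C"
    # Not covered by the checklist.
    return None


print(pick_option("vllm", True, False, False))
print(pick_option("sglang", False, False, True))
```

Option E is never returned, which matches the guidance that it is deprecated.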
## Invalid and silently broken combinations
### Rejected at startup
- **`--dyn-chat-processor dynamo` with `--tool-call-parser <name>`** (or
`--reasoning-parser`). The un-prefixed flags are not recognized under the
Dynamo chat processor. Use `--dyn-tool-call-parser` on the worker instead.
- **`--tool-call-parser` and `--dyn-tool-call-parser` together** on the same
SGLang worker. SGLang rejects this: `Cannot use both --tool-call-parser and
--dyn-tool-call-parser`. Pick one namespace.
- **`--use-vllm-tokenizer` on an SGLang worker** (and vice versa). The flag
must match the engine.
### Silently broken (no startup error, wrong results)
- **Tokenizer delegation + `--router-mode kv`** -- Options D/E with `kv`
routing produce prefix-hash mismatches and silent cache misses.
- **`--dyn-tool-call-parser` + `--use-vllm-tokenizer`** on the same vLLM
worker. The worker bypasses Dynamo's preprocessor while the frontend-side
parser is still wired up, producing mismatched token streams. No
mutual-exclusivity check exists today.
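
Because some of these combinations produce no startup error, a pre-launch check in deployment tooling can help. The sketch below encodes only the rules listed in this section. `check_combination` and its parameters are hypothetical and are not part of Dynamo; flags are passed as plain strings without their values.

```python
def check_combination(chat_processor, router_mode, frontend_flags,
                      worker_engine, worker_flags):
    """Return (errors, warnings) for a planned launch configuration.

    errors   -> combinations this page says are rejected at startup
    warnings -> combinations this page says are silently broken
    """
    frontend_flags, worker_flags = set(frontend_flags), set(worker_flags)
    errors, warnings = [], []

    if chat_processor == "dynamo" and frontend_flags & {
            "--tool-call-parser", "--reasoning-parser"}:
        errors.append("un-prefixed parser flags with the dynamo chat processor")

    if worker_engine == "sglang" and {
            "--tool-call-parser", "--dyn-tool-call-parser"} <= worker_flags:
        errors.append("both parser namespaces on one SGLang worker")

    # The tokenizer flag must match the engine running on the worker.
    wrong_flag = {"vllm": "--use-sglang-tokenizer",
                  "sglang": "--use-vllm-tokenizer"}.get(worker_engine)
    if wrong_flag in worker_flags:
        errors.append(f"{wrong_flag} on a {worker_engine} worker")

    delegated = worker_flags & {"--use-vllm-tokenizer",
                                "--use-sglang-tokenizer"}
    if delegated and router_mode == "kv":
        warnings.append("tokenizer delegation with kv routing")

    if worker_engine == "vllm" and {
            "--dyn-tool-call-parser", "--use-vllm-tokenizer"} <= worker_flags:
        warnings.append("--dyn-tool-call-parser with --use-vllm-tokenizer")

    return errors, warnings


print(check_combination("dynamo", "kv", [], "vllm",
                        ["--use-vllm-tokenizer", "--dyn-tool-call-parser"]))
```

Returning warnings separately from errors mirrors the split between the two subsections: the first group fails fast, the second group needs an explicit check.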
## Routing compatibility
`--router-mode kv` needs frontend tokenization to compute prefix-hash routing
keys. Options A, B, and C keep the tokenizer on the frontend and are
KV-routable. Options D and E move tokenization to the worker and are **not**
KV-routable -- pair them with `round-robin` or `random`.
| Option | `kv` routing | `round-robin` / `random` |
|--------|:---:|:---:|
| A (Dynamo-native) | Yes | Yes |
| B (vLLM processor) | Yes | Yes |
| C (SGLang processor) | Yes | Yes |
| D (vLLM tokenizer delegation) | **No** | Yes |
| E (SGLang tokenizer delegation) | **No** | Yes |
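
The compatibility table can also be expressed as data for use in launch tooling. This is a small Python sketch; `KV_ROUTABLE` and `allowed_router_modes` are hypothetical names that only restate the table above.

```python
# Whether each option keeps tokenization on the frontend (KV-routable).
KV_ROUTABLE = {"A": True, "B": True, "C": True, "D": False, "E": False}


def allowed_router_modes(option: str) -> list:
    """Return the --router-mode values the table marks as compatible."""
    modes = ["round-robin", "random"]
    if KV_ROUTABLE[option]:
        modes.insert(0, "kv")
    return modes


print(allowed_router_modes("A"))
print(allowed_router_modes("D"))
```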
## Why each flag exists
- **Frontend tokenization** is required for KV cache routing. The frontend
needs token IDs to compute prefix-hash routing keys before the request
reaches a worker. Parser flags on the Rust-native path (option A) co-locate
with tokenization on the frontend for this reason.
- **Backend tokenization** is a fallback for when frontend tokenization can't
or shouldn't run: unsupported model, day-0 support, tokenizer edge cases
(custom special tokens, rope variants). The engine owns the tokenizer in
this mode, so KV routing drops out.
- **Chat-processor swap** (options B/C) is the middle ground: tokenization
stays on the frontend (KV-routable), but parsing delegates to the upstream
engine's Python implementation. This covers models where Dynamo's Rust
parser hasn't been written yet.
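
To see why routing keys depend on who tokenizes, consider a toy version of prefix hashing. The sketch below is illustrative only: the block size, the use of SHA-256, and the chaining scheme are assumptions made for the example and are not Dynamo's actual routing algorithm. `prefix_block_hashes` is a hypothetical function.

```python
import hashlib


def prefix_block_hashes(token_ids, block_size=4):
    """Hash each full block of tokens, chained to the previous block.

    Two requests share a hash at position i only if all tokens up to
    the end of block i are identical.
    """
    hashes = []
    parent = b""
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = token_ids[start:start + block_size]
        payload = parent + ",".join(map(str, block)).encode()
        digest = hashlib.sha256(payload).digest()
        hashes.append(digest.hex()[:16])
        parent = digest
    return hashes


# Same text, but the second tokenization differs in one token.
frontend_view = prefix_block_hashes([1, 2, 3, 4, 5, 6, 7, 8])
engine_view = prefix_block_hashes([1, 2, 3, 4, 5, 6, 9, 8])
print(frontend_view[0] == engine_view[0])  # first block still matches
print(frontend_view[1] == engine_view[1])  # second block diverges
```

If the router hashes one token sequence while the engine caches another, keys stop matching from the first differing block onward, which is the kind of silent cache miss described for options D and E.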
## Parser names by model
For the full list of supported parser names, which models they cover, and
upstream name divergences (relevant for options B and C):
- [Tool Calling](tool-calling.md) -- supported tool call parsers with model
mappings and upstream name differences
- [Reasoning](reasoning.md) -- supported reasoning parsers with model mappings
and force-reasoning behavior
## Canonical launch examples
```bash
# Each option starts a worker and a frontend as separate processes.
# "..." marks arguments omitted from these examples.

# A -- Dynamo-native (default): parser names go on the worker.
python -m dynamo.vllm \
  --dyn-tool-call-parser kimi_k2 \
  --dyn-reasoning-parser kimi_k25
python -m dynamo.frontend --dyn-chat-processor dynamo

# B -- vLLM chat-processor (upstream parser names on the frontend).
python -m dynamo.vllm ...
python -m dynamo.frontend \
  --dyn-chat-processor vllm \
  --tool-call-parser hermes \
  --reasoning-parser deepseek_r1

# C -- SGLang chat-processor (upstream parser names on the frontend).
python -m dynamo.sglang ...
python -m dynamo.frontend \
  --dyn-chat-processor sglang \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k25

# D -- vLLM tokenizer delegation (no KV routing).
python -m dynamo.vllm --use-vllm-tokenizer ...
python -m dynamo.frontend --router-mode round-robin
```
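
The same commands can be assembled programmatically, which keeps the flag placement (worker versus frontend) in one place. This is a minimal Python sketch using only the modules and flags shown above. `build_commands` is a hypothetical helper, and model or engine arguments (the `...` in the examples) are intentionally left out.

```python
import shlex


def build_commands(option, engine, tool_parser=None, reasoning_parser=None):
    """Return (worker_argv, frontend_argv) for options A through D."""
    required_engine = {"B": "vllm", "C": "sglang", "D": "vllm"}
    if option not in ("A", "B", "C", "D"):
        raise ValueError(f"unsupported option: {option!r}")
    if engine not in ("vllm", "sglang"):
        raise ValueError(f"unsupported engine: {engine!r}")
    if option in required_engine and engine != required_engine[option]:
        raise ValueError(f"option {option} requires {required_engine[option]}")

    worker = ["python", "-m", f"dynamo.{engine}"]
    frontend = ["python", "-m", "dynamo.frontend"]

    if option == "A":
        # Dynamo-native: --dyn-* parser flags live on the worker.
        frontend += ["--dyn-chat-processor", "dynamo"]
        if tool_parser:
            worker += ["--dyn-tool-call-parser", tool_parser]
        if reasoning_parser:
            worker += ["--dyn-reasoning-parser", reasoning_parser]
    elif option in ("B", "C"):
        # Chat-processor swap: un-prefixed flags live on the frontend.
        frontend += ["--dyn-chat-processor", engine]
        if tool_parser:
            frontend += ["--tool-call-parser", tool_parser]
        if reasoning_parser:
            frontend += ["--reasoning-parser", reasoning_parser]
    else:
        # D: engine-side tokenization, so no KV routing.
        worker += ["--use-vllm-tokenizer"]
        frontend += ["--router-mode", "round-robin"]
    return worker, frontend


worker, frontend = build_commands("A", "vllm", "kimi_k2", "kimi_k25")
print(shlex.join(worker))
print(shlex.join(frontend))
```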
## See Also
- [Tool Calling](tool-calling.md) -- Supported tool call parser names, request examples
- [Reasoning](reasoning.md) -- Supported reasoning parser names, common pairings
- [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md) -- Option C details
- [Frontend Configuration Reference](../components/frontend/configuration.md) -- Full CLI flag reference
@@ -7,6 +7,12 @@ subtitle: Configure reasoning parsers for models that emit thinking content
 Some models emit reasoning or thinking content separately from their final response. Dynamo can split that output into `reasoning_content` and normal assistant content by configuring `--dyn-reasoning-parser` on the backend worker.
+> [!TIP]
+> This page covers parser names for the default Dynamo-native path. For a
+> comparison of all preprocessing options (including vLLM/SGLang chat-processor
+> swap and tokenizer delegation) and routing
+> compatibility, see [Chat Processor Options](chat-processor-options.md).
 ## Prerequisites
 To enable reasoning parsing, launch the backend worker with:
@@ -23,23 +29,33 @@ python -m dynamo.<backend> --help
 ## Supported Reasoning Parsers
-The reasoning parser names currently supported in the codebase are:
-| Parser Name | Typical Models / Format |
-|-------------|-------------------------|
-| `basic` | Generic `<think>...</think>` reasoning blocks |
-| `deepseek_r1` | Models that should treat output as reasoning until `</think>` is seen, such as `deepseek-ai/DeepSeek-R1` style responses |
-| `glm45` | `zai-org/GLM-4.5` and GLM-5 style `<think>...</think>` reasoning blocks |
-| `gpt_oss` | `openai/gpt-oss-*` |
-| `granite` | Granite models that emit `Here's my thought process:` / `Here's my response:` markers |
-| `kimi` | Kimi models that emit `◁think▷...◁/think▷` |
-| `kimi_k25` | `moonshotai/Kimi-K2.5*` models that require force-reasoning handling for `<think>...</think>` |
-| `minimax_append_think` | MiniMax models that begin reasoning immediately and effectively need an implicit opening `<think>` tag prepended |
-| `mistral` | Mistral reasoning models that emit `[THINK]...[/THINK]` |
-| `nemotron_deci` | Nemotron models that emit standard `<think>...</think>` reasoning blocks |
-| `nemotron_nano` | Nemotron Nano reasoning output that ends with `</think>` without requiring a visible opening tag |
-| `qwen3` | `Qwen/Qwen3-*` style `<think>...</think>` responses |
-| `step3` | Step-style models that should treat content as reasoning until `</think>` is seen |
+The table below lists the currently supported reasoning parsers in Dynamo's registry. The
+**Upstream name** column shows where the vLLM or SGLang parser name differs
+from Dynamo's -- relevant when using `--dyn-chat-processor vllm` or `sglang`
+(see [Chat Processor Options](chat-processor-options.md)). A blank upstream
+column means the same name works everywhere. `Dynamo-only` means no upstream
+parser exists for this format.
+Parsers marked **force-reasoning** emit reasoning content from token one
+without requiring an explicit opening tag (`<think>`, etc.). All others
+require the opening tag to be present in the model output.
+| Parser Name | Models | Upstream name | Force-reasoning | Notes |
+|---|---|---|---|---|
+| `basic` | Generic CoT models | Dynamo-only | No | Plain `<think>...</think>` |
+| `deepseek_r1` | DeepSeek R1, DeepSeek V3.1, DeepSeek V3.2 | | Yes | Pass explicitly for V3.1/V3.2 (no alias) |
+| `glm45` | GLM-4.5, GLM-4.7 | Dynamo-only | No | Alias for `nemotron_deci`. `<think>...</think>` |
+| `gpt_oss` | gpt-oss-20b / -120b | Dynamo-only | No | Harmony channel reasoning format |
+| `granite` | Granite 3.x | | No | `Here's my thought process:` / `Here's my response:` |
+| `kimi` | Kimi K2 Instruct / Thinking | Dynamo-only | No | `◁think▷...◁/think▷` |
+| `kimi_k25` | Kimi K2.5 | Dynamo-only | Yes | `<think>...</think>` with force-reasoning |
+| `minimax_append_think` | MiniMax M2 / M2.1 | Dynamo-only | No | Implicit opening `<think>` prepended |
+| `mistral` | Magistral | | Yes | `[THINK]...[/THINK]` |
+| `nemotron3` | Nemotron-3 / Mini | Dynamo-only | Yes | Alias for `deepseek_r1` |
+| `nemotron_deci` | Nemotron-Super / -Ultra / -Deci, Llama-Nemotron | Dynamo-only | No | `<think>...</think>` |
+| `nemotron_nano` | Nemotron-Nano | Dynamo-only | Yes | Alias for `deepseek_r1` |
+| `qwen3` | QwQ-32B, Qwen3-Think, Qwen3-Coder | | No | `<think>...</think>` |
+| `step3` | Step-3 / Step-3-Reasoning | Dynamo-only | Yes | `<think>...</think>` |
 ## Common Parser Pairings
......
@@ -10,6 +10,12 @@ to output function arguments for the relevant function(s) which you can execute
 Tool calling (AKA function calling) is controlled using the `tool_choice` and `tools` request parameters.
+> [!TIP]
+> This page covers parser names for the default Dynamo-native path. For a
+> comparison of all preprocessing options (including vLLM/SGLang chat-processor
+> swap and tokenizer delegation) and routing
+> compatibility, see [Chat Processor Options](chat-processor-options.md).
 ## Prerequisites
 To enable this feature, you should set the following flag while launching the backend worker
@@ -33,27 +39,32 @@ python -m dynamo.<backend> --help
 ## Supported Tool Call Parsers
-The tool call parser names currently supported in the codebase are:
-| Parser Name | Typical Models / Format |
-|-------------|-------------------------|
-| `deepseek_v3` | `deepseek-ai/DeepSeek-V3`, `deepseek-ai/DeepSeek-R1`, `deepseek-ai/DeepSeek-R1-0528` |
-| `deepseek_v3_1` | `deepseek-ai/DeepSeek-V3.1` |
-| `deepseek_v3_2` | DeepSeek V3.2 DSML tool calling (`<|DSML|function_calls>...`) |
-| `default` | Dynamo's fallback parser for &lt;TOOLCALL&gt; and &lt;|python_tag|&gt; tool tags when no explicit parser is configured |
-| `glm47` | `zai-org/GLM-4.7` |
-| `harmony` | `openai/gpt-oss-*` |
-| `hermes` | `Qwen/Qwen2.5-*`, `Qwen/QwQ-32B`, `NousResearch/Hermes-2-Pro-*`, `NousResearch/Hermes-2-Theta-*`, `NousResearch/Hermes-3-*` |
-| `jamba` | `ai21labs/AI21-Jamba-*-1.5`, `ai21labs/AI21-Jamba-*-1.6`, `ai21labs/AI21-Jamba-*-1.7` |
-| `kimi_k2` | `moonshotai/Kimi-K2-Thinking*`, `moonshotai/Kimi-K2-Instruct*`, `moonshotai/Kimi-K2.5*`; currently requires converting `tiktoken.model` to `tokenizers.json` |
-| `llama3_json` | `meta-llama/Llama-3.1-*`, `meta-llama/Llama-3.2-*` |
-| `minimax_m2` | MiniMax M2.1 XML-style tool calling (`<minimax:tool_call>...`) |
-| `mistral` | `mistralai/Mistral-7B-Instruct-v0.3` and other Mistral models that emit `[TOOL_CALLS]...[/TOOL_CALLS]` |
-| `nemotron_deci` | `nvidia/nemotron-*` |
-| `nemotron_nano` | `nvidia/NVIDIA-Nemotron-3-Nano-*`; uses the same tool-call format as `qwen3_coder` |
-| `phi4` | `Phi-4-*` |
-| `pythonic` | `meta-llama/Llama-4-*` |
-| `qwen3_coder` | XML-style tool calling such as `<tool_call><function=...>` |
+The table below lists the currently supported tool call parsers in Dynamo's registry. The
+**Upstream name** column shows where the vLLM or SGLang parser name differs
+from Dynamo's -- relevant when using `--dyn-chat-processor vllm` or `sglang`
+(see [Chat Processor Options](chat-processor-options.md)). A blank upstream
+column means the same name works everywhere. `Dynamo-only` means no upstream
+parser exists for this format.
+| Parser Name | Models | Upstream name | Notes |
+|---|---|---|---|
+| `deepseek_v3` | DeepSeek V3, DeepSeek R1-0528+ | SGLang: `deepseekv3` | Special Unicode markers |
+| `deepseek_v3_1` | DeepSeek V3.1 | Dynamo-only | JSON separators |
+| `deepseek_v3_2` | DeepSeek V3.2+ | Dynamo-only | DSML tags (`<|DSML|function_calls>...`) |
+| `default` | *(fallback)* | Dynamo-only | Empty JSON config (no start/end tokens). Prefer a model-specific parser for production use. |
+| `glm47` | GLM-4.5, GLM-4.7 | Dynamo-only | XML `<arg_key>/<arg_value>` |
+| `harmony` | gpt-oss-20b / -120b | Dynamo-only | Harmony channel format |
+| `hermes` | Qwen2.5-\*, QwQ-32B, Qwen3-Instruct, Qwen3-Think, NousHermes-2/3 | vLLM: `qwen2_5`; SGLang: `qwen25` (for Qwen models) | `<tool_call>` JSON |
+| `jamba` | Jamba 1.5 / 1.6 / 1.7 | Dynamo-only | `<tool_calls>` JSON |
+| `kimi_k2` | Kimi K2 Instruct/Thinking, Kimi K2.5 | | Pair with `--dyn-reasoning-parser kimi` or `kimi_k25` |
+| `llama3_json` | Llama 3 / 3.1 / 3.2 / 3.3 Instruct | | `<\|python_tag\|>` tool syntax |
+| `minimax_m2` | MiniMax M2 / M2.1 | vLLM: `minimax` | XML `<minimax:tool_call>` |
+| `mistral` | Mistral / Mixtral / Mistral-Nemo, Magistral | | `[TOOL_CALLS]...[/TOOL_CALLS]` |
+| `nemotron_deci` | Nemotron-Super / -Ultra / -Deci, Llama-Nemotron-Ultra / -Super | Dynamo-only | `<TOOLCALL>` JSON |
+| `nemotron_nano` | Nemotron-Nano | Dynamo-only | Alias for `qwen3_coder` |
+| `phi4` | Phi-4, Phi-4-mini, Phi-4-mini-reasoning | vLLM: `phi4_mini_json` | `functools[...]` JSON |
+| `pythonic` | Llama 4 (Scout / Maverick) | | Python-list tool syntax |
+| `qwen3_coder` | Qwen3-Coder | | XML `<tool_call><function=...>` |
 > [!TIP]
 > For Kimi K2.5 thinking models, pair `--dyn-tool-call-parser kimi_k2` with
......
@@ -133,6 +133,8 @@ navigation:
   path: backends/sglang/sglang-diffusion.md
 - page: TRT-LLM Diffusion
   path: backends/trtllm/trtllm-video-diffusion.md
+- page: Chat Processor Options
+  path: agents/chat-processor-options.md
 - page: Tool Calling
   path: agents/tool-calling.md
 - page: Reasoning
......