Commit bae41d44, authored by Keiven C, committed by GitHub

docs(agents): improve tool/reasoning parser docs (#8497)


Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent f53fa64c
---
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Chat Processor Options
subtitle: Choose the right preprocessing pipeline for tool calling, reasoning, and tokenization
---
Dynamo splits work between a **frontend** process (HTTP server, tokenization,
routing, parsing) and one or more **worker** processes (the engine running the
model). Several CLI flags control which code path handles chat template
rendering, tool-call parsing, and reasoning-content separation. This page
explains the available configurations, when to use each, and how they interact
with KV cache routing.
For the list of individual parser names, see
[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md).
## Configurations
There are five supported configurations. Each is set at startup -- Dynamo does
not switch between them per request.
| | Frontend flags | Worker flags | KV routing | Notes |
|---|---|---|---|---|
| **A** Dynamo-native (default) | `--dyn-chat-processor dynamo` | `--dyn-tool-call-parser <name>` `--dyn-reasoning-parser <name>` | Yes | Rust preprocessor. Lowest latency. |
| **B** vLLM chat processor | `--dyn-chat-processor vllm` `--tool-call-parser <name>` `--reasoning-parser <name>` | *(none)* | Yes | Delegates to vLLM's Python preprocessor. |
| **C** SGLang chat processor | `--dyn-chat-processor sglang` `--tool-call-parser <name>` `--reasoning-parser <name>` | *(none)* | Yes | Delegates to SGLang's Python preprocessor. See [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md). |
| **D** vLLM tokenizer delegation | `--router-mode round-robin` | `--use-vllm-tokenizer` | No | Engine-side tokenization. Day-0 model fallback. |
| **E** SGLang tokenizer delegation | `--router-mode round-robin` | `--use-sglang-tokenizer` | No | **Deprecated** -- use option C instead. |
> [!NOTE]
> Although `dynamo` is the default for `--dyn-chat-processor`, specifying it
> explicitly in launch scripts makes the choice visible in logs and support
> diagnostics.
## Flag reference
### `--dyn-chat-processor {dynamo | vllm | sglang}`
Frontend flag (default `dynamo`). Selects the chat processor that renders
templates, tokenizes, and dispatches parsing.
- `dynamo` -- Rust preprocessor. Parser names come from Dynamo's registry
(see [Tool Calling](tool-calling.md) and [Reasoning](reasoning.md)).
- `vllm` -- vLLM's Python preprocessor. Parser names come from vLLM's
registry, which may differ from Dynamo's.
- `sglang` -- SGLang's Python preprocessor. Parser names come from SGLang's
registry. See [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md).
### `--dyn-tool-call-parser <name>` / `--dyn-reasoning-parser <name>`
Worker flags. Names from Dynamo's parser registry. Only effective under
`--dyn-chat-processor dynamo` (option A); silently ignored under other chat
processors.
The flags are declared on the worker CLI, but the parser runs on the frontend --
the name propagates via model metadata. For supported names, see
[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md).
### `--tool-call-parser <name>` / `--reasoning-parser <name>`
Frontend flags (no `--dyn-` prefix). Names from the upstream engine's registry.
Only accepted when paired with the matching chat processor:
- Under `--dyn-chat-processor vllm`: accepted. Use vLLM parser names.
- Under `--dyn-chat-processor sglang`: accepted. Use SGLang parser names.
- Under `--dyn-chat-processor dynamo`: **rejected at startup** with
`Unknown arguments specified: ...`. Use the `--dyn-*` worker flags instead.
Upstream parser names are pinned to the engine version shipped in the Dynamo
container. They may differ from Dynamo's names for the same model (e.g.,
SGLang uses `deepseekv3` where Dynamo uses `deepseek_v3`).
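
As a rough illustration of where the un-prefixed parser flags belong, the sketch below builds the frontend argument list for a given chat processor. `frontend_args` is a hypothetical helper, not part of Dynamo. It only encodes the acceptance rules listed above and does not launch anything.

```bash
# Hypothetical helper (not part of Dynamo): print the frontend flags for a
# chat processor, following the acceptance rules described above.
# Usage: frontend_args <processor> [tool_parser] [reasoning_parser]
frontend_args() {
  local processor="$1" tool_parser="${2:-}" reasoning_parser="${3:-}"
  case "$processor" in
    dynamo)
      # Un-prefixed parser flags are rejected at startup under this processor.
      if [ -n "$tool_parser" ] || [ -n "$reasoning_parser" ]; then
        echo "use --dyn-tool-call-parser / --dyn-reasoning-parser on the worker" >&2
        return 1
      fi
      ;;
    vllm|sglang)
      # Upstream parser names are accepted on the frontend.
      ;;
    *)
      echo "unknown chat processor: $processor" >&2
      return 1
      ;;
  esac
  local args="--dyn-chat-processor $processor"
  if [ -n "$tool_parser" ]; then
    args="$args --tool-call-parser $tool_parser"
  fi
  if [ -n "$reasoning_parser" ]; then
    args="$args --reasoning-parser $reasoning_parser"
  fi
  echo "$args"
}
```

For example, `python -m dynamo.frontend $(frontend_args vllm hermes deepseek_r1)` would expand to an option B frontend command, while `frontend_args dynamo hermes` fails before anything is launched.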
### `--use-vllm-tokenizer` / `--use-sglang-tokenizer`
Worker flags (boolean). Hand tokenization to the engine instead of the
frontend. The flag must match the engine on the worker.
`--use-sglang-tokenizer` is deprecated. New SGLang deployments should use
`--dyn-chat-processor sglang` (option C) instead. See
[Migration from --use-sglang-tokenizer](../backends/sglang/sglang-chat-processor.md#migration-from---use-sglang-tokenizer).
## Which option should I pick?
1. **Does Dynamo have a parser for your model?** Check the per-model tables in
[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md). If yes, use
**option A**. This is the default path: Rust parsing on the frontend,
KV-routable, lowest latency.
2. **Does the upstream engine have a parser but Dynamo doesn't?** Use
**option B** (vLLM) or **option C** (SGLang). Still KV-routable.
3. **Is the tokenizer itself the problem** (day-0 model, custom special tokens,
rope variants)? Use **option D**. KV routing is off; pair with
`--router-mode round-robin`.
4. **SGLang + day-0 model?** Use **option C** with the appropriate upstream
parser name. Do not use option E (deprecated).
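
The questions above can be sketched as a small shell function. `pick_option` and its yes/no arguments are hypothetical names used for illustration. The function asks the questions in the order listed and returns a non-zero status when none of them applies.

```bash
# Hypothetical helper (not part of Dynamo): map the decision questions above
# to an option letter.
# Usage: pick_option <engine: vllm|sglang> <dynamo_has_parser: yes|no> \
#                    <upstream_has_parser: yes|no> <tokenizer_is_problem: yes|no>
pick_option() {
  local engine="$1" dynamo_parser="$2" upstream_parser="$3" tokenizer_issue="$4"
  if [ "$dynamo_parser" = yes ]; then
    echo A                      # default path: Rust parsing on the frontend
  elif [ "$upstream_parser" = yes ]; then
    case "$engine" in
      vllm)   echo B ;;
      sglang) echo C ;;
      *)      return 1 ;;
    esac
  elif [ "$tokenizer_issue" = yes ]; then
    case "$engine" in
      vllm)   echo D ;;         # pair with --router-mode round-robin
      sglang) echo C ;;         # option E is deprecated
      *)      return 1 ;;
    esac
  else
    return 1                    # none of the questions applied
  fi
}
```

Calling `pick_option vllm no no yes` prints `D`, which is the case where KV routing has to be given up.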
## Invalid and silently broken combinations
### Rejected at startup
- **`--dyn-chat-processor dynamo` with `--tool-call-parser <name>`** (or
`--reasoning-parser`). The un-prefixed flags are not recognized under the
Dynamo chat processor. Use `--dyn-tool-call-parser` on the worker instead.
- **`--tool-call-parser` and `--dyn-tool-call-parser` together** on the same
SGLang worker. SGLang rejects this: `Cannot use both --tool-call-parser and
--dyn-tool-call-parser`. Pick one namespace.
- **`--use-vllm-tokenizer` on an SGLang worker** (and vice versa). The flag
must match the engine.
### Silently broken (no startup error, wrong results)
- **Tokenizer delegation + `--router-mode kv`** -- Options D/E with `kv`
routing produce prefix-hash mismatches and silent cache misses.
- **`--dyn-tool-call-parser` + `--use-vllm-tokenizer`** on the same vLLM
worker. The worker bypasses Dynamo's preprocessor while the frontend-side
parser is still wired up, producing mismatched token streams. No
mutual-exclusivity check exists today.
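
Because neither combination fails at startup, a launch script may want its own guard. The following is a minimal sketch of such a pre-flight check. `check_combo` is a hypothetical function, not a Dynamo feature, and it assumes the flags are passed as separate words exactly as spelled on this page.

```bash
# Hypothetical pre-flight check (not part of Dynamo) for the two silently
# broken combinations described above.
# Usage: check_combo <router_mode> <worker flags...>
check_combo() {
  local router_mode="$1"
  shift
  local delegated=no vllm_tokenizer=no dyn_parser=no flag status=0
  for flag in "$@"; do
    case "$flag" in
      --use-vllm-tokenizer)   delegated=yes; vllm_tokenizer=yes ;;
      --use-sglang-tokenizer) delegated=yes ;;
      --dyn-tool-call-parser) dyn_parser=yes ;;
    esac
  done
  # Options D/E move tokenization to the worker, so kv routing cannot work.
  if [ "$delegated" = yes ] && [ "$router_mode" = kv ]; then
    echo "tokenizer delegation is not KV-routable; use round-robin or random" >&2
    status=1
  fi
  # The worker bypasses Dynamo's preprocessor while the parser stays wired up.
  if [ "$vllm_tokenizer" = yes ] && [ "$dyn_parser" = yes ]; then
    echo "--dyn-tool-call-parser conflicts with --use-vllm-tokenizer" >&2
    status=1
  fi
  return "$status"
}
```

A script could run `check_combo round-robin --use-vllm-tokenizer || exit 1` before starting the worker and frontend.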
## Routing compatibility
`--router-mode kv` needs frontend tokenization to compute prefix-hash routing
keys. Options A, B, and C keep the tokenizer on the frontend and are
KV-routable. Options D and E move tokenization to the worker and are **not**
KV-routable -- pair them with `round-robin` or `random`.
| Option | `kv` routing | `round-robin` / `random` |
|--------|:---:|:---:|
| A (Dynamo-native) | Yes | Yes |
| B (vLLM processor) | Yes | Yes |
| C (SGLang processor) | Yes | Yes |
| D (vLLM tokenizer delegation) | **No** | Yes |
| E (SGLang tokenizer delegation) | **No** | Yes |
## Why each flag exists
- **Frontend tokenization** is required for KV cache routing. The frontend
needs token IDs to compute prefix-hash routing keys before the request
reaches a worker. Parser flags on the Rust-native path (option A) co-locate
with tokenization on the frontend for this reason.
- **Backend tokenization** is a fallback for when frontend tokenization can't
or shouldn't run: unsupported model, day-0 support, tokenizer edge cases
(custom special tokens, rope variants). The engine owns the tokenizer in
this mode, so KV routing drops out.
- **Chat-processor swap** (options B/C) is the middle ground: tokenization
stays on the frontend (KV-routable), but parsing delegates to the upstream
engine's Python implementation. This covers models where Dynamo's Rust
parser hasn't been written yet.
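
To make the dependency on token IDs concrete, here is a toy sketch of prefix-hash keys. It is not Dynamo's implementation: the block size, the chaining scheme, and the use of `cksum` as the hash are all assumptions chosen for illustration, and `prefix_keys` is a hypothetical name.

```bash
# Toy illustration (assumed scheme, not Dynamo's): emit one chained key per
# full block of token IDs, so each key identifies the whole prefix up to it.
# Usage: prefix_keys <block_size> <token_id>...
prefix_keys() {
  local block_size="$1"
  shift
  local parent="" block="" count=0 token
  for token in "$@"; do
    block="$block $token"
    count=$((count + 1))
    if [ "$count" -eq "$block_size" ]; then
      # Hash the parent key together with this block's token IDs.
      parent=$(printf '%s|%s' "$parent" "$block" | cksum | cut -d' ' -f1)
      echo "$parent"
      block=""
      count=0
    fi
  done
}
```

Two requests that share their first block of token IDs share their first key, which is what lets a router match them to the same cached prefix. If a worker tokenizes the same text into different IDs than the router assumed, the keys no longer line up, which is the kind of mismatch that tokenizer delegation risks under `kv` routing.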
## Parser names by model
For the full list of supported parser names, which models they cover, and
upstream name divergences (relevant for options B and C):
- [Tool Calling](tool-calling.md) -- supported tool call parsers with model
mappings and upstream name differences
- [Reasoning](reasoning.md) -- supported reasoning parsers with model mappings
and force-reasoning behavior
## Canonical launch examples
```bash
# A -- Dynamo-native (default).
python -m dynamo.vllm \
--dyn-tool-call-parser kimi_k2 \
--dyn-reasoning-parser kimi_k25
python -m dynamo.frontend --dyn-chat-processor dynamo
# B -- vLLM chat-processor (upstream parser names on the frontend).
python -m dynamo.vllm ...
python -m dynamo.frontend \
--dyn-chat-processor vllm \
--tool-call-parser hermes \
--reasoning-parser deepseek_r1
# C -- SGLang chat-processor.
python -m dynamo.sglang ...
python -m dynamo.frontend \
--dyn-chat-processor sglang \
--tool-call-parser kimi_k2 \
--reasoning-parser kimi_k25
# D -- vLLM tokenizer delegation (no KV routing).
python -m dynamo.vllm --use-vllm-tokenizer ...
python -m dynamo.frontend --router-mode round-robin
```
## See Also
- [Tool Calling](tool-calling.md) -- Supported tool call parser names, request examples
- [Reasoning](reasoning.md) -- Supported reasoning parser names, common pairings
- [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md) -- Option C details
- [Frontend Configuration Reference](../components/frontend/configuration.md) -- Full CLI flag reference
......@@ -7,6 +7,12 @@ subtitle: Configure reasoning parsers for models that emit thinking content
Some models emit reasoning or thinking content separately from their final response. Dynamo can split that output into `reasoning_content` and normal assistant content by configuring `--dyn-reasoning-parser` on the backend worker.
> [!TIP]
> This page covers parser names for the default Dynamo-native path. For a
> comparison of all preprocessing options (including vLLM/SGLang chat-processor
> swap and tokenizer delegation) and routing
> compatibility, see [Chat Processor Options](chat-processor-options.md).
## Prerequisites
To enable reasoning parsing, launch the backend worker with:
......@@ -23,23 +29,33 @@ python -m dynamo.<backend> --help
## Supported Reasoning Parsers
The reasoning parser names currently supported in the codebase are:
| Parser Name | Typical Models / Format |
|-------------|-------------------------|
| `basic` | Generic `<think>...</think>` reasoning blocks |
| `deepseek_r1` | Models that should treat output as reasoning until `</think>` is seen, such as `deepseek-ai/DeepSeek-R1` style responses |
| `glm45` | `zai-org/GLM-4.5` and GLM-5 style `<think>...</think>` reasoning blocks |
| `gpt_oss` | `openai/gpt-oss-*` |
| `granite` | Granite models that emit `Here's my thought process:` / `Here's my response:` markers |
| `kimi` | Kimi models that emit `◁think▷...◁/think▷` |
| `kimi_k25` | `moonshotai/Kimi-K2.5*` models that require force-reasoning handling for `<think>...</think>` |
| `minimax_append_think` | MiniMax models that begin reasoning immediately and effectively need an implicit opening `<think>` tag prepended |
| `mistral` | Mistral reasoning models that emit `[THINK]...[/THINK]` |
| `nemotron_deci` | Nemotron models that emit standard `<think>...</think>` reasoning blocks |
| `nemotron_nano` | Nemotron Nano reasoning output that ends with `</think>` without requiring a visible opening tag |
| `qwen3` | `Qwen/Qwen3-*` style `<think>...</think>` responses |
| `step3` | Step-style models that should treat content as reasoning until `</think>` is seen |
The table below lists the currently supported reasoning parsers in Dynamo's registry. The
**Upstream name** column shows where the vLLM or SGLang parser name differs
from Dynamo's -- relevant when using `--dyn-chat-processor vllm` or `sglang`
(see [Chat Processor Options](chat-processor-options.md)). A blank upstream
column means the same name works everywhere. `Dynamo-only` means no upstream
parser exists for this format.
Parsers marked **force-reasoning** emit reasoning content from token one
without requiring an explicit opening tag (`<think>`, etc.). All others
require the opening tag to be present in the model output.
| Parser Name | Models | Upstream name | Force-reasoning | Notes |
|---|---|---|---|---|
| `basic` | Generic CoT models | Dynamo-only | No | Plain `<think>...</think>` |
| `deepseek_r1` | DeepSeek R1, DeepSeek V3.1, DeepSeek V3.2 | | Yes | Pass explicitly for V3.1/V3.2 (no alias) |
| `glm45` | GLM-4.5, GLM-4.7 | Dynamo-only | No | Alias for `nemotron_deci`. `<think>...</think>` |
| `gpt_oss` | gpt-oss-20b / -120b | Dynamo-only | No | Harmony channel reasoning format |
| `granite` | Granite 3.x | | No | `Here's my thought process:` / `Here's my response:` |
| `kimi` | Kimi K2 Instruct / Thinking | Dynamo-only | No | `◁think▷...◁/think▷` |
| `kimi_k25` | Kimi K2.5 | Dynamo-only | Yes | `<think>...</think>` with force-reasoning |
| `minimax_append_think` | MiniMax M2 / M2.1 | Dynamo-only | No | Implicit opening `<think>` prepended |
| `mistral` | Magistral | | Yes | `[THINK]...[/THINK]` |
| `nemotron3` | Nemotron-3 / Mini | Dynamo-only | Yes | Alias for `deepseek_r1` |
| `nemotron_deci` | Nemotron-Super / -Ultra / -Deci, Llama-Nemotron | Dynamo-only | No | `<think>...</think>` |
| `nemotron_nano` | Nemotron-Nano | Dynamo-only | Yes | Alias for `deepseek_r1` |
| `qwen3` | QwQ-32B, Qwen3-Think, Qwen3-Coder | | No | `<think>...</think>` |
| `step3` | Step-3 / Step-3-Reasoning | Dynamo-only | Yes | `<think>...</think>` |
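
The difference between force-reasoning and tag-required parsers can be sketched on a complete (non-streaming) response. `split_reasoning` is a hypothetical shell function for illustration only. It assumes `<think>...</think>` tags and single-line output, and it does not reflect how Dynamo's Rust parsers are implemented.

```bash
# Illustrative sketch (not Dynamo's parser): split a finished response into
# reasoning and content. With force=yes, text counts as reasoning from the
# first token; with force=no, an explicit <think> tag is required.
# Usage: split_reasoning <force: yes|no> <model output>
split_reasoning() {
  local force="$1" text="$2" reasoning="" content=""
  local open="<think>" close="</think>"
  case "$text" in
    *"$open"*)
      # Explicit opening tag: text before it is content, the rest is parsed.
      content="${text%%"$open"*}"
      text="${text#*"$open"}"
      force=yes
      ;;
  esac
  if [ "$force" = yes ]; then
    case "$text" in
      *"$close"*)
        reasoning="${text%%"$close"*}"
        content="$content${text#*"$close"}"
        ;;
      *)
        reasoning="$text"   # closing tag never seen: all reasoning
        ;;
    esac
  else
    content="$text"
  fi
  printf 'reasoning=%s\ncontent=%s\n' "$reasoning" "$content"
}
```

With the input `plan</think>answer`, a force-reasoning call separates `plan` from `answer`, while a tag-required call leaves the whole string, including the stray `</think>`, in the content.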
## Common Parser Pairings
......
......@@ -10,6 +10,12 @@ to output function arguments for the relevant function(s) which you can execute
Tool calling (AKA function calling) is controlled using the `tool_choice` and `tools` request parameters.
> [!TIP]
> This page covers parser names for the default Dynamo-native path. For a
> comparison of all preprocessing options (including vLLM/SGLang chat-processor
> swap and tokenizer delegation) and routing
> compatibility, see [Chat Processor Options](chat-processor-options.md).
## Prerequisites
To enable this feature, you should set the following flag while launching the backend worker
......@@ -33,27 +39,32 @@ python -m dynamo.<backend> --help
## Supported Tool Call Parsers
The tool call parser names currently supported in the codebase are:
| Parser Name | Typical Models / Format |
|-------------|-------------------------|
| `deepseek_v3` | `deepseek-ai/DeepSeek-V3`, `deepseek-ai/DeepSeek-R1`, `deepseek-ai/DeepSeek-R1-0528` |
| `deepseek_v3_1` | `deepseek-ai/DeepSeek-V3.1` |
| `deepseek_v3_2` | DeepSeek V3.2 DSML tool calling (`<|DSML|function_calls>...`) |
| `default` | Dynamo's fallback parser for `<TOOLCALL>` and `<\|python_tag\|>` tool tags when no explicit parser is configured |
| `glm47` | `zai-org/GLM-4.7` |
| `harmony` | `openai/gpt-oss-*` |
| `hermes` | `Qwen/Qwen2.5-*`, `Qwen/QwQ-32B`, `NousResearch/Hermes-2-Pro-*`, `NousResearch/Hermes-2-Theta-*`, `NousResearch/Hermes-3-*` |
| `jamba` | `ai21labs/AI21-Jamba-*-1.5`, `ai21labs/AI21-Jamba-*-1.6`, `ai21labs/AI21-Jamba-*-1.7` |
| `kimi_k2` | `moonshotai/Kimi-K2-Thinking*`, `moonshotai/Kimi-K2-Instruct*`, `moonshotai/Kimi-K2.5*`; currently requires converting `tiktoken.model` to `tokenizers.json` |
| `llama3_json` | `meta-llama/Llama-3.1-*`, `meta-llama/Llama-3.2-*` |
| `minimax_m2` | MiniMax M2.1 XML-style tool calling (`<minimax:tool_call>...`) |
| `mistral` | `mistralai/Mistral-7B-Instruct-v0.3` and other Mistral models that emit `[TOOL_CALLS]...[/TOOL_CALLS]` |
| `nemotron_deci` | `nvidia/nemotron-*` |
| `nemotron_nano` | `nvidia/NVIDIA-Nemotron-3-Nano-*`; uses the same tool-call format as `qwen3_coder` |
| `phi4` | `Phi-4-*` |
| `pythonic` | `meta-llama/Llama-4-*` |
| `qwen3_coder` | XML-style tool calling such as `<tool_call><function=...>` |
The table below lists the currently supported tool call parsers in Dynamo's registry. The
**Upstream name** column shows where the vLLM or SGLang parser name differs
from Dynamo's -- relevant when using `--dyn-chat-processor vllm` or `sglang`
(see [Chat Processor Options](chat-processor-options.md)). A blank upstream
column means the same name works everywhere. `Dynamo-only` means no upstream
parser exists for this format.
| Parser Name | Models | Upstream name | Notes |
|---|---|---|---|
| `deepseek_v3` | DeepSeek V3, DeepSeek R1-0528+ | SGLang: `deepseekv3` | Special Unicode markers |
| `deepseek_v3_1` | DeepSeek V3.1 | Dynamo-only | JSON separators |
| `deepseek_v3_2` | DeepSeek V3.2+ | Dynamo-only | DSML tags (`<|DSML|function_calls>...`) |
| `default` | *(fallback)* | Dynamo-only | Empty JSON config (no start/end tokens). Prefer a model-specific parser for production use. |
| `glm47` | GLM-4.5, GLM-4.7 | Dynamo-only | XML `<arg_key>/<arg_value>` |
| `harmony` | gpt-oss-20b / -120b | Dynamo-only | Harmony channel format |
| `hermes` | Qwen2.5-\*, QwQ-32B, Qwen3-Instruct, Qwen3-Think, NousHermes-2/3 | vLLM: `qwen2_5`; SGLang: `qwen25` (for Qwen models) | `<tool_call>` JSON |
| `jamba` | Jamba 1.5 / 1.6 / 1.7 | Dynamo-only | `<tool_calls>` JSON |
| `kimi_k2` | Kimi K2 Instruct/Thinking, Kimi K2.5 | | Pair with `--dyn-reasoning-parser kimi` or `kimi_k25` |
| `llama3_json` | Llama 3 / 3.1 / 3.2 / 3.3 Instruct | | `<\|python_tag\|>` tool syntax |
| `minimax_m2` | MiniMax M2 / M2.1 | vLLM: `minimax` | XML `<minimax:tool_call>` |
| `mistral` | Mistral / Mixtral / Mistral-Nemo, Magistral | | `[TOOL_CALLS]...[/TOOL_CALLS]` |
| `nemotron_deci` | Nemotron-Super / -Ultra / -Deci, Llama-Nemotron-Ultra / -Super | Dynamo-only | `<TOOLCALL>` JSON |
| `nemotron_nano` | Nemotron-Nano | Dynamo-only | Alias for `qwen3_coder` |
| `phi4` | Phi-4, Phi-4-mini, Phi-4-mini-reasoning | vLLM: `phi4_mini_json` | `functools[...]` JSON |
| `pythonic` | Llama 4 (Scout / Maverick) | | Python-list tool syntax |
| `qwen3_coder` | Qwen3-Coder | | XML `<tool_call><function=...>` |
> [!TIP]
> For Kimi K2.5 thinking models, pair `--dyn-tool-call-parser kimi_k2` with
......
......@@ -133,6 +133,8 @@ navigation:
path: backends/sglang/sglang-diffusion.md
- page: TRT-LLM Diffusion
path: backends/trtllm/trtllm-video-diffusion.md
- page: Chat Processor Options
path: agents/chat-processor-options.md
- page: Tool Calling
path: agents/tool-calling.md
- page: Reasoning
......