docs(agents): improve tool/reasoning parser docs (#8497)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>

Commit bae41d44 authored Apr 22, 2026 by Keiven C; committed by GitHub Apr 22, 2026
4 changed files
--- a/docs/agents/chat-processor-options.md
+++ b/docs/agents/chat-processor-options.md
+---
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+title: Chat Processor Options
+subtitle: Choose the right preprocessing pipeline for tool calling, reasoning, and tokenization
+---
+Dynamo splits work between a **frontend** process (HTTP server, tokenization,
+routing, parsing) and one or more **worker** processes (the engine running the
+model). Several CLI flags control which code path handles chat template
+rendering, tool-call parsing, and reasoning-content separation. This page
+explains the available configurations, when to use each, and how they interact
+with KV cache routing.
+For the list of individual parser names, see
+[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md).
+## Configurations
+There are five supported configurations. Each is set at startup -- Dynamo does
+not switch between them per request.
+| | Frontend flags | Worker flags | KV routing | Notes |
+|---|---|---|---|---|
+| **A** Dynamo-native (default) | `--dyn-chat-processor dynamo` | `--dyn-tool-call-parser <name>` `--dyn-reasoning-parser <name>` | Yes | Rust preprocessor. Lowest latency. |
+| **B** vLLM chat processor | `--dyn-chat-processor vllm` `--tool-call-parser <name>` `--reasoning-parser <name>` | *(none)* | Yes | Delegates to vLLM's Python preprocessor. |
+| **C** SGLang chat processor | `--dyn-chat-processor sglang` `--tool-call-parser <name>` `--reasoning-parser <name>` | *(none)* | Yes | Delegates to SGLang's Python preprocessor. See [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md). |
+| **D** vLLM tokenizer delegation | `--router-mode round-robin` | `--use-vllm-tokenizer` | No | Engine-side tokenization. Day-0 model fallback. |
+| **E** SGLang tokenizer delegation | `--router-mode round-robin` | `--use-sglang-tokenizer` | No | **Deprecated** -- use option C instead. |
+> [!NOTE]
+> Although `dynamo` is the default for `--dyn-chat-processor`, specifying it
+> explicitly in launch scripts makes the choice visible in logs and support
+> diagnostics.
+## Flag reference
+### `--dyn-chat-processor {dynamo | vllm | sglang}`
+Frontend flag (default `dynamo`). Selects the chat processor that renders
+templates, tokenizes, and dispatches parsing.
+- `dynamo` -- Rust preprocessor. Parser names come from Dynamo's registry
+  (see [Tool Calling](tool-calling.md) and [Reasoning](reasoning.md)).
+- `vllm` -- vLLM's Python preprocessor. Parser names come from vLLM's
+  registry, which may differ from Dynamo's.
+- `sglang` -- SGLang's Python preprocessor. Parser names come from SGLang's
+  registry. See [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md).
+### `--dyn-tool-call-parser <name>` / `--dyn-reasoning-parser <name>`
+Worker flags. Names from Dynamo's parser registry. Only effective under
+`--dyn-chat-processor dynamo` (option A); silently ignored under other chat
+processors.
+The flags are declared on the worker CLI, but the parser runs on the frontend --
+the name propagates via model metadata. For supported names, see
+[Tool Calling](tool-calling.md) and [Reasoning](reasoning.md).
+### `--tool-call-parser <name>` / `--reasoning-parser <name>`
+Frontend flags (no `--dyn-` prefix). Names from the upstream engine's registry.
+Only accepted when paired with the matching chat processor:
+- Under `--dyn-chat-processor vllm`: accepted. Use vLLM parser names.
+- Under `--dyn-chat-processor sglang`: accepted. Use SGLang parser names.
+- Under `--dyn-chat-processor dynamo`: **rejected at startup** with
+  `Unknown arguments specified: ...`. Use the `--dyn-*` worker flags instead.
+Upstream parser names are pinned to the engine version shipped in the Dynamo
+container. They may differ from Dynamo's names for the same model (e.g.,
+SGLang uses `deepseekv3` where Dynamo uses `deepseek_v3`).
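The name divergence described above can be sketched as a small lookup. This is an illustration only: `UPSTREAM_TOOL_PARSER_NAMES` and `upstream_tool_parser_name` are hypothetical helpers, not part of Dynamo, and the entries are copied from the **Upstream name** column of the tool-calling table added in this change.

```python
# Hypothetical lookup table (not a Dynamo API). Entries mirror the
# "Upstream name" column of the tool-calling table in this change.
UPSTREAM_TOOL_PARSER_NAMES = {
    "deepseek_v3": {"sglang": "deepseekv3"},
    "hermes": {"vllm": "qwen2_5", "sglang": "qwen25"},  # Qwen models only
    "minimax_m2": {"vllm": "minimax"},
    "phi4": {"vllm": "phi4_mini_json"},
}


def upstream_tool_parser_name(dynamo_name: str, engine: str) -> str:
    """Translate a Dynamo tool-call parser name to the upstream engine's name.

    Falls back to the Dynamo name when no divergence is recorded. The fallback
    does not prove the parser exists upstream: parsers listed as Dynamo-only
    have no upstream equivalent at all.
    """
    return UPSTREAM_TOOL_PARSER_NAMES.get(dynamo_name, {}).get(engine, dynamo_name)
```

For example, `upstream_tool_parser_name("deepseek_v3", "sglang")` yields the SGLang spelling, while a name with no recorded divergence is returned unchanged.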
+### `--use-vllm-tokenizer` / `--use-sglang-tokenizer`
+Worker flags (boolean). Hand tokenization to the engine instead of the
+frontend. The flag must match the engine on the worker.
+`--use-sglang-tokenizer` is deprecated. New SGLang deployments should use
+`--dyn-chat-processor sglang` (option C) instead. See
+[Migration from --use-sglang-tokenizer](../backends/sglang/sglang-chat-processor.md#migration-from---use-sglang-tokenizer).
+## Which option should I pick?
+1. **Does Dynamo have a parser for your model?** Check the per-model tables in
+   [Tool Calling](tool-calling.md) and [Reasoning](reasoning.md). If yes, use
+   **option A**. This is the default path: Rust parsing on the frontend,
+   KV-routable, lowest latency.
+2. **Does the upstream engine have a parser but Dynamo doesn't?** Use
+   **option B** (vLLM) or **option C** (SGLang). Still KV-routable.
+3. **Is the tokenizer itself the problem** (day-0 model, custom special tokens,
+   RoPE variants)? Use **option D**. KV routing is off; pair with
+   `--router-mode round-robin`.
+4. **SGLang + day-0 model?** Use **option C** with the appropriate upstream
+   parser name. Do not use option E (deprecated).
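The four questions above can be read as a decision function. The sketch below is one possible encoding, evaluated in the same order as the list; `pick_option` and its parameters are hypothetical names, and the `None` result for cases the list does not cover is an assumption.

```python
from typing import Optional


def pick_option(
    engine: str,
    dynamo_has_parser: bool,
    upstream_has_parser: bool,
    tokenizer_is_problem: bool = False,
) -> Optional[str]:
    """Return the option letter suggested by the checklist, or None.

    Hypothetical helper that walks the questions in document order.
    """
    if engine not in ("vllm", "sglang"):
        raise ValueError(f"unsupported engine for this sketch: {engine!r}")
    if dynamo_has_parser:
        return "A"  # step 1: Dynamo-native path
    if upstream_has_parser:
        return "B" if engine == "vllm" else "C"  # step 2: chat-processor swap
    if tokenizer_is_problem:
        # step 3 for vLLM; step 4 steers SGLang to C instead of deprecated E
        return "D" if engine == "vllm" else "C"
    return None  # the checklist does not cover this case
```

Because the checks run in list order, a model that has a Dynamo parser resolves to option A even when `tokenizer_is_problem` is set; reorder the branches if tokenizer issues should take priority in your deployment.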
+## Invalid and silently broken combinations
+### Rejected at startup
+- **`--dyn-chat-processor dynamo` with `--tool-call-parser <name>`** (or
+  `--reasoning-parser`). The un-prefixed flags are not recognized under the
+  Dynamo chat processor. Use `--dyn-tool-call-parser` on the worker instead.
+- **`--tool-call-parser` and `--dyn-tool-call-parser` together** on the same
+  SGLang worker. SGLang rejects this: `Cannot use both --tool-call-parser and
+  --dyn-tool-call-parser`. Pick one namespace.
+- **`--use-vllm-tokenizer` on an SGLang worker** (and vice versa). The flag
+  must match the engine.
+### Silently broken (no startup error, wrong results)
+- **Tokenizer delegation + `--router-mode kv`** -- Options D/E with `kv`
+  routing produce prefix-hash mismatches and silent cache misses.
+- **`--dyn-tool-call-parser` + `--use-vllm-tokenizer`** on the same vLLM
+  worker. The worker bypasses Dynamo's preprocessor while the frontend-side
+  parser is still wired up, producing mismatched token streams. No
+  mutual-exclusivity check exists today.
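Since the silently broken combinations produce no startup error, a pre-flight check in a launch wrapper may help catch them. The following is a minimal sketch under stated assumptions: `flag_value` and `check_flags` are hypothetical helpers, flags are passed in `--flag value` form, and KV routing is only flagged when `--router-mode kv` is given explicitly (this page does not state the default router mode).

```python
from typing import List, Optional


def flag_value(args: List[str], flag: str, default: Optional[str] = None) -> Optional[str]:
    """Return the token after `flag` ("--flag value" form only), else default."""
    if flag in args:
        i = args.index(flag)
        if i + 1 < len(args):
            return args[i + 1]
    return default


def check_flags(engine: str, frontend_args: List[str], worker_args: List[str]) -> List[str]:
    """Collect the problem combinations listed above (hypothetical helper)."""
    problems = []
    processor = flag_value(frontend_args, "--dyn-chat-processor", "dynamo")
    router_mode = flag_value(frontend_args, "--router-mode")

    # Combinations documented as rejected at startup.
    if processor == "dynamo":
        for flag in ("--tool-call-parser", "--reasoning-parser"):
            if flag in frontend_args:
                problems.append(f"rejected: {flag} under --dyn-chat-processor dynamo")
    if (engine == "sglang" and "--tool-call-parser" in worker_args
            and "--dyn-tool-call-parser" in worker_args):
        problems.append("rejected: both parser namespaces on one SGLang worker")
    for flag, owner in (("--use-vllm-tokenizer", "vllm"),
                        ("--use-sglang-tokenizer", "sglang")):
        if flag in worker_args and engine != owner:
            problems.append(f"rejected: {flag} does not match engine {engine}")

    # Combinations documented as silently broken.
    delegated = ("--use-vllm-tokenizer" in worker_args
                 or "--use-sglang-tokenizer" in worker_args)
    if delegated and router_mode == "kv":
        problems.append("silent: tokenizer delegation with --router-mode kv")
    if (engine == "vllm" and "--dyn-tool-call-parser" in worker_args
            and "--use-vllm-tokenizer" in worker_args):
        problems.append("silent: --dyn-tool-call-parser with --use-vllm-tokenizer")
    return problems
```

An empty list means none of the combinations named in this section were detected; it is not a guarantee that the launch is otherwise valid.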
+## Routing compatibility
+`--router-mode kv` needs frontend tokenization to compute prefix-hash routing
+keys. Options A, B, and C keep the tokenizer on the frontend and are
+KV-routable. Options D and E move tokenization to the worker and are **not**
+KV-routable -- pair them with `round-robin` or `random`.
+| Option | `kv` routing | `round-robin` / `random` |
+|--------|:---:|:---:|
+| A (Dynamo-native) | Yes | Yes |
+| B (vLLM processor) | Yes | Yes |
+| C (SGLang processor) | Yes | Yes |
+| D (vLLM tokenizer delegation) | **No** | Yes |
+| E (SGLang tokenizer delegation) | **No** | Yes |
+## Why each flag exists
+- **Frontend tokenization** is required for KV cache routing. The frontend
+  needs token IDs to compute prefix-hash routing keys before the request
+  reaches a worker. Parser flags on the Rust-native path (option A) co-locate
+  with tokenization on the frontend for this reason.
+- **Backend tokenization** is a fallback for when frontend tokenization can't
+  or shouldn't run: unsupported model, day-0 support, tokenizer edge cases
+  (custom special tokens, RoPE variants). The engine owns the tokenizer in
+  this mode, so KV routing drops out.
+- **Chat-processor swap** (options B/C) is the middle ground: tokenization
+  stays on the frontend (KV-routable), but parsing delegates to the upstream
+  engine's Python implementation. This covers models where Dynamo's Rust
+  parser hasn't been written yet.
+## Parser names by model
+For the full list of supported parser names, which models they cover, and
+upstream name divergences (relevant for options B and C):
+- [Tool Calling](tool-calling.md) -- supported tool call parsers with model
+  mappings and upstream name differences
+- [Reasoning](reasoning.md) -- supported reasoning parsers with model mappings
+  and force-reasoning behavior
+## Canonical launch examples
+```bash
+# A -- Dynamo-native (default).
+python -m dynamo.vllm \
+  --dyn-tool-call-parser kimi_k2 \
+  --dyn-reasoning-parser kimi_k25
+python -m dynamo.frontend --dyn-chat-processor dynamo
+# B -- vLLM chat-processor (upstream parser names on the frontend).
+python -m dynamo.vllm ...
+python -m dynamo.frontend \
+  --dyn-chat-processor vllm \
+  --tool-call-parser hermes \
+  --reasoning-parser deepseek_r1
+# C -- SGLang chat-processor.
+python -m dynamo.sglang ...
+python -m dynamo.frontend \
+  --dyn-chat-processor sglang \
+  --tool-call-parser kimi_k2 \
+  --reasoning-parser kimi_k25
+# D -- vLLM tokenizer delegation (no KV routing).
+python -m dynamo.vllm --use-vllm-tokenizer ...
+python -m dynamo.frontend --router-mode round-robin
+```
+## See Also
+- [Tool Calling](tool-calling.md) -- Supported tool call parser names, request examples
+- [Reasoning](reasoning.md) -- Supported reasoning parser names, common pairings
+- [SGLang Chat Processor](../backends/sglang/sglang-chat-processor.md) -- Option C details
+- [Frontend Configuration Reference](../components/frontend/configuration.md) -- Full CLI flag reference
--- a/docs/agents/reasoning.md
+++ b/docs/agents/reasoning.md
@@ -7,6 +7,12 @@ subtitle: Configure reasoning parsers for models that emit thinking content
 Some models emit reasoning or thinking content separately from their final response. Dynamo can split that output into `reasoning_content` and normal assistant content by configuring `--dyn-reasoning-parser` on the backend worker.
+> [!TIP]
+> This page covers parser names for the default Dynamo-native path. For a
+> comparison of all preprocessing options (including vLLM/SGLang chat-processor
+> swap and tokenizer delegation) and routing
+> compatibility, see [Chat Processor Options](chat-processor-options.md).
 ## Prerequisites
 To enable reasoning parsing, launch the backend worker with:
@@ -23,23 +29,33 @@ python -m dynamo.<backend> --help
 ## Supported Reasoning Parsers
-The reasoning parser names currently supported in the codebase are:
-| Parser Name | Typical Models / Format |
-|-------------|-------------------------|
-| `basic` | Generic `<think>...</think>` reasoning blocks |
-| `deepseek_r1` | Models that should treat output as reasoning until `</think>` is seen, such as `deepseek-ai/DeepSeek-R1` style responses |
-| `glm45` | `zai-org/GLM-4.5` and GLM-5 style `<think>...</think>` reasoning blocks |
-| `gpt_oss` | `openai/gpt-oss-*` |
-| `granite` | Granite models that emit `Here's my thought process:` / `Here's my response:` markers |
-| `kimi` | Kimi models that emit `◁think▷...◁/think▷` |
-| `kimi_k25` | `moonshotai/Kimi-K2.5*` models that require force-reasoning handling for `<think>...</think>` |
-| `minimax_append_think` | MiniMax models that begin reasoning immediately and effectively need an implicit opening `<think>` tag prepended |
-| `mistral` | Mistral reasoning models that emit `[THINK]...[/THINK]` |
-| `nemotron_deci` | Nemotron models that emit standard `<think>...</think>` reasoning blocks |
-| `nemotron_nano` | Nemotron Nano reasoning output that ends with `</think>` without requiring a visible opening tag |
-| `qwen3` | `Qwen/Qwen3-*` style `<think>...</think>` responses |
-| `step3` | Step-style models that should treat content as reasoning until `</think>` is seen |
+The table below lists the currently supported reasoning parsers in Dynamo's registry. The
+**Upstream name** column shows where the vLLM or SGLang parser name differs
+from Dynamo's -- relevant when using `--dyn-chat-processor vllm` or `sglang`
+(see [Chat Processor Options](chat-processor-options.md)). A blank upstream
+column means the same name works everywhere. `Dynamo-only` means no upstream
+parser exists for this format.
+Parsers marked **force-reasoning** emit reasoning content from token one
+without requiring an explicit opening tag (`<think>`, etc.). All others
+require the opening tag to be present in the model output.
+| Parser Name | Models | Upstream name | Force-reasoning | Notes |
+|---|---|---|---|---|
+| `basic` | Generic CoT models | Dynamo-only | No | Plain `<think>...</think>` |
+| `deepseek_r1` | DeepSeek R1, DeepSeek V3.1, DeepSeek V3.2 | | Yes | Pass explicitly for V3.1/V3.2 (no alias) |
+| `glm45` | GLM-4.5, GLM-4.7 | Dynamo-only | No | Alias for `nemotron_deci`. `<think>...</think>` |
+| `gpt_oss` | gpt-oss-20b / -120b | Dynamo-only | No | Harmony channel reasoning format |
+| `granite` | Granite 3.x | | No | `Here's my thought process:` / `Here's my response:` |
+| `kimi` | Kimi K2 Instruct / Thinking | Dynamo-only | No | `◁think▷...◁/think▷` |
+| `kimi_k25` | Kimi K2.5 | Dynamo-only | Yes | `<think>...</think>` with force-reasoning |
+| `minimax_append_think` | MiniMax M2 / M2.1 | Dynamo-only | No | Implicit opening `<think>` prepended |
+| `mistral` | Magistral | | Yes | `[THINK]...[/THINK]` |
+| `nemotron3` | Nemotron-3 / Mini | Dynamo-only | Yes | Alias for `deepseek_r1` |
+| `nemotron_deci` | Nemotron-Super / -Ultra / -Deci, Llama-Nemotron | Dynamo-only | No | `<think>...</think>` |
+| `nemotron_nano` | Nemotron-Nano | Dynamo-only | Yes | Alias for `deepseek_r1` |
+| `qwen3` | QwQ-32B, Qwen3-Think, Qwen3-Coder | | No | `<think>...</think>` |
+| `step3` | Step-3 / Step-3-Reasoning | Dynamo-only | Yes | `<think>...</think>` |
 ## Common Parser Pairings

--- a/docs/agents/tool-calling.md
+++ b/docs/agents/tool-calling.md
@@ -10,6 +10,12 @@ to output function arguments for the relevant function(s) which you can execute
 Tool calling (AKA function calling) is controlled using the `tool_choice` and `tools` request parameters.
+> [!TIP]
+> This page covers parser names for the default Dynamo-native path. For a
+> comparison of all preprocessing options (including vLLM/SGLang chat-processor
+> swap and tokenizer delegation) and routing
+> compatibility, see [Chat Processor Options](chat-processor-options.md).
 ## Prerequisites
 To enable this feature, you should set the following flag while launching the backend worker
@@ -33,27 +39,32 @@ python -m dynamo.<backend> --help
 ## Supported Tool Call Parsers
-The tool call parser names currently supported in the codebase are:
-| Parser Name | Typical Models / Format |
-|-------------|-------------------------|
-| `deepseek_v3` | `deepseek-ai/DeepSeek-V3`, `deepseek-ai/DeepSeek-R1`, `deepseek-ai/DeepSeek-R1-0528` |
-| `deepseek_v3_1` | `deepseek-ai/DeepSeek-V3.1` |
-| `deepseek_v3_2` | DeepSeek V3.2 DSML tool calling (`<｜DSML｜function_calls>...`) |
-| `default` | Dynamo's fallback parser for &lt;TOOLCALL&gt; and &lt;|python_tag|&gt; tool tags when no explicit parser is configured |
-| `glm47` | `zai-org/GLM-4.7` |
-| `harmony` | `openai/gpt-oss-*` |
-| `hermes` | `Qwen/Qwen2.5-*`, `Qwen/QwQ-32B`, `NousResearch/Hermes-2-Pro-*`, `NousResearch/Hermes-2-Theta-*`, `NousResearch/Hermes-3-*` |
-| `jamba` | `ai21labs/AI21-Jamba-*-1.5`, `ai21labs/AI21-Jamba-*-1.6`, `ai21labs/AI21-Jamba-*-1.7` |
-| `kimi_k2` | `moonshotai/Kimi-K2-Thinking*`, `moonshotai/Kimi-K2-Instruct*`, `moonshotai/Kimi-K2.5*`; currently requires converting `tiktoken.model` to `tokenizers.json` |
-| `llama3_json` | `meta-llama/Llama-3.1-*`, `meta-llama/Llama-3.2-*` |
-| `minimax_m2` | MiniMax M2.1 XML-style tool calling (`<minimax:tool_call>...`) |
-| `mistral` | `mistralai/Mistral-7B-Instruct-v0.3` and other Mistral models that emit `[TOOL_CALLS]...[/TOOL_CALLS]` |
-| `nemotron_deci` | `nvidia/nemotron-*` |
-| `nemotron_nano` | `nvidia/NVIDIA-Nemotron-3-Nano-*`; uses the same tool-call format as `qwen3_coder` |
-| `phi4` | `Phi-4-*` |
-| `pythonic` | `meta-llama/Llama-4-*` |
-| `qwen3_coder` | XML-style tool calling such as `<tool_call><function=...>` |
+The table below lists the currently supported tool call parsers in Dynamo's registry. The
+**Upstream name** column shows where the vLLM or SGLang parser name differs
+from Dynamo's -- relevant when using `--dyn-chat-processor vllm` or `sglang`
+(see [Chat Processor Options](chat-processor-options.md)). A blank upstream
+column means the same name works everywhere. `Dynamo-only` means no upstream
+parser exists for this format.
+| Parser Name | Models | Upstream name | Notes |
+|---|---|---|---|
+| `deepseek_v3` | DeepSeek V3, DeepSeek R1-0528+ | SGLang: `deepseekv3` | Special Unicode markers |
+| `deepseek_v3_1` | DeepSeek V3.1 | Dynamo-only | JSON separators |
+| `deepseek_v3_2` | DeepSeek V3.2+ | Dynamo-only | DSML tags (`<｜DSML｜function_calls>...`) |
+| `default` | *(fallback)* | Dynamo-only | Empty JSON config (no start/end tokens). Prefer a model-specific parser for production use. |
+| `glm47` | GLM-4.5, GLM-4.7 | Dynamo-only | XML `<arg_key>/<arg_value>` |
+| `harmony` | gpt-oss-20b / -120b | Dynamo-only | Harmony channel format |
+| `hermes` | Qwen2.5-\*, QwQ-32B, Qwen3-Instruct, Qwen3-Think, NousHermes-2/3 | vLLM: `qwen2_5`; SGLang: `qwen25` (for Qwen models) | `<tool_call>` JSON |
+| `jamba` | Jamba 1.5 / 1.6 / 1.7 | Dynamo-only | `<tool_calls>` JSON |
+| `kimi_k2` | Kimi K2 Instruct/Thinking, Kimi K2.5 | | Pair with `--dyn-reasoning-parser kimi` or `kimi_k25` |
+| `llama3_json` | Llama 3 / 3.1 / 3.2 / 3.3 Instruct | | `<\|python_tag\|>` tool syntax |
+| `minimax_m2` | MiniMax M2 / M2.1 | vLLM: `minimax` | XML `<minimax:tool_call>` |
+| `mistral` | Mistral / Mixtral / Mistral-Nemo, Magistral | | `[TOOL_CALLS]...[/TOOL_CALLS]` |
+| `nemotron_deci` | Nemotron-Super / -Ultra / -Deci, Llama-Nemotron-Ultra / -Super | Dynamo-only | `<TOOLCALL>` JSON |
+| `nemotron_nano` | Nemotron-Nano | Dynamo-only | Alias for `qwen3_coder` |
+| `phi4` | Phi-4, Phi-4-mini, Phi-4-mini-reasoning | vLLM: `phi4_mini_json` | `functools[...]` JSON |
+| `pythonic` | Llama 4 (Scout / Maverick) | | Python-list tool syntax |
+| `qwen3_coder` | Qwen3-Coder | | XML `<tool_call><function=...>` |
 > [!TIP]
 > For Kimi K2.5 thinking models, pair `--dyn-tool-call-parser kimi_k2` with

--- a/docs/index.yml
+++ b/docs/index.yml
@@ -133,6 +133,8 @@ navigation:
            path: backends/sglang/sglang-diffusion.md
          - page: TRT-LLM Diffusion
            path: backends/trtllm/trtllm-video-diffusion.md
+      - page: Chat Processor Options
+        path: agents/chat-processor-options.md
      - page: Tool Calling
        path: agents/tool-calling.md
      - page: Reasoning