docs: Update tool/reasoning parser support (#7605)

Commit a2077c96 (unverified), authored Mar 31, 2026 by Ryan McCormick, committed by GitHub Mar 31, 2026
5 changed files
--- a/docs/agents/reasoning.md
+++ b/docs/agents/reasoning.md
+---
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+title: Reasoning
+subtitle: Configure reasoning parsers for models that emit thinking content
+---
+
+Some models emit reasoning or thinking content separately from their final response. Dynamo can split that output into `reasoning_content` and normal assistant content by configuring `--dyn-reasoning-parser` on the backend worker.
+
+## Prerequisites
+
+To enable reasoning parsing, launch the backend worker with:
+
+- `--dyn-reasoning-parser`: select the reasoning parser from the supported list below
+
+```bash
+# <backend> can be sglang, trtllm, vllm, etc. based on your installation
+python -m dynamo.<backend> --help
+```
+
+> [!TIP]
+> Some models need both a reasoning parser and a tool call parser. For supported tool call parser names, see [Tool Calling](tool-calling.md).
+
+## Supported Reasoning Parsers
+
+The reasoning parser names currently supported in the codebase are:
+
+| Parser Name | Typical Models / Format |
+|-------------|-------------------------|
+| `basic` | Generic `<think>...</think>` reasoning blocks |
+| `deepseek_r1` | Models that should treat output as reasoning until `</think>` is seen, such as `deepseek-ai/DeepSeek-R1` style responses |
+| `glm45` | `zai-org/GLM-4.5` and GLM-5 style `<think>...</think>` reasoning blocks |
+| `gpt_oss` | `openai/gpt-oss-*` |
+| `granite` | Granite models that emit `Here's my thought process:` / `Here's my response:` markers |
+| `kimi` | Kimi models that emit `◁think▷...◁/think▷` |
+| `kimi_k25` | `moonshotai/Kimi-K2.5*` models that require force-reasoning handling for `<think>...</think>` |
+| `minimax_append_think` | MiniMax models that begin reasoning immediately and effectively need an implicit opening `<think>` tag prepended |
+| `mistral` | Mistral reasoning models that emit `[THINK]...[/THINK]` |
+| `nemotron_deci` | Nemotron models that emit standard `<think>...</think>` reasoning blocks |
+| `nemotron_nano` | Nemotron Nano reasoning output that ends with `</think>` without requiring a visible opening tag |
+| `qwen3` | `Qwen/Qwen3-*` style `<think>...</think>` responses |
+| `step3` | Step-style models that should treat content as reasoning until `</think>` is seen |
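
Several rows in the table above differ only in whether an opening tag is required. The following is a minimal Python sketch of that distinction, not Dynamo's implementation: the function name `split_reasoning` and its `force_reasoning` parameter are hypothetical, and the tag strings are taken from the table.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>",
                    force_reasoning=False):
    """Return (reasoning_content, content) for one complete response.

    Hypothetical helper for illustration only. With force_reasoning=True the
    output is treated as reasoning from the first character until close_tag,
    which is how the table describes parsers such as `deepseek_r1` or `step3`.
    """
    start = text.find(open_tag)
    if start != -1:
        # Explicit opening tag: text before it is normal content.
        before = text[:start]
        rest = text[start + len(open_tag):]
    elif force_reasoning:
        # No opening tag, but the model is assumed to start in reasoning mode.
        before, rest = "", text
    else:
        # No opening tag and no forcing: nothing is treated as reasoning.
        return "", text

    reasoning, sep, after = rest.partition(close_tag)
    if not sep:
        # Closing tag never seen: everything after the opening is reasoning.
        return reasoning.strip(), before.strip()
    return reasoning.strip(), (before + after).strip()


# Generic <think>...</think> block (the `basic` row).
print(split_reasoning("<think>plan</think>answer"))
# Missing opening tag, handled only when reasoning is forced.
print(split_reasoning("plan</think>answer", force_reasoning=True))
# Other tag pairs from the table can be passed in, e.g. the `mistral` row.
print(split_reasoning("[THINK]plan[/THINK]answer", "[THINK]", "[/THINK]"))
```

The sketch handles complete strings only; a real parser also has to cope with streamed output where a tag may be split across chunks.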
+
+## Common Parser Pairings
+
+Some models need both parsers configured together. Common pairings include:
+
+- `openai/gpt-oss-*`: `--dyn-tool-call-parser harmony --dyn-reasoning-parser gpt_oss`
+- `zai-org/GLM-4.7`: `--dyn-tool-call-parser glm47 --dyn-reasoning-parser glm45`
+- `moonshotai/Kimi-K2.5*`: `--dyn-tool-call-parser kimi_k2 --dyn-reasoning-parser kimi_k25`
+- MiniMax M2.1 style outputs: `--dyn-tool-call-parser minimax_m2 --dyn-reasoning-parser minimax_append_think`
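
As a rough illustration, the pairings listed above can be kept in a small lookup table so that launch scripts pick both flags together. This is only a sketch: `PAIRINGS` and `parser_flags` are hypothetical names, the glob patterns are copied from the list, and the MiniMax entry is left out because the list gives no model ID pattern for it.

```python
from fnmatch import fnmatchcase

# (model glob, tool call parser, reasoning parser), copied from the list above.
PAIRINGS = [
    ("openai/gpt-oss-*", "harmony", "gpt_oss"),
    ("zai-org/GLM-4.7", "glm47", "glm45"),
    ("moonshotai/Kimi-K2.5*", "kimi_k2", "kimi_k25"),
]


def parser_flags(model):
    """Return the CLI flags for a model ID, or [] if no pairing is known."""
    for pattern, tool_parser, reasoning_parser in PAIRINGS:
        if fnmatchcase(model, pattern):
            return [
                "--dyn-tool-call-parser", tool_parser,
                "--dyn-reasoning-parser", reasoning_parser,
            ]
    return []


print(" ".join(parser_flags("openai/gpt-oss-20b")))
```

The returned list can be appended to a `python -m dynamo.<backend>` command line; an empty list means the parsers have to be chosen by hand from the supported tables.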
+
+## Tool Calling Interplay
+
+Reasoning parsing happens before tool call parsing. If a model emits both reasoning content and tool calls, configure both parsers so Dynamo can first separate reasoning text and then parse tool calls from the remaining assistant output.
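
The ordering described above (reasoning first, then tool calls from what remains) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Dynamo's code: `parse_response` is a hypothetical name, reasoning is assumed to use `<think>...</think>`, and tool calls are assumed to be JSON wrapped in `<tool_call>...</tool_call>` tags.

```python
import json
import re

# Assumed tool call wire format for this sketch: JSON inside <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_response(raw):
    """Split a complete model response into reasoning, content, and tool calls."""
    # Step 1: separate reasoning text, ending at the first closing tag.
    reasoning, sep, remainder = raw.partition("</think>")
    if sep:
        reasoning = reasoning.replace("<think>", "", 1).strip()
    else:
        # No closing tag: treat the whole response as normal output.
        reasoning, remainder = "", raw

    # Step 2: parse tool calls only from the remaining assistant output.
    tool_calls = [json.loads(body) for body in TOOL_CALL_RE.findall(remainder)]
    content = TOOL_CALL_RE.sub("", remainder).strip()
    return {
        "reasoning_content": reasoning,
        "content": content,
        "tool_calls": tool_calls,
    }


raw = (
    '<think>I could emit <tool_call>{"name": "draft"}</tool_call> here</think>'
    'Calling.<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
)
print(parse_response(raw))
```

Because the reasoning span is removed before step 2 runs, tool call syntax that the model merely mentions while thinking stays in `reasoning_content` and is not executed as a call in this sketch.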
--- a/docs/agents/tool-calling.md
+++ b/docs/agents/tool-calling.md
@@ -10,49 +10,54 @@ to output function arguments for the relevant function(s) which you can execute

 Tool calling (AKA function calling) is controlled using the `tool_choice` and `tools` request parameters.

-
 ## Prerequisites

 To enable this feature, you should set the following flag while launching the backend worker

- `--dyn-tool-call-parser` : select the parser from the available parsers list using the below command
+- `--dyn-tool-call-parser`: select the tool call parser from the supported list below

 ```bash
 # <backend> can be sglang, trtllm, vllm, etc. based on your installation
-python -m dynamo.<backend> --help"
+python -m dynamo.<backend> --help
 ```

 > [!NOTE]
-> If no tool call parser is provided by the user, Dynamo will try to use default tool call parsing based on `<TOOLCALL>` and `<|python_tag|>` tool tags.
+> If no tool call parser is provided by the user, Dynamo will try to use default tool call parsing based on &lt;TOOLCALL&gt; and &lt;|python_tag|&gt; tool tags.

 > [!TIP]
 > If your model's default chat template doesn't support tool calling, but the model itself does, you can specify a custom chat template per worker
 > with `python -m dynamo.<backend> --custom-jinja-template </path/to/template.jinja>`.

-
-Parser to Model Mapping
-
-| Parser Name | Supported Models                                                      |
-|-------------|-----------------------------------------------------------------------|
-| hermes      | Qwen/Qwen2.5-*, Qwen/QwQ-32B, NousResearch/Hermes-2-Pro-*, NousResearch/Hermes-2-Theta-*, NousResearch/Hermes-3-* |
-| mistral | mistralai/Mistral-7B-Instruct-v0.3, Additional mistral function-calling models are compatible as well.|
-| llama3_json | meta-llama/Llama-3.1-*, meta-llama/Llama-3.2-* |
-| harmony | openai/gpt-oss-* |
-| nemotron_deci | nvidia/nemotron-* |
-| nemotron_nano | nvidia/NVIDIA-Nemotron-3-Nano-* |
-| phi4 | Phi-4-* |
-| deepseek_v3 | deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-0528 |
-| deepseek_v3_1 | deepseek-ai/DeepSeek-V3.1 |
-| pythonic |  meta-llama/Llama-4-* |
-| jamba |  ai21labs/AI21-Jamba-*-1.5, ai21labs/AI21-Jamba-*-1.6, ai21labs/AI21-Jamba-*-1.7, |
-| glm47 | zai-org/GLM-4.7 |
-| kimi_k2 | moonshotai/Kimi-K2-Thinking*, moonshotai/Kimi-K2-Instruct*, moonshotai/Kimi-K2.5* |
-
-\* Currently requires converting `tiktoken.model` to `tokenizers.json`.
+> [!TIP]
+> If your model also emits reasoning content that should be separated from normal output, see [Reasoning](reasoning.md) for the supported `--dyn-reasoning-parser` values.
+
+## Supported Tool Call Parsers
+
+The tool call parser names currently supported in the codebase are:
+
+| Parser Name | Typical Models / Format |
+|-------------|-------------------------|
+| `deepseek_v3` | `deepseek-ai/DeepSeek-V3`, `deepseek-ai/DeepSeek-R1`, `deepseek-ai/DeepSeek-R1-0528` |
+| `deepseek_v3_1` | `deepseek-ai/DeepSeek-V3.1` |
+| `deepseek_v3_2` | DeepSeek V3.2 DSML tool calling (`<｜DSML｜function_calls>...`) |
+| `default` | Dynamo's fallback parser for &lt;TOOLCALL&gt; and &lt;|python_tag|&gt; tool tags when no explicit parser is configured |
+| `glm47` | `zai-org/GLM-4.7` |
+| `harmony` | `openai/gpt-oss-*` |
+| `hermes` | `Qwen/Qwen2.5-*`, `Qwen/QwQ-32B`, `NousResearch/Hermes-2-Pro-*`, `NousResearch/Hermes-2-Theta-*`, `NousResearch/Hermes-3-*` |
+| `jamba` | `ai21labs/AI21-Jamba-*-1.5`, `ai21labs/AI21-Jamba-*-1.6`, `ai21labs/AI21-Jamba-*-1.7` |
+| `kimi_k2` | `moonshotai/Kimi-K2-Thinking*`, `moonshotai/Kimi-K2-Instruct*`, `moonshotai/Kimi-K2.5*`; currently requires converting `tiktoken.model` to `tokenizers.json` |
+| `llama3_json` | `meta-llama/Llama-3.1-*`, `meta-llama/Llama-3.2-*` |
+| `minimax_m2` | MiniMax M2.1 XML-style tool calling (`<minimax:tool_call>...`) |
+| `mistral` | `mistralai/Mistral-7B-Instruct-v0.3` and other Mistral models that emit `[TOOL_CALLS]...[/TOOL_CALLS]` |
+| `nemotron_deci` | `nvidia/nemotron-*` |
+| `nemotron_nano` | `nvidia/NVIDIA-Nemotron-3-Nano-*`; uses the same tool-call format as `qwen3_coder` |
+| `phi4` | `Phi-4-*` |
+| `pythonic` | `meta-llama/Llama-4-*` |
+| `qwen3_coder` | XML-style tool calling such as `<tool_call><function=...>` |

 > [!TIP]
 > For Kimi K2.5 thinking models, pair `--dyn-tool-call-parser kimi_k2` with
-> `--dyn-reasoning-parser kimi_k25` so that both `<think>` blocks and tool calls
+> `--dyn-reasoning-parser kimi_k25` from [Reasoning](reasoning.md) so that both `<think>` blocks and tool calls
 > are parsed correctly from the same response.

 ## Examples

--- a/docs/backends/sglang/sglang-reference-guide.md
+++ b/docs/backends/sglang/sglang-reference-guide.md
@@ -35,8 +35,8 @@ These arguments are added by Dynamo on top of SGLang's native arguments.
 |----------|---------|---------|-------------|
 | `--endpoint` | `DYN_ENDPOINT` | Auto-generated | Dynamo endpoint in `dyn://namespace.component.endpoint` format |
 | `--use-sglang-tokenizer` | `DYN_SGL_USE_TOKENIZER` | `false` | **[Deprecated]** Use `--dyn-chat-processor sglang` on the frontend instead. See [SGLang Chat Processor](sglang-chat-processor.md). |
-| `--dyn-tool-call-parser` | `DYN_TOOL_CALL_PARSER` | `None` | [Tool call](../../agents/tool-calling.md) parser (overrides SGLang's `--tool-call-parser`) |
-| `--dyn-reasoning-parser` | `DYN_REASONING_PARSER` | `None` | Reasoning parser for chain-of-thought models |
+| `--dyn-tool-call-parser` | `DYN_TOOL_CALL_PARSER` | `None` | [Tool call](../../agents/tool-calling.md#supported-tool-call-parsers) parser (overrides SGLang's `--tool-call-parser`) |
+| `--dyn-reasoning-parser` | `DYN_REASONING_PARSER` | `None` | [Reasoning](../../agents/reasoning.md#supported-reasoning-parsers) parser for chain-of-thought models |
 | `--custom-jinja-template` | `DYN_CUSTOM_JINJA_TEMPLATE` | `None` | Custom chat template path (incompatible with `--use-sglang-tokenizer`) |
 | `--embedding-worker` | `DYN_SGL_EMBEDDING_WORKER` | `false` | Run as embedding worker (also sets SGLang's `--is-embedding`) |
 | `--multimodal-encode-worker` | `DYN_SGL_MULTIMODAL_ENCODE_WORKER` | `false` | Run as [multimodal](../../features/multimodal/multimodal-sglang.md) encode worker (frontend-facing) |
@@ -50,6 +50,8 @@ These arguments are added by Dynamo on top of SGLang's native arguments.
 `--disagg-config` and `--disagg-config-key` must be provided together. The selected section is written to a temp YAML file and passed to SGLang's `--config` flag.
 </Note>

+The currently supported parser names for both flags are documented in [Tool Calling](../../agents/tool-calling.md#supported-tool-call-parsers) and [Reasoning](../../agents/reasoning.md#supported-reasoning-parsers).
+
 ## Tokenizer Behavior

 By default, Dynamo handles tokenization and detokenization through its Rust-based frontend, passing `input_ids` to SGLang. This enables all frontend endpoints (`v1/chat/completions`, `v1/completions`, `v1/embeddings`).

--- a/docs/backends/vllm/vllm-reference-guide.md
+++ b/docs/backends/vllm/vllm-reference-guide.md
@@ -27,6 +27,10 @@ The `--help` output is organized into the following groups:
 - **Dynamo vLLM Options** — Disaggregation mode, tokenizer selection, sleep mode, multimodal flags, vLLM-Omni pipeline configuration, headless mode, and ModelExpress. These use `DYN_VLLM_*` env vars.
 - **vLLM Engine Options** — All native vLLM arguments (`--model`, `--tensor-parallel-size`, `--kv-transfer-config`, `--kv-events-config`, `--enable-prefix-caching`, etc.). See the [vLLM serve args documentation](https://docs.vllm.ai/en/stable/configuration/serve_args.html).

+### Tool and Reasoning Parsers
+
+Use `--dyn-tool-call-parser` and `--dyn-reasoning-parser` to match the model's output format when the model emits tool calls, reasoning content, or both. The currently supported values are documented in [Tool Calling](../../agents/tool-calling.md#supported-tool-call-parsers) and [Reasoning](../../agents/reasoning.md#supported-reasoning-parsers).
+
 ### Prompt Embeddings

 Dynamo supports [vLLM prompt embeddings](https://docs.vllm.ai/en/stable/features/prompt_embeds.html) — pre-computed embeddings bypass tokenization in the Rust frontend and are decoded to tensors in the worker.

--- a/docs/index.yml
+++ b/docs/index.yml
@@ -135,6 +135,8 @@ navigation:
            path: backends/trtllm/trtllm-video-diffusion.md
      - page: Tool Calling
        path: agents/tool-calling.md
+      - page: Reasoning
+        path: agents/reasoning.md
      - page: LoRA Adapters
        path: features/lora/README.md
      - section: Agents