Unverified Commit a2077c96 authored by Ryan McCormick's avatar Ryan McCormick Committed by GitHub
Browse files

docs: Update tool/reasoning parser support (#7605)

parent 86e589a1
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Reasoning
subtitle: Configure reasoning parsers for models that emit thinking content
---
Some models emit reasoning or thinking content separately from their final response. Dynamo can split that output into `reasoning_content` and normal assistant content by configuring `--dyn-reasoning-parser` on the backend worker.
## Prerequisites
To enable reasoning parsing, launch the backend worker with:
- `--dyn-reasoning-parser`: select the reasoning parser from the supported list below
```bash
# <backend> can be sglang, trtllm, vllm, etc. based on your installation
python -m dynamo.<backend> --help
```
> [!TIP]
> Some models need both a reasoning parser and a tool call parser. For supported tool call parser names, see [Tool Calling](tool-calling.md).
## Supported Reasoning Parsers
The reasoning parser names currently supported in the codebase are:
| Parser Name | Typical Models / Format |
|-------------|-------------------------|
| `basic` | Generic `<think>...</think>` reasoning blocks |
| `deepseek_r1` | Models that should treat output as reasoning until `</think>` is seen, such as `deepseek-ai/DeepSeek-R1` style responses |
| `glm45` | `zai-org/GLM-4.5` and GLM-5 style `<think>...</think>` reasoning blocks |
| `gpt_oss` | `openai/gpt-oss-*` |
| `granite` | Granite models that emit `Here's my thought process:` / `Here's my response:` markers |
| `kimi` | Kimi models that emit `◁think▷...◁/think▷` |
| `kimi_k25` | `moonshotai/Kimi-K2.5*` models that require force-reasoning handling for `<think>...</think>` |
| `minimax_append_think` | MiniMax models that begin reasoning immediately and effectively need an implicit opening `<think>` tag prepended |
| `mistral` | Mistral reasoning models that emit `[THINK]...[/THINK]` |
| `nemotron_deci` | Nemotron models that emit standard `<think>...</think>` reasoning blocks |
| `nemotron_nano` | Nemotron Nano reasoning output that ends with `</think>` without requiring a visible opening tag |
| `qwen3` | `Qwen/Qwen3-*` style `<think>...</think>` responses |
| `step3` | Step-style models that should treat content as reasoning until `</think>` is seen |
## Common Parser Pairings
Some models need both parsers configured together. Common pairings include:
- `openai/gpt-oss-*`: `--dyn-tool-call-parser harmony --dyn-reasoning-parser gpt_oss`
- `zai-org/GLM-4.7`: `--dyn-tool-call-parser glm47 --dyn-reasoning-parser glm45`
- `moonshotai/Kimi-K2.5*`: `--dyn-tool-call-parser kimi_k2 --dyn-reasoning-parser kimi_k25`
- MiniMax M2.1 style outputs: `--dyn-tool-call-parser minimax_m2 --dyn-reasoning-parser minimax_append_think`
## Tool Calling Interplay
Reasoning parsing happens before tool call parsing. If a model emits both reasoning content and tool calls, configure both parsers so Dynamo can first separate reasoning text and then parse tool calls from the remaining assistant output.
......@@ -10,49 +10,54 @@ to output function arguments for the relevant function(s) which you can execute
Tool calling (AKA function calling) is controlled using the `tool_choice` and `tools` request parameters.
## Prerequisites
To enable this feature, you should set the following flag while launching the backend worker
- `--dyn-tool-call-parser` : select the parser from the available parsers list using the below command
- `--dyn-tool-call-parser`: select the tool call parser from the supported list below
```bash
# <backend> can be sglang, trtllm, vllm, etc. based on your installation
python -m dynamo.<backend> --help"
python -m dynamo.<backend> --help
```
> [!NOTE]
> If no tool call parser is provided by the user, Dynamo will try to use default tool call parsing based on `<TOOLCALL>` and `<|python_tag|>` tool tags.
> If no tool call parser is provided by the user, Dynamo will try to use default tool call parsing based on &lt;TOOLCALL&gt; and &lt;|python_tag|&gt; tool tags.
> [!TIP]
> If your model's default chat template doesn't support tool calling, but the model itself does, you can specify a custom chat template per worker
> with `python -m dynamo.<backend> --custom-jinja-template </path/to/template.jinja>`.
Parser to Model Mapping
| Parser Name | Supported Models |
|-------------|-----------------------------------------------------------------------|
| hermes | Qwen/Qwen2.5-*, Qwen/QwQ-32B, NousResearch/Hermes-2-Pro-*, NousResearch/Hermes-2-Theta-*, NousResearch/Hermes-3-* |
| mistral | mistralai/Mistral-7B-Instruct-v0.3, Additional mistral function-calling models are compatible as well.|
| llama3_json | meta-llama/Llama-3.1-*, meta-llama/Llama-3.2-* |
| harmony | openai/gpt-oss-* |
| nemotron_deci | nvidia/nemotron-* |
| nemotron_nano | nvidia/NVIDIA-Nemotron-3-Nano-* |
| phi4 | Phi-4-* |
| deepseek_v3 | deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-0528 |
| deepseek_v3_1 | deepseek-ai/DeepSeek-V3.1 |
| pythonic | meta-llama/Llama-4-* |
| jamba | ai21labs/AI21-Jamba-*-1.5, ai21labs/AI21-Jamba-*-1.6, ai21labs/AI21-Jamba-*-1.7, |
| glm47 | zai-org/GLM-4.7 |
| kimi_k2 | moonshotai/Kimi-K2-Thinking*, moonshotai/Kimi-K2-Instruct*, moonshotai/Kimi-K2.5* |
\* Currently requires converting `tiktoken.model` to `tokenizers.json`.
> [!TIP]
> If your model also emits reasoning content that should be separated from normal output, see [Reasoning](reasoning.md) for the supported `--dyn-reasoning-parser` values.
## Supported Tool Call Parsers
The tool call parser names currently supported in the codebase are:
| Parser Name | Typical Models / Format |
|-------------|-------------------------|
| `deepseek_v3` | `deepseek-ai/DeepSeek-V3`, `deepseek-ai/DeepSeek-R1`, `deepseek-ai/DeepSeek-R1-0528` |
| `deepseek_v3_1` | `deepseek-ai/DeepSeek-V3.1` |
| `deepseek_v3_2` | DeepSeek V3.2 DSML tool calling (`<|DSML|function_calls>...`) |
| `default` | Dynamo's fallback parser for &lt;TOOLCALL&gt; and &lt;|python_tag|&gt; tool tags when no explicit parser is configured |
| `glm47` | `zai-org/GLM-4.7` |
| `harmony` | `openai/gpt-oss-*` |
| `hermes` | `Qwen/Qwen2.5-*`, `Qwen/QwQ-32B`, `NousResearch/Hermes-2-Pro-*`, `NousResearch/Hermes-2-Theta-*`, `NousResearch/Hermes-3-*` |
| `jamba` | `ai21labs/AI21-Jamba-*-1.5`, `ai21labs/AI21-Jamba-*-1.6`, `ai21labs/AI21-Jamba-*-1.7` |
| `kimi_k2` | `moonshotai/Kimi-K2-Thinking*`, `moonshotai/Kimi-K2-Instruct*`, `moonshotai/Kimi-K2.5*`; currently requires converting `tiktoken.model` to `tokenizers.json` |
| `llama3_json` | `meta-llama/Llama-3.1-*`, `meta-llama/Llama-3.2-*` |
| `minimax_m2` | MiniMax M2.1 XML-style tool calling (`<minimax:tool_call>...`) |
| `mistral` | `mistralai/Mistral-7B-Instruct-v0.3` and other Mistral models that emit `[TOOL_CALLS]...[/TOOL_CALLS]` |
| `nemotron_deci` | `nvidia/nemotron-*` |
| `nemotron_nano` | `nvidia/NVIDIA-Nemotron-3-Nano-*`; uses the same tool-call format as `qwen3_coder` |
| `phi4` | `Phi-4-*` |
| `pythonic` | `meta-llama/Llama-4-*` |
| `qwen3_coder` | XML-style tool calling such as `<tool_call><function=...>` |
> [!TIP]
> For Kimi K2.5 thinking models, pair `--dyn-tool-call-parser kimi_k2` with
> `--dyn-reasoning-parser kimi_k25` so that both `<think>` blocks and tool calls
> `--dyn-reasoning-parser kimi_k25` from [Reasoning](reasoning.md) so that both `<think>` blocks and tool calls
> are parsed correctly from the same response.
## Examples
......
......@@ -35,8 +35,8 @@ These arguments are added by Dynamo on top of SGLang's native arguments.
|----------|---------|---------|-------------|
| `--endpoint` | `DYN_ENDPOINT` | Auto-generated | Dynamo endpoint in `dyn://namespace.component.endpoint` format |
| `--use-sglang-tokenizer` | `DYN_SGL_USE_TOKENIZER` | `false` | **[Deprecated]** Use `--dyn-chat-processor sglang` on the frontend instead. See [SGLang Chat Processor](sglang-chat-processor.md). |
| `--dyn-tool-call-parser` | `DYN_TOOL_CALL_PARSER` | `None` | [Tool call](../../agents/tool-calling.md) parser (overrides SGLang's `--tool-call-parser`) |
| `--dyn-reasoning-parser` | `DYN_REASONING_PARSER` | `None` | Reasoning parser for chain-of-thought models |
| `--dyn-tool-call-parser` | `DYN_TOOL_CALL_PARSER` | `None` | [Tool call](../../agents/tool-calling.md#supported-tool-call-parsers) parser (overrides SGLang's `--tool-call-parser`) |
| `--dyn-reasoning-parser` | `DYN_REASONING_PARSER` | `None` | [Reasoning](../../agents/reasoning.md#supported-reasoning-parsers) parser for chain-of-thought models |
| `--custom-jinja-template` | `DYN_CUSTOM_JINJA_TEMPLATE` | `None` | Custom chat template path (incompatible with `--use-sglang-tokenizer`) |
| `--embedding-worker` | `DYN_SGL_EMBEDDING_WORKER` | `false` | Run as embedding worker (also sets SGLang's `--is-embedding`) |
| `--multimodal-encode-worker` | `DYN_SGL_MULTIMODAL_ENCODE_WORKER` | `false` | Run as [multimodal](../../features/multimodal/multimodal-sglang.md) encode worker (frontend-facing) |
......@@ -50,6 +50,8 @@ These arguments are added by Dynamo on top of SGLang's native arguments.
`--disagg-config` and `--disagg-config-key` must be provided together. The selected section is written to a temp YAML file and passed to SGLang's `--config` flag.
</Note>
The current supported parser names for both flags are documented in [Tool Calling](../../agents/tool-calling.md#supported-tool-call-parsers) and [Reasoning](../../agents/reasoning.md#supported-reasoning-parsers).
## Tokenizer Behavior
By default, Dynamo handles tokenization and detokenization through its Rust-based frontend, passing `input_ids` to SGLang. This enables all frontend endpoints (`v1/chat/completions`, `v1/completions`, `v1/embeddings`).
......
......@@ -27,6 +27,10 @@ The `--help` output is organized into the following groups:
- **Dynamo vLLM Options** — Disaggregation mode, tokenizer selection, sleep mode, multimodal flags, vLLM-Omni pipeline configuration, headless mode, and ModelExpress. These use `DYN_VLLM_*` env vars.
- **vLLM Engine Options** — All native vLLM arguments (`--model`, `--tensor-parallel-size`, `--kv-transfer-config`, `--kv-events-config`, `--enable-prefix-caching`, etc.). See the [vLLM serve args documentation](https://docs.vllm.ai/en/stable/configuration/serve_args.html).
### Tool and Reasoning Parsers
Use `--dyn-tool-call-parser` and `--dyn-reasoning-parser` to match the model's output format when the model emits tool calls and/or reasoning content. The current supported values are documented in [Tool Calling](../../agents/tool-calling.md#supported-tool-call-parsers) and [Reasoning](../../agents/reasoning.md#supported-reasoning-parsers).
### Prompt Embeddings
Dynamo supports [vLLM prompt embeddings](https://docs.vllm.ai/en/stable/features/prompt_embeds.html) — pre-computed embeddings bypass tokenization in the Rust frontend and are decoded to tensors in the worker.
......
......@@ -135,6 +135,8 @@ navigation:
path: backends/trtllm/trtllm-video-diffusion.md
- page: Tool Calling
path: agents/tool-calling.md
- page: Reasoning
path: agents/reasoning.md
- page: LoRA Adapters
path: features/lora/README.md
- section: Agents
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment