Unverified Commit a2077c96 authored by Ryan McCormick's avatar Ryan McCormick Committed by GitHub
Browse files

docs: Update tool/reasoning parser support (#7605)

parent 86e589a1
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Reasoning
subtitle: Configure reasoning parsers for models that emit thinking content
---
Some models emit reasoning or thinking content separately from their final response. Dynamo can split that output into `reasoning_content` and normal assistant content by configuring `--dyn-reasoning-parser` on the backend worker.
## Prerequisites
To enable reasoning parsing, launch the backend worker with:
- `--dyn-reasoning-parser`: select the reasoning parser from the supported list below
```bash
# <backend> can be sglang, trtllm, vllm, etc. based on your installation
python -m dynamo.<backend> --help
```
> [!TIP]
> Some models need both a reasoning parser and a tool call parser. For supported tool call parser names, see [Tool Calling](tool-calling.md).
## Supported Reasoning Parsers
The reasoning parser names currently supported in the codebase are:
| Parser Name | Typical Models / Format |
|-------------|-------------------------|
| `basic` | Generic `<think>...</think>` reasoning blocks |
| `deepseek_r1` | Models that should treat output as reasoning until `</think>` is seen, such as `deepseek-ai/DeepSeek-R1` style responses |
| `glm45` | `zai-org/GLM-4.5` and GLM-5 style `<think>...</think>` reasoning blocks |
| `gpt_oss` | `openai/gpt-oss-*` |
| `granite` | Granite models that emit `Here's my thought process:` / `Here's my response:` markers |
| `kimi` | Kimi models that emit `◁think▷...◁/think▷` |
| `kimi_k25` | `moonshotai/Kimi-K2.5*` models that require force-reasoning handling for `<think>...</think>` |
| `minimax_append_think` | MiniMax models that begin reasoning immediately and effectively need an implicit opening `<think>` tag prepended |
| `mistral` | Mistral reasoning models that emit `[THINK]...[/THINK]` |
| `nemotron_deci` | Nemotron models that emit standard `<think>...</think>` reasoning blocks |
| `nemotron_nano` | Nemotron Nano reasoning output that ends with `</think>` without requiring a visible opening tag |
| `qwen3` | `Qwen/Qwen3-*` style `<think>...</think>` responses |
| `step3` | Step-style models that should treat content as reasoning until `</think>` is seen |
## Common Parser Pairings
Some models need both parsers configured together. Common pairings include:
- `openai/gpt-oss-*`: `--dyn-tool-call-parser harmony --dyn-reasoning-parser gpt_oss`
- `zai-org/GLM-4.7`: `--dyn-tool-call-parser glm47 --dyn-reasoning-parser glm45`
- `moonshotai/Kimi-K2.5*`: `--dyn-tool-call-parser kimi_k2 --dyn-reasoning-parser kimi_k25`
- MiniMax M2.1 style outputs: `--dyn-tool-call-parser minimax_m2 --dyn-reasoning-parser minimax_append_think`
## Tool Calling Interplay
Reasoning parsing happens before tool call parsing. If a model emits both reasoning content and tool calls, configure both parsers so Dynamo can first separate reasoning text and then parse tool calls from the remaining assistant output.
...@@ -10,49 +10,54 @@ to output function arguments for the relevant function(s) which you can execute ...@@ -10,49 +10,54 @@ to output function arguments for the relevant function(s) which you can execute
Tool calling (AKA function calling) is controlled using the `tool_choice` and `tools` request parameters. Tool calling (AKA function calling) is controlled using the `tool_choice` and `tools` request parameters.
## Prerequisites ## Prerequisites
To enable this feature, you should set the following flag while launching the backend worker To enable this feature, you should set the following flag while launching the backend worker
- `--dyn-tool-call-parser` : select the parser from the available parsers list using the below command - `--dyn-tool-call-parser`: select the tool call parser from the supported list below
```bash ```bash
# <backend> can be sglang, trtllm, vllm, etc. based on your installation # <backend> can be sglang, trtllm, vllm, etc. based on your installation
python -m dynamo.<backend> --help" python -m dynamo.<backend> --help
``` ```
> [!NOTE] > [!NOTE]
> If no tool call parser is provided by the user, Dynamo will try to use default tool call parsing based on `<TOOLCALL>` and `<|python_tag|>` tool tags. > If no tool call parser is provided by the user, Dynamo will try to use default tool call parsing based on &lt;TOOLCALL&gt; and &lt;|python_tag|&gt; tool tags.
> [!TIP] > [!TIP]
> If your model's default chat template doesn't support tool calling, but the model itself does, you can specify a custom chat template per worker > If your model's default chat template doesn't support tool calling, but the model itself does, you can specify a custom chat template per worker
> with `python -m dynamo.<backend> --custom-jinja-template </path/to/template.jinja>`. > with `python -m dynamo.<backend> --custom-jinja-template </path/to/template.jinja>`.
> [!TIP]
Parser to Model Mapping > If your model also emits reasoning content that should be separated from normal output, see [Reasoning](reasoning.md) for the supported `--dyn-reasoning-parser` values.
| Parser Name | Supported Models | ## Supported Tool Call Parsers
|-------------|-----------------------------------------------------------------------|
| hermes | Qwen/Qwen2.5-*, Qwen/QwQ-32B, NousResearch/Hermes-2-Pro-*, NousResearch/Hermes-2-Theta-*, NousResearch/Hermes-3-* | The tool call parser names currently supported in the codebase are:
| mistral | mistralai/Mistral-7B-Instruct-v0.3, Additional mistral function-calling models are compatible as well.|
| llama3_json | meta-llama/Llama-3.1-*, meta-llama/Llama-3.2-* | | Parser Name | Typical Models / Format |
| harmony | openai/gpt-oss-* | |-------------|-------------------------|
| nemotron_deci | nvidia/nemotron-* | | `deepseek_v3` | `deepseek-ai/DeepSeek-V3`, `deepseek-ai/DeepSeek-R1`, `deepseek-ai/DeepSeek-R1-0528` |
| nemotron_nano | nvidia/NVIDIA-Nemotron-3-Nano-* | | `deepseek_v3_1` | `deepseek-ai/DeepSeek-V3.1` |
| phi4 | Phi-4-* | | `deepseek_v3_2` | DeepSeek V3.2 DSML tool calling (`<|DSML|function_calls>...`) |
| deepseek_v3 | deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-0528 | | `default` | Dynamo's fallback parser for &lt;TOOLCALL&gt; and &lt;|python_tag|&gt; tool tags when no explicit parser is configured |
| deepseek_v3_1 | deepseek-ai/DeepSeek-V3.1 | | `glm47` | `zai-org/GLM-4.7` |
| pythonic | meta-llama/Llama-4-* | | `harmony` | `openai/gpt-oss-*` |
| jamba | ai21labs/AI21-Jamba-*-1.5, ai21labs/AI21-Jamba-*-1.6, ai21labs/AI21-Jamba-*-1.7, | | `hermes` | `Qwen/Qwen2.5-*`, `Qwen/QwQ-32B`, `NousResearch/Hermes-2-Pro-*`, `NousResearch/Hermes-2-Theta-*`, `NousResearch/Hermes-3-*` |
| glm47 | zai-org/GLM-4.7 | | `jamba` | `ai21labs/AI21-Jamba-*-1.5`, `ai21labs/AI21-Jamba-*-1.6`, `ai21labs/AI21-Jamba-*-1.7` |
| kimi_k2 | moonshotai/Kimi-K2-Thinking*, moonshotai/Kimi-K2-Instruct*, moonshotai/Kimi-K2.5* | | `kimi_k2` | `moonshotai/Kimi-K2-Thinking*`, `moonshotai/Kimi-K2-Instruct*`, `moonshotai/Kimi-K2.5*`; currently requires converting `tiktoken.model` to `tokenizers.json` |
| `llama3_json` | `meta-llama/Llama-3.1-*`, `meta-llama/Llama-3.2-*` |
\* Currently requires converting `tiktoken.model` to `tokenizers.json`. | `minimax_m2` | MiniMax M2.1 XML-style tool calling (`<minimax:tool_call>...`) |
| `mistral` | `mistralai/Mistral-7B-Instruct-v0.3` and other Mistral models that emit `[TOOL_CALLS]...[/TOOL_CALLS]` |
| `nemotron_deci` | `nvidia/nemotron-*` |
| `nemotron_nano` | `nvidia/NVIDIA-Nemotron-3-Nano-*`; uses the same tool-call format as `qwen3_coder` |
| `phi4` | `Phi-4-*` |
| `pythonic` | `meta-llama/Llama-4-*` |
| `qwen3_coder` | XML-style tool calling such as `<tool_call><function=...>` |
> [!TIP] > [!TIP]
> For Kimi K2.5 thinking models, pair `--dyn-tool-call-parser kimi_k2` with > For Kimi K2.5 thinking models, pair `--dyn-tool-call-parser kimi_k2` with
> `--dyn-reasoning-parser kimi_k25` so that both `<think>` blocks and tool calls > `--dyn-reasoning-parser kimi_k25` from [Reasoning](reasoning.md) so that both `<think>` blocks and tool calls
> are parsed correctly from the same response. > are parsed correctly from the same response.
## Examples ## Examples
......
...@@ -35,8 +35,8 @@ These arguments are added by Dynamo on top of SGLang's native arguments. ...@@ -35,8 +35,8 @@ These arguments are added by Dynamo on top of SGLang's native arguments.
|----------|---------|---------|-------------| |----------|---------|---------|-------------|
| `--endpoint` | `DYN_ENDPOINT` | Auto-generated | Dynamo endpoint in `dyn://namespace.component.endpoint` format | | `--endpoint` | `DYN_ENDPOINT` | Auto-generated | Dynamo endpoint in `dyn://namespace.component.endpoint` format |
| `--use-sglang-tokenizer` | `DYN_SGL_USE_TOKENIZER` | `false` | **[Deprecated]** Use `--dyn-chat-processor sglang` on the frontend instead. See [SGLang Chat Processor](sglang-chat-processor.md). | | `--use-sglang-tokenizer` | `DYN_SGL_USE_TOKENIZER` | `false` | **[Deprecated]** Use `--dyn-chat-processor sglang` on the frontend instead. See [SGLang Chat Processor](sglang-chat-processor.md). |
| `--dyn-tool-call-parser` | `DYN_TOOL_CALL_PARSER` | `None` | [Tool call](../../agents/tool-calling.md) parser (overrides SGLang's `--tool-call-parser`) | | `--dyn-tool-call-parser` | `DYN_TOOL_CALL_PARSER` | `None` | [Tool call](../../agents/tool-calling.md#supported-tool-call-parsers) parser (overrides SGLang's `--tool-call-parser`) |
| `--dyn-reasoning-parser` | `DYN_REASONING_PARSER` | `None` | Reasoning parser for chain-of-thought models | | `--dyn-reasoning-parser` | `DYN_REASONING_PARSER` | `None` | [Reasoning](../../agents/reasoning.md#supported-reasoning-parsers) parser for chain-of-thought models |
| `--custom-jinja-template` | `DYN_CUSTOM_JINJA_TEMPLATE` | `None` | Custom chat template path (incompatible with `--use-sglang-tokenizer`) | | `--custom-jinja-template` | `DYN_CUSTOM_JINJA_TEMPLATE` | `None` | Custom chat template path (incompatible with `--use-sglang-tokenizer`) |
| `--embedding-worker` | `DYN_SGL_EMBEDDING_WORKER` | `false` | Run as embedding worker (also sets SGLang's `--is-embedding`) | | `--embedding-worker` | `DYN_SGL_EMBEDDING_WORKER` | `false` | Run as embedding worker (also sets SGLang's `--is-embedding`) |
| `--multimodal-encode-worker` | `DYN_SGL_MULTIMODAL_ENCODE_WORKER` | `false` | Run as [multimodal](../../features/multimodal/multimodal-sglang.md) encode worker (frontend-facing) | | `--multimodal-encode-worker` | `DYN_SGL_MULTIMODAL_ENCODE_WORKER` | `false` | Run as [multimodal](../../features/multimodal/multimodal-sglang.md) encode worker (frontend-facing) |
...@@ -50,6 +50,8 @@ These arguments are added by Dynamo on top of SGLang's native arguments. ...@@ -50,6 +50,8 @@ These arguments are added by Dynamo on top of SGLang's native arguments.
`--disagg-config` and `--disagg-config-key` must be provided together. The selected section is written to a temp YAML file and passed to SGLang's `--config` flag. `--disagg-config` and `--disagg-config-key` must be provided together. The selected section is written to a temp YAML file and passed to SGLang's `--config` flag.
</Note> </Note>
The current supported parser names for both flags are documented in [Tool Calling](../../agents/tool-calling.md#supported-tool-call-parsers) and [Reasoning](../../agents/reasoning.md#supported-reasoning-parsers).
## Tokenizer Behavior ## Tokenizer Behavior
By default, Dynamo handles tokenization and detokenization through its Rust-based frontend, passing `input_ids` to SGLang. This enables all frontend endpoints (`v1/chat/completions`, `v1/completions`, `v1/embeddings`). By default, Dynamo handles tokenization and detokenization through its Rust-based frontend, passing `input_ids` to SGLang. This enables all frontend endpoints (`v1/chat/completions`, `v1/completions`, `v1/embeddings`).
......
...@@ -27,6 +27,10 @@ The `--help` output is organized into the following groups: ...@@ -27,6 +27,10 @@ The `--help` output is organized into the following groups:
- **Dynamo vLLM Options** — Disaggregation mode, tokenizer selection, sleep mode, multimodal flags, vLLM-Omni pipeline configuration, headless mode, and ModelExpress. These use `DYN_VLLM_*` env vars. - **Dynamo vLLM Options** — Disaggregation mode, tokenizer selection, sleep mode, multimodal flags, vLLM-Omni pipeline configuration, headless mode, and ModelExpress. These use `DYN_VLLM_*` env vars.
- **vLLM Engine Options** — All native vLLM arguments (`--model`, `--tensor-parallel-size`, `--kv-transfer-config`, `--kv-events-config`, `--enable-prefix-caching`, etc.). See the [vLLM serve args documentation](https://docs.vllm.ai/en/stable/configuration/serve_args.html). - **vLLM Engine Options** — All native vLLM arguments (`--model`, `--tensor-parallel-size`, `--kv-transfer-config`, `--kv-events-config`, `--enable-prefix-caching`, etc.). See the [vLLM serve args documentation](https://docs.vllm.ai/en/stable/configuration/serve_args.html).
### Tool and Reasoning Parsers
Use `--dyn-tool-call-parser` and `--dyn-reasoning-parser` to match the model's output format when the model emits tool calls and/or reasoning content. The current supported values are documented in [Tool Calling](../../agents/tool-calling.md#supported-tool-call-parsers) and [Reasoning](../../agents/reasoning.md#supported-reasoning-parsers).
### Prompt Embeddings ### Prompt Embeddings
Dynamo supports [vLLM prompt embeddings](https://docs.vllm.ai/en/stable/features/prompt_embeds.html) — pre-computed embeddings bypass tokenization in the Rust frontend and are decoded to tensors in the worker. Dynamo supports [vLLM prompt embeddings](https://docs.vllm.ai/en/stable/features/prompt_embeds.html) — pre-computed embeddings bypass tokenization in the Rust frontend and are decoded to tensors in the worker.
......
...@@ -135,6 +135,8 @@ navigation: ...@@ -135,6 +135,8 @@ navigation:
path: backends/trtllm/trtllm-video-diffusion.md path: backends/trtllm/trtllm-video-diffusion.md
- page: Tool Calling - page: Tool Calling
path: agents/tool-calling.md path: agents/tool-calling.md
- page: Reasoning
path: agents/reasoning.md
- page: LoRA Adapters - page: LoRA Adapters
path: features/lora/README.md path: features/lora/README.md
- section: Agents - section: Agents
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment