Unverified commit e8449ab5, authored by Xinyuan Tong, committed by GitHub

Add deepseek v3.1 thinking parser support and update docs (#9464)


Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
parent 4746aaea
......@@ -78,6 +78,129 @@
"print_highlight(f\"Response: {response}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model Thinking/Reasoning Support\n",
"\n",
"Some models support internal reasoning or thinking processes that can be exposed in the API response. SGLang provides unified support for various reasoning models through the `chat_template_kwargs` parameter and compatible reasoning parsers.\n",
"\n",
"#### Supported Models and Configuration\n",
"\n",
"| Model Family | Chat Template Parameter | Reasoning Parser | Notes |\n",
"|--------------|------------------------|------------------|--------|\n",
"| DeepSeek-R1 (R1, R1-0528, R1-Distill) | `enable_thinking` | `--reasoning-parser deepseek-r1` | Standard reasoning models |\n",
"| DeepSeek-V3.1 | `thinking` | `--reasoning-parser deepseek-v3` | Hybrid model (thinking/non-thinking modes) |\n",
"| Qwen3 (standard) | `enable_thinking` | `--reasoning-parser qwen3` | Hybrid model (thinking/non-thinking modes) |\n",
"| Qwen3-Thinking | N/A (always enabled) | `--reasoning-parser qwen3-thinking` | Always generates reasoning |\n",
"| Kimi | N/A (always enabled) | `--reasoning-parser kimi` | Kimi thinking models |\n",
"| gpt-oss | N/A (always enabled) | `--reasoning-parser gpt-oss` | gpt-oss thinking models |\n",
"\n",
"#### Basic Usage\n",
"\n",
"To enable reasoning output, you need to:\n",
"1. Launch the server with the appropriate reasoning parser\n",
"2. Set the model-specific parameter in `chat_template_kwargs`\n",
"3. Optionally set `separate_reasoning: False` if you do not want the reasoning content returned separately (defaults to `True`)\n",
"\n",
"**Note for Qwen3-Thinking models:** These models always generate thinking content and do not support the `enable_thinking` parameter. Use `--reasoning-parser qwen3-thinking` or `--reasoning-parser qwen3` to parse the thinking content.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Example: Qwen3 Models\n",
"\n",
"```python\n",
"# Launch server:\n",
"# python3 -m sglang.launch_server --model-path QwQ/Qwen3-32B-250415 --reasoning-parser qwen3\n",
"\n",
"from openai import OpenAI\n",
"\n",
"client = OpenAI(\n",
" api_key=\"EMPTY\",\n",
" base_url=f\"http://127.0.0.1:{port}/v1\",\n",
")\n",
"\n",
"model = \"QwQ/Qwen3-32B-250415\"\n",
"messages = [{\"role\": \"user\", \"content\": \"9.11 and 9.8, which is greater?\"}]\n",
"\n",
"response = client.chat.completions.create(\n",
" model=model,\n",
" messages=messages,\n",
" extra_body={\n",
" \"chat_template_kwargs\": {\"enable_thinking\": True},\n",
" \"separate_reasoning\": True\n",
" }\n",
")\n",
"\n",
"print(\"Reasoning:\", response.choices[0].message.reasoning_content)\n",
"print(\"Answer:\", response.choices[0].message.content)\n",
"```\n",
"\n",
"**Output:**\n",
"```\n",
"Reasoning: Okay, so I need to figure out which number is greater between 9.11 and 9.8...\n",
"Answer: 9.8 is greater than 9.11.\n",
"```\n",
"\n",
"**Note:** Setting `\"enable_thinking\": False` (or omitting it) will result in `reasoning_content` being `None`. Qwen3-Thinking models always generate reasoning content and don't support the `enable_thinking` parameter.\n"
]
},
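Reasoning can also be consumed incrementally with `stream=True`; when the server is launched with a reasoning parser, reasoning tokens arrive on the delta's `reasoning_content` field while answer tokens arrive on `content`. The sketch below mocks the streamed chunks so the accumulation pattern is runnable without a server; `collect_stream` and `_chunk` are illustrative helpers, not SGLang APIs — with the real `openai` client you would iterate the result of `client.chat.completions.create(..., stream=True)` the same way.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate separated reasoning and answer text from streamed deltas."""
    reasoning, answer = [], []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # Reasoning tokens arrive on `reasoning_content`, answer tokens on `content`;
        # getattr with a default tolerates chunks that carry only one of the two.
        if getattr(delta, "reasoning_content", None):
            reasoning.append(delta.reasoning_content)
        if getattr(delta, "content", None):
            answer.append(delta.content)
    return "".join(reasoning), "".join(answer)


def _chunk(**delta):
    # Minimal stand-in for an openai ChatCompletionChunk.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


mock = [
    _chunk(reasoning_content="Compare the tenths digit: "),
    _chunk(reasoning_content="8 > 1."),
    _chunk(content="9.8 is greater "),
    _chunk(content="than 9.11."),
]
reasoning, answer = collect_stream(mock)
print("Reasoning:", reasoning)
print("Answer:", answer)
```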
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Example: DeepSeek-V3.1 Models\n",
"\n",
"DeepSeek-V3.1 models support thinking mode through the `thinking` parameter:\n",
"\n",
"```python\n",
"# Launch server:\n",
"# python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3.1 --reasoning-parser deepseek-v3\n",
"\n",
"from openai import OpenAI\n",
"\n",
"client = OpenAI(\n",
" api_key=\"EMPTY\",\n",
" base_url=f\"http://127.0.0.1:{port}/v1\",\n",
")\n",
"\n",
"model = \"deepseek-ai/DeepSeek-V3.1\"\n",
"messages = [{\"role\": \"user\", \"content\": \"What is 2^8?\"}]\n",
"\n",
"response = client.chat.completions.create(\n",
" model=model,\n",
" messages=messages,\n",
" extra_body={\n",
" \"chat_template_kwargs\": {\"thinking\": True},\n",
" \"separate_reasoning\": True\n",
" }\n",
")\n",
"\n",
"print(\"Reasoning:\", response.choices[0].message.reasoning_content)\n",
"print(\"Answer:\", response.choices[0].message.content)\n",
"```\n",
"\n",
"**Output:**\n",
"```\n",
"Reasoning: I need to calculate 2^8. Let me work through this step by step:\n",
"2^1 = 2\n",
"2^2 = 4\n",
"2^3 = 8\n",
"2^4 = 16\n",
"2^5 = 32\n",
"2^6 = 64\n",
"2^7 = 128\n",
"2^8 = 256\n",
"Answer: 2^8 equals 256.\n",
"```\n",
"\n",
"**Note:** DeepSeek-V3.1 models use the `thinking` parameter (not `enable_thinking`) to control reasoning output.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
......@@ -144,75 +267,6 @@
" print(chunk.choices[0].delta.content, end=\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Enabling Model Thinking/Reasoning\n",
"\n",
"You can use `chat_template_kwargs` to enable or disable the model's internal thinking or reasoning process output. Set `\"enable_thinking\": True` within `chat_template_kwargs` to include the reasoning steps in the response. This requires launching the server with a compatible reasoning parser.\n",
"\n",
"**Reasoning Parser Options:**\n",
"- `--reasoning-parser deepseek-r1`: For DeepSeek-R1 family models (R1, R1-0528, R1-Distill)\n",
"- `--reasoning-parser deepseek-v3`: For DeepSeek-V3.1 hybrid thinking models (controlled by the `thinking` parameter)\n",
"- `--reasoning-parser qwen3`: For standard Qwen3 models that support the `enable_thinking` parameter, and for Qwen3-Thinking models\n",
"- `--reasoning-parser qwen3-thinking`: For Qwen3-Thinking models; forces the always-reasoning variant of the qwen3 parser\n",
"- `--reasoning-parser kimi`: For Kimi thinking models\n",
"\n",
"Here's an example demonstrating how to enable thinking and retrieve the reasoning content separately (using `separate_reasoning: True`):\n",
"\n",
"```python\n",
"# For Qwen3 models with enable_thinking support:\n",
"# python3 -m sglang.launch_server --model-path QwQ/Qwen3-32B-250415 --reasoning-parser qwen3 ...\n",
"\n",
"from openai import OpenAI\n",
"\n",
"# Modify OpenAI's API key and API base to use SGLang's API server.\n",
"openai_api_key = \"EMPTY\"\n",
"openai_api_base = f\"http://127.0.0.1:{port}/v1\" # Use the correct port\n",
"\n",
"client = OpenAI(\n",
" api_key=openai_api_key,\n",
" base_url=openai_api_base,\n",
")\n",
"\n",
"model = \"QwQ/Qwen3-32B-250415\" # Use the model loaded by the server\n",
"messages = [{\"role\": \"user\", \"content\": \"9.11 and 9.8, which is greater?\"}]\n",
"\n",
"response = client.chat.completions.create(\n",
" model=model,\n",
" messages=messages,\n",
" extra_body={\n",
" \"chat_template_kwargs\": {\"enable_thinking\": True},\n",
" \"separate_reasoning\": True\n",
" }\n",
")\n",
"\n",
"print(\"response.choices[0].message.reasoning_content: \\n\", response.choices[0].message.reasoning_content)\n",
"print(\"response.choices[0].message.content: \\n\", response.choices[0].message.content)\n",
"```\n",
"\n",
"**Example Output:**\n",
"\n",
"```\n",
"response.choices[0].message.reasoning_content: \n",
" Okay, so I need to figure out which number is greater between 9.11 and 9.8. Hmm, let me think. Both numbers start with 9, right? So the whole number part is the same. That means I need to look at the decimal parts to determine which one is bigger.\n",
"...\n",
"Therefore, after checking multiple methods—aligning decimals, subtracting, converting to fractions, and using a real-world analogy—it's clear that 9.8 is greater than 9.11.\n",
"\n",
"response.choices[0].message.content: \n",
" To determine which number is greater between **9.11** and **9.8**, follow these steps:\n",
"...\n",
"**Answer**: \n",
"9.8 is greater than 9.11.\n",
"```\n",
"\n",
"Setting `\"enable_thinking\": False` (or omitting it) will result in `reasoning_content` being `None`.\n",
"\n",
"**Note for Qwen3-Thinking models:** These models always generate thinking content and do not support the `enable_thinking` parameter. Use `--reasoning-parser qwen3-thinking` or `--reasoning-parser qwen3` to parse the thinking content.\n",
"\n",
"Here is an example of a detailed chat completion request using standard OpenAI parameters:"
]
},
{
"cell_type": "markdown",
"metadata": {},
......
......@@ -872,12 +872,15 @@ class OpenAIServingChat(OpenAIServingBase):
        Returns:
            The boolean value of 'enable_thinking' if found, otherwise False.
        """
-        if (
-            hasattr(request, "chat_template_kwargs")
-            and request.chat_template_kwargs
-            and request.chat_template_kwargs.get("enable_thinking") is not None
-        ):
+        if hasattr(request, "chat_template_kwargs") and request.chat_template_kwargs:
+            # For Qwen3 models, `enable_thinking` is supported.
+            if request.chat_template_kwargs.get("enable_thinking") is not None:
                 return request.chat_template_kwargs.get("enable_thinking")
+            # For DeepSeek-V3.1 models, `thinking` is supported.
+            elif request.chat_template_kwargs.get("thinking") is not None:
+                return request.chat_template_kwargs.get("thinking")
+            else:
+                return False
         return False

     async def _process_tool_call_stream(
......
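The parameter resolution in the hunk above can be sketched as a standalone function; `resolve_thinking` is a hypothetical name mirroring the precedence shown in the diff (Qwen3-style `enable_thinking` is checked first, then DeepSeek-V3.1-style `thinking`, and thinking defaults to off when neither is set):

```python
def resolve_thinking(chat_template_kwargs):
    """Mirror of the helper's logic: `enable_thinking` wins over `thinking`;
    with neither key present (or no kwargs at all), thinking is off."""
    if chat_template_kwargs:
        if chat_template_kwargs.get("enable_thinking") is not None:
            return chat_template_kwargs.get("enable_thinking")
        if chat_template_kwargs.get("thinking") is not None:
            return chat_template_kwargs.get("thinking")
    return False


print(resolve_thinking({"enable_thinking": True}))  # Qwen3 style
print(resolve_thinking({"thinking": True}))         # DeepSeek-V3.1 style
print(resolve_thinking(None))                       # no kwargs: off
```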
......@@ -513,12 +513,13 @@ class ReasoningParser:
     DetectorMap: Dict[str, Type[BaseReasoningFormatDetector]] = {
         "deepseek-r1": DeepSeekR1Detector,
-        "qwen3": Qwen3Detector,
-        "qwen3-thinking": Qwen3Detector,
+        "deepseek-v3": Qwen3Detector,
         "glm45": Qwen3Detector,
+        "gpt-oss": GptOssDetector,
         "kimi": KimiDetector,
+        "qwen3": Qwen3Detector,
+        "qwen3-thinking": Qwen3Detector,
         "step3": DeepSeekR1Detector,
-        "gpt-oss": GptOssDetector,
     }

     def __init__(
......
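To make the detector mapping concrete, here is a simplified sketch of what the `<think>`-tag detectors do with a completed generation: split it into `reasoning_content` and `content`. This is an illustration only, not SGLang's actual detector classes, which also handle streaming and model-specific tag conventions.

```python
import re


def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split one finished generation into (reasoning, answer).

    Returns (None, text) when no reasoning block is present, so
    non-thinking outputs pass through unchanged.
    """
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    match = re.search(pattern, text, flags=re.DOTALL)
    if not match:
        return None, text
    reasoning = match.group(1).strip()
    # The answer is everything outside the reasoning block.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, content = split_reasoning("<think>2^8 = 256</think>2^8 equals 256.")
print("reasoning_content:", reasoning)
print("content:", content)
```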