Unverified Commit 95c231e5 authored by vzed, committed by GitHub

Tool Call: Add `chat_template_kwargs` documentation (#5679)

parent 3042f1da
...@@ -94,7 +94,63 @@
"\n",
"The chat completions API accepts OpenAI Chat Completions API's parameters. Refer to [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create) for more details.\n",
"\n",
"SGLang extends the standard API with the `extra_body` parameter, allowing for additional customization. One key option within `extra_body` is `chat_template_kwargs`, which can be used to pass arguments to the chat template processor.\n",
"\n",
"#### Enabling Model Thinking/Reasoning\n",
"\n",
"You can use `chat_template_kwargs` to enable or disable the model's internal thinking or reasoning process output. Set `\"enable_thinking\": True` within `chat_template_kwargs` to include the reasoning steps in the response. This requires launching the server with a compatible reasoning parser (e.g., `--reasoning-parser qwen3` for Qwen3 models).\n",
"\n",
"Here's an example demonstrating how to enable thinking and retrieve the reasoning content separately (using `separate_reasoning: True`):\n",
"\n",
"```python\n",
"# Ensure the server is launched with a compatible reasoning parser, e.g.:\n",
"# python3 -m sglang.launch_server --model-path QwQ/Qwen3-32B-250415 --reasoning-parser qwen3 ...\n",
"\n",
"from openai import OpenAI\n",
"\n",
"# Modify OpenAI's API key and API base to use SGLang's API server.\n",
"openai_api_key = \"EMPTY\"\n",
"openai_api_base = f\"http://127.0.0.1:{port}/v1\" # Use the correct port\n",
"\n",
"client = OpenAI(\n",
" api_key=openai_api_key,\n",
" base_url=openai_api_base,\n",
")\n",
"\n",
"model = \"QwQ/Qwen3-32B-250415\" # Use the model loaded by the server\n",
"messages = [{\"role\": \"user\", \"content\": \"9.11 and 9.8, which is greater?\"}]\n",
"\n",
"response = client.chat.completions.create(\n",
" model=model,\n",
" messages=messages,\n",
" extra_body={\n",
" \"chat_template_kwargs\": {\"enable_thinking\": True},\n",
" \"separate_reasoning\": True\n",
" }\n",
")\n",
"\n",
"print(\"response.choices[0].message.reasoning_content: \\n\", response.choices[0].message.reasoning_content)\n",
"print(\"response.choices[0].message.content: \\n\", response.choices[0].message.content)\n",
"```\n",
"\n",
"**Example Output:**\n",
"\n",
"```\n",
"response.choices[0].message.reasoning_content: \n",
" Okay, so I need to figure out which number is greater between 9.11 and 9.8. Hmm, let me think. Both numbers start with 9, right? So the whole number part is the same. That means I need to look at the decimal parts to determine which one is bigger.\n",
"...\n",
"Therefore, after checking multiple methods—aligning decimals, subtracting, converting to fractions, and using a real-world analogy—it's clear that 9.8 is greater than 9.11.\n",
"\n",
"response.choices[0].message.content: \n",
" To determine which number is greater between **9.11** and **9.8**, follow these steps:\n",
"...\n",
"**Answer**: \n",
"9.8 is greater than 9.11.\n",
"```\n",
"\n",
"Setting `\"enable_thinking\": False` (or omitting it) will result in `reasoning_content` being `None`.\n",
"\n",
"Here is an example of a detailed chat completion request using standard OpenAI parameters:"
]
},
{
...