[docs] Refactor, remove compiled results and add gpt-oss (#9613)

Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>

[docs] Refactor, remove compiled results and add gpt-oss (#9613)
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
9b08d975 · Chayenne · GitHub · a0a77d93 · 9b08d975 · 9b08d975
Unverified Commit 9b08d975 authored Aug 25, 2025 by Chayenne Committed by GitHub Aug 25, 2025
5 changed files
--- a/docs/advanced_features/function_calling.ipynb
+++ b/docs/advanced_features/function_calling.ipynb
@@ -51,7 +51,8 @@
    "- mistral: Mistral (e.g. mistralai/Mistral-7B-Instruct-v0.3, mistralai/Mistral-Nemo-Instruct-2407, mistralai/\n",
    "Mistral-Nemo-Instruct-2407, mistralai/Mistral-7B-v0.3).\n",
    "- qwen25: Qwen 2.5 (e.g. Qwen/Qwen2.5-1.5B-Instruct, Qwen/Qwen2.5-7B-Instruct) and QwQ (i.e. Qwen/QwQ-32B). Especially, for QwQ, we can enable the reasoning parser together with tool call parser, details about reasoning parser can be found in [reasoning parser](https://docs.sglang.ai/backend/separate_reasoning.html).\n",
-    "- deepseekv3: DeepSeek-v3 (e.g., deepseek-ai/DeepSeek-V3-0324).\n"
+    "- deepseekv3: DeepSeek-v3 (e.g., deepseek-ai/DeepSeek-V3-0324).\n",
+    "- gpt-oss: GPT-OSS (e.g., openai/gpt-oss-120b, openai/gpt-oss-20b, lmsys/gpt-oss-120b-bf16, lmsys/gpt-oss-20b-bf16). Note: The gpt-oss tool parser filters out analysis channel events and only preserves normal text. This can cause the content to be empty when explanations are in the analysis channel. To work around this, complete the tool round by returning tool results as role=\"tool\" messages, which enables the model to generate the final content."
   ]
  },
  {
@@ -354,142 +355,6 @@
    "print(final_response.choices[0].message.content)"
   ]
  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Tool Choice Mode\n",
-    "\n",
-    "SGLang supports OpenAI's `tool_choice` parameter to control when and which tools the model should call. This feature is implemented using EBNF (Extended Backus-Naur Form) grammar to ensure reliable tool calling behavior.\n",
-    "\n",
-    "### Supported Tool Choice Options\n",
-    "\n",
-    "- **`tool_choice=\"required\"`**: Forces the model to call at least one tool\n",
-    "- **`tool_choice={\"type\": \"function\", \"function\": {\"name\": \"specific_function\"}}`**: Forces the model to call a specific function\n",
-    "\n",
-    "### Backend Compatibility\n",
-    "\n",
-    "Tool choice is fully supported with the **Xgrammar backend**, which is the default grammar backend (`--grammar-backend xgrammar`). However, it may not be fully supported with other backends such as `outlines`.\n",
-    "\n",
-    "### Example: Required Tool Choice"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from openai import OpenAI\n",
-    "from sglang.utils import wait_for_server, print_highlight, terminate_process\n",
-    "from sglang.test.doc_patch import launch_server_cmd\n",
-    "\n",
-    "# Start a new server session for tool choice examples\n",
-    "server_process_tool_choice, port_tool_choice = launch_server_cmd(\n",
-    "    \"python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --tool-call-parser qwen25 --host 0.0.0.0\"\n",
-    ")\n",
-    "wait_for_server(f\"http://localhost:{port_tool_choice}\")\n",
-    "\n",
-    "# Initialize client for tool choice examples\n",
-    "client_tool_choice = OpenAI(\n",
-    "    api_key=\"None\", base_url=f\"http://0.0.0.0:{port_tool_choice}/v1\"\n",
-    ")\n",
-    "model_name_tool_choice = client_tool_choice.models.list().data[0].id\n",
-    "\n",
-    "# Example with tool_choice=\"required\" - forces the model to call a tool\n",
-    "messages_required = [\n",
-    "    {\"role\": \"user\", \"content\": \"Hello, what is the capital of France?\"}\n",
-    "]\n",
-    "\n",
-    "# Define tools\n",
-    "tools = [\n",
-    "    {\n",
-    "        \"type\": \"function\",\n",
-    "        \"function\": {\n",
-    "            \"name\": \"get_current_weather\",\n",
-    "            \"description\": \"Get the current weather in a given location\",\n",
-    "            \"parameters\": {\n",
-    "                \"type\": \"object\",\n",
-    "                \"properties\": {\n",
-    "                    \"city\": {\n",
-    "                        \"type\": \"string\",\n",
-    "                        \"description\": \"The city to find the weather for, e.g. 'San Francisco'\",\n",
-    "                    },\n",
-    "                    \"unit\": {\n",
-    "                        \"type\": \"string\",\n",
-    "                        \"description\": \"The unit to fetch the temperature in\",\n",
-    "                        \"enum\": [\"celsius\", \"fahrenheit\"],\n",
-    "                    },\n",
-    "                },\n",
-    "                \"required\": [\"city\", \"unit\"],\n",
-    "            },\n",
-    "        },\n",
-    "    }\n",
-    "]\n",
-    "\n",
-    "response_required = client_tool_choice.chat.completions.create(\n",
-    "    model=model_name_tool_choice,\n",
-    "    messages=messages_required,\n",
-    "    temperature=0,\n",
-    "    max_tokens=1024,\n",
-    "    tools=tools,\n",
-    "    tool_choice=\"required\",  # Force the model to call a tool\n",
-    ")\n",
-    "\n",
-    "print_highlight(\"Response with tool_choice='required':\")\n",
-    "print(\"Content:\", response_required.choices[0].message.content)\n",
-    "print(\"Tool calls:\", response_required.choices[0].message.tool_calls)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Example: Specific Function Choice\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Example with specific function choice - forces the model to call a specific function\n",
-    "messages_specific = [\n",
-    "    {\"role\": \"user\", \"content\": \"What are the most attactive places in France?\"}\n",
-    "]\n",
-    "\n",
-    "response_specific = client_tool_choice.chat.completions.create(\n",
-    "    model=model_name_tool_choice,\n",
-    "    messages=messages_specific,\n",
-    "    temperature=0,\n",
-    "    max_tokens=1024,\n",
-    "    tools=tools,\n",
-    "    tool_choice={\n",
-    "        \"type\": \"function\",\n",
-    "        \"function\": {\"name\": \"get_current_weather\"},\n",
-    "    },  # Force the model to call the specific get_current_weather function\n",
-    ")\n",
-    "\n",
-    "print_highlight(\"Response with specific function choice:\")\n",
-    "print(\"Content:\", response_specific.choices[0].message.content)\n",
-    "print(\"Tool calls:\", response_specific.choices[0].message.tool_calls)\n",
-    "\n",
-    "if response_specific.choices[0].message.tool_calls:\n",
-    "    tool_call = response_specific.choices[0].message.tool_calls[0]\n",
-    "    print(f\"Called function: {tool_call.function.name}\")\n",
-    "    print(f\"Arguments: {tool_call.function.arguments}\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "terminate_process(server_process_tool_choice)"
-   ]
-  },
  {
   "cell_type": "markdown",
   "metadata": {},
@@ -583,6 +448,9 @@
    "    messages, tokenize=True, add_generation_prompt=True, tools=tools\n",
    ")\n",
    "\n",
+    "# Note that for gpt-oss tool parser, adding \"no_stop_trim\": True\n",
+    "# to make sure the tool call token <call> is not trimmed.\n",
+    "\n",
    "sampling_params = {\n",
    "    \"max_new_tokens\": 1024,\n",
    "    \"temperature\": 0,\n",
@@ -636,6 +504,142 @@
    "llm.shutdown()"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Tool Choice Mode\n",
+    "\n",
+    "SGLang supports OpenAI's `tool_choice` parameter to control when and which tools the model should call. This feature is implemented using EBNF (Extended Backus-Naur Form) grammar to ensure reliable tool calling behavior.\n",
+    "\n",
+    "### Supported Tool Choice Options\n",
+    "\n",
+    "- **`tool_choice=\"required\"`**: Forces the model to call at least one tool\n",
+    "- **`tool_choice={\"type\": \"function\", \"function\": {\"name\": \"specific_function\"}}`**: Forces the model to call a specific function\n",
+    "\n",
+    "### Backend Compatibility\n",
+    "\n",
+    "Tool choice is fully supported with the **Xgrammar backend**, which is the default grammar backend (`--grammar-backend xgrammar`). However, it may not be fully supported with other backends such as `outlines`.\n",
+    "\n",
+    "### Example: Required Tool Choice"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from openai import OpenAI\n",
+    "from sglang.utils import wait_for_server, print_highlight, terminate_process\n",
+    "from sglang.test.doc_patch import launch_server_cmd\n",
+    "\n",
+    "# Start a new server session for tool choice examples\n",
+    "server_process_tool_choice, port_tool_choice = launch_server_cmd(\n",
+    "    \"python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --tool-call-parser qwen25 --host 0.0.0.0\"\n",
+    ")\n",
+    "wait_for_server(f\"http://localhost:{port_tool_choice}\")\n",
+    "\n",
+    "# Initialize client for tool choice examples\n",
+    "client_tool_choice = OpenAI(\n",
+    "    api_key=\"None\", base_url=f\"http://0.0.0.0:{port_tool_choice}/v1\"\n",
+    ")\n",
+    "model_name_tool_choice = client_tool_choice.models.list().data[0].id\n",
+    "\n",
+    "# Example with tool_choice=\"required\" - forces the model to call a tool\n",
+    "messages_required = [\n",
+    "    {\"role\": \"user\", \"content\": \"Hello, what is the capital of France?\"}\n",
+    "]\n",
+    "\n",
+    "# Define tools\n",
+    "tools = [\n",
+    "    {\n",
+    "        \"type\": \"function\",\n",
+    "        \"function\": {\n",
+    "            \"name\": \"get_current_weather\",\n",
+    "            \"description\": \"Get the current weather in a given location\",\n",
+    "            \"parameters\": {\n",
+    "                \"type\": \"object\",\n",
+    "                \"properties\": {\n",
+    "                    \"city\": {\n",
+    "                        \"type\": \"string\",\n",
+    "                        \"description\": \"The city to find the weather for, e.g. 'San Francisco'\",\n",
+    "                    },\n",
+    "                    \"unit\": {\n",
+    "                        \"type\": \"string\",\n",
+    "                        \"description\": \"The unit to fetch the temperature in\",\n",
+    "                        \"enum\": [\"celsius\", \"fahrenheit\"],\n",
+    "                    },\n",
+    "                },\n",
+    "                \"required\": [\"city\", \"unit\"],\n",
+    "            },\n",
+    "        },\n",
+    "    }\n",
+    "]\n",
+    "\n",
+    "response_required = client_tool_choice.chat.completions.create(\n",
+    "    model=model_name_tool_choice,\n",
+    "    messages=messages_required,\n",
+    "    temperature=0,\n",
+    "    max_tokens=1024,\n",
+    "    tools=tools,\n",
+    "    tool_choice=\"required\",  # Force the model to call a tool\n",
+    ")\n",
+    "\n",
+    "print_highlight(\"Response with tool_choice='required':\")\n",
+    "print(\"Content:\", response_required.choices[0].message.content)\n",
+    "print(\"Tool calls:\", response_required.choices[0].message.tool_calls)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Example: Specific Function Choice\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Example with specific function choice - forces the model to call a specific function\n",
+    "messages_specific = [\n",
+    "    {\"role\": \"user\", \"content\": \"What are the most attactive places in France?\"}\n",
+    "]\n",
+    "\n",
+    "response_specific = client_tool_choice.chat.completions.create(\n",
+    "    model=model_name_tool_choice,\n",
+    "    messages=messages_specific,\n",
+    "    temperature=0,\n",
+    "    max_tokens=1024,\n",
+    "    tools=tools,\n",
+    "    tool_choice={\n",
+    "        \"type\": \"function\",\n",
+    "        \"function\": {\"name\": \"get_current_weather\"},\n",
+    "    },  # Force the model to call the specific get_current_weather function\n",
+    ")\n",
+    "\n",
+    "print_highlight(\"Response with specific function choice:\")\n",
+    "print(\"Content:\", response_specific.choices[0].message.content)\n",
+    "print(\"Tool calls:\", response_specific.choices[0].message.tool_calls)\n",
+    "\n",
+    "if response_specific.choices[0].message.tool_calls:\n",
+    "    tool_call = response_specific.choices[0].message.tool_calls[0]\n",
+    "    print(f\"Called function: {tool_call.function.name}\")\n",
+    "    print(f\"Arguments: {tool_call.function.arguments}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "terminate_process(server_process_tool_choice)"
+   ]
+  },
  {
   "cell_type": "markdown",
   "metadata": {},
@@ -657,6 +661,8 @@
    "\n",
    "For more information, refer to Meta’s documentation on  [Zero shot function calling](https://github.com/meta-llama/llama-models/blob/main/models/llama4/prompt_format.md#zero-shot-function-calling---system-message).\n",
    "\n",
+    "Note that this feature is still under development on Blackwell.\n",
+    "\n",
    "### How to enable\n",
    "- Launch the server with `--tool-call-parser pythonic`\n",
    "- You may also specify --chat-template with the improved template for the model (e.g., `--chat-template=examples/chat_template/tool_chat_template_llama4_pythonic.jinja`).\n",

--- a/docs/advanced_features/separate_reasoning.ipynb
+++ b/docs/advanced_features/separate_reasoning.ipynb
@@ -17,7 +17,7 @@
    "| [Standard Qwen3 models](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) | `<think>` … `</think>` | `qwen3` | Supports `enable_thinking` parameter |\n",
    "| [Qwen3-Thinking models](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507) | `<think>` … `</think>` | `qwen3` or `qwen3-thinking` | Always generates thinking content |\n",
    "| [Kimi models](https://huggingface.co/moonshotai/models) | `◁think▷` … `◁/think▷` | `kimi` | Uses special thinking delimiters |\n",
-    "\n",
+    "| [GPT OSS](https://huggingface.co/openai/gpt-oss-120b) | `<\\|channel\\|>analysis<\\|message\\|>` … `<\\|end\\|>` | `gpt-oss` | N/A |\n",
    "### Model-Specific Behaviors\n",
    "\n",
    "**DeepSeek-R1 Family:**\n",

--- a/docs/advanced_features/vlm_query.ipynb
+++ b/docs/advanced_features/vlm_query.ipynb
--- a/docs/basic_usage/gpt_oss.md
+++ b/docs/basic_usage/gpt_oss.md
@@ -23,6 +23,11 @@ GPT‑OSS can call built‑in tools for web search and Python execution. You can
 - Uses the Exa backend for web search.
 - Requires an Exa API key; set `EXA_API_KEY` in your environment. Create a key at `https://exa.ai`.

+### Tool & Reasoning Parser
+
+- We support OpenAI Reasoning and Tool Call parser, as well as our SGLang native api for tool call and reasoning. Refer to [reasoning parser](../advanced_features/separate_reasoning.ipynb) and [tool call parser](../advanced_features/function_calling.ipynb) for more details.
+
+
 ## Notes

 - Use **Python 3.12** for the demo tools. And install the required `gpt-oss` packages.

--- a/scripts/playground/frontend_reasoning.ipynb
+++ b/scripts/playground/frontend_reasoning.ipynb