Unverified Commit 062c48d2 authored by Shi Shuai, committed by GitHub

[Docs] Add Support for Pydantic Structured Output Format (#2697)

parent b6e0cfb5
# Backend: SGLang Runtime (SRT)
The SGLang Runtime (SRT) is an efficient serving engine.
## Quick Start
Launch a server
```
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 30000
```
Send a request
```
curl http://localhost:30000/generate \
-H "Content-Type: application/json" \
-d '{
"text": "Once upon a time,",
"sampling_params": {
"max_new_tokens": 16,
"temperature": 0
}
}'
```
Learn more about the argument specification, streaming, and multi-modal support [here](../references/sampling_params.md).
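The native `/generate` endpoint also supports streaming. Below is a minimal Python sketch, assuming the server emits server-sent events (`data: {...}` lines, ending with `data: [DONE]`) when `"stream": true` is set; see the sampling parameters reference above for the authoritative protocol.
```python
import json
import requests

# Stream from the native /generate endpoint (assumes SSE-style "data:" lines).
response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Once upon a time,",
        "sampling_params": {"max_new_tokens": 64, "temperature": 0},
        "stream": True,
    },
    stream=True,
)

prev = 0
for line in response.iter_lines(decode_unicode=True):
    if not line.startswith("data:"):
        continue
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        break
    # Each event carries the full text generated so far; print only the new suffix.
    text = json.loads(payload)["text"]
    print(text[prev:], end="", flush=True)
    prev = len(text)
```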
## OpenAI Compatible API
In addition, the server supports OpenAI-compatible APIs.
```python
import openai
client = openai.Client(
    base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Text completion
response = client.completions.create(
    model="default",
    prompt="The capital of France is",
    temperature=0,
    max_tokens=32,
)
print(response)

# Chat completion
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    max_tokens=64,
)
print(response)

# Text embedding
response = client.embeddings.create(
    model="default",
    input="How are you today",
)
print(response)
```
It supports streaming, vision, and almost all features of the Chat/Completions/Models/Batch endpoints specified by the [OpenAI API Reference](https://platform.openai.com/docs/api-reference/).
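For example, streaming through the OpenAI-compatible endpoint follows the standard OpenAI protocol; here is a short sketch reusing the `client` from above (chunks arrive as the usual `delta` payloads):
```python
# Stream a chat completion; chunks follow the standard OpenAI streaming format.
stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "List 3 countries and their capitals."}],
    temperature=0,
    max_tokens=64,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
```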
## Additional Server Arguments
- To enable multi-GPU tensor parallelism, add `--tp 2`. If it reports the error "peer access is not supported between these two devices", add `--enable-p2p-check` to the server launch command.
```
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 2
@@ -94,35 +32,6 @@
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --nccl-init sgl-dev-0:50000 --nnodes 2 --node-rank 1
```
## Engine Without HTTP Server
We also provide an inference engine **without an HTTP server**. For example,
```python
import sglang as sgl


def main():
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

    outputs = llm.generate(prompts, sampling_params)
    for prompt, output in zip(prompts, outputs):
        print("===============================")
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")


if __name__ == "__main__":
    main()
```
This can be used for offline batch inference and building custom servers.
You can view the full example [here](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine).
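As an illustration of the custom-server use case, here is a minimal sketch that wraps the engine in a [FastAPI](https://fastapi.tiangolo.com/) app. FastAPI and the `/complete` route are illustrative choices, not part of SGLang, and the exact `Engine` return format may vary across versions (assumed here to match the batch example above).
```python
# A minimal custom server on top of sgl.Engine; run with `uvicorn server:app`
# (assuming this file is saved as server.py).
import sglang as sgl
from fastapi import FastAPI

app = FastAPI()
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


@app.post("/complete")
def complete(prompt: str):
    # generate() on a list of prompts returns a list of dicts with a "text" field
    outputs = llm.generate([prompt], {"temperature": 0.8, "top_p": 0.95})
    return {"text": outputs[0]["text"]}
```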
## Use Models From ModelScope
<details>
<summary>More</summary>
@@ -16,16 +16,11 @@
"SGLang supports two grammar backends:\n",
"\n",
"- [Outlines](https://github.com/dottxt-ai/outlines) (default): Supports JSON schema and regular expression constraints.\n",
"- [XGrammar](https://github.com/mlc-ai/xgrammar): Supports JSON schema and EBNF constraints and currently uses the [GGML BNF format](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md).\n",
"\n",
"We suggest using XGrammar whenever possible for its better performance. For more details, see [XGrammar technical overview](https://blog.mlc.ai/2024/11/22/achieving-efficient-flexible-portable-structured-generation-with-xgrammar).\n",
"\n",
"To use XGrammar, simply add `--grammar-backend xgrammar` when launching the server. If no backend is specified, Outlines will be used as the default."
]
},
{
@@ -35,13 +30,6 @@
"## OpenAI Compatible API"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -68,7 +56,64 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### JSON" "### JSON\n",
"\n",
"you can directly define a JSON schema or use [Pydantic](https://docs.pydantic.dev/latest/) to define and validate the response."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Using Pydantic**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"# Define the schema using Pydantic\n",
"class CapitalInfo(BaseModel):\n",
" name: str = Field(..., pattern=r\"^\\w+$\", description=\"Name of the capital city\")\n",
" population: int = Field(..., description=\"Population of the capital city\")\n",
"\n",
"\n",
"response = client.chat.completions.create(\n",
" model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"Give me the information of the capital of France in the JSON format.\",\n",
" },\n",
" ],\n",
" temperature=0,\n",
" max_tokens=128,\n",
" response_format={\n",
" \"type\": \"json_schema\",\n",
" \"json_schema\": {\n",
" \"name\": \"foo\",\n",
" # convert the pydantic model to json schema\n",
" \"schema\": CapitalInfo.model_json_schema(),\n",
" },\n",
" },\n",
")\n",
"\n",
"response_content = response.choices[0].message.content\n",
"# validate the JSON response by the pydantic model\n",
"capital_info = CapitalInfo.model_validate_json(response_content)\n",
"print_highlight(f\"Validated response: {capital_info.model_dump_json()}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**JSON Schema Directly**\n"
]
},
{
@@ -225,15 +270,64 @@
"### JSON"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Using Pydantic**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import requests\n", "import requests\n",
"import json\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"# Define the schema using Pydantic\n",
"class CapitalInfo(BaseModel):\n",
" name: str = Field(..., pattern=r\"^\\w+$\", description=\"Name of the capital city\")\n",
" population: int = Field(..., description=\"Population of the capital city\")\n",
"\n",
"\n",
"# Make API request\n",
"response = requests.post(\n",
" \"http://localhost:30010/generate\",\n",
" json={\n",
" \"text\": \"Here is the information of the capital of France in the JSON format.\\n\",\n",
" \"sampling_params\": {\n",
" \"temperature\": 0,\n",
" \"max_new_tokens\": 64,\n",
" \"json_schema\": json.dumps(CapitalInfo.model_json_schema()),\n",
" },\n",
" },\n",
")\n",
"print_highlight(response.json())\n",
"\n",
"\n", "\n",
"response_data = json.loads(response.json()[\"text\"])\n",
"# validate the response by the pydantic model\n",
"capital_info = CapitalInfo.model_validate(response_data)\n",
"print_highlight(f\"Validated response: {capital_info.model_dump_json()}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**JSON Schema Directly**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"json_schema = json.dumps(\n", "json_schema = json.dumps(\n",
" {\n", " {\n",
" \"type\": \"object\",\n", " \"type\": \"object\",\n",
...@@ -379,6 +473,13 @@ ...@@ -379,6 +473,13 @@
"### JSON" "### JSON"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Using Pydantic**"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -386,7 +487,49 @@
"outputs": [],
"source": [
"import json\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"prompts = [\n",
" \"Give me the information of the capital of China in the JSON format.\",\n",
" \"Give me the information of the capital of France in the JSON format.\",\n",
" \"Give me the information of the capital of Ireland in the JSON format.\",\n",
"]\n",
"\n",
"\n",
"# Define the schema using Pydantic\n",
"class CapitalInfo(BaseModel):\n",
" name: str = Field(..., pattern=r\"^\\w+$\", description=\"Name of the capital city\")\n",
" population: int = Field(..., description=\"Population of the capital city\")\n",
"\n",
"\n",
"sampling_params = {\n",
" \"temperature\": 0.1,\n",
" \"top_p\": 0.95,\n",
" \"json_schema\": json.dumps(CapitalInfo.model_json_schema()),\n",
"}\n",
"\n",
"outputs = llm_xgrammar.generate(prompts, sampling_params)\n",
"for prompt, output in zip(prompts, outputs):\n",
" print_highlight(\"===============================\")\n",
" print_highlight(f\"Prompt: {prompt}\") # validate the output by the pydantic model\n",
" capital_info = CapitalInfo.model_validate_json(output[\"text\"])\n",
" print_highlight(f\"Validated output: {capital_info.model_dump_json()}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**JSON Schema Directly**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"prompts = [\n", "prompts = [\n",
" \"Give me the information of the capital of China in the JSON format.\",\n", " \"Give me the information of the capital of China in the JSON format.\",\n",
" \"Give me the information of the capital of France in the JSON format.\",\n", " \"Give me the information of the capital of France in the JSON format.\",\n",
@@ -29,7 +29,7 @@ The core features include:
backend/native_api.ipynb
backend/offline_engine_api.ipynb
backend/structured_outputs.ipynb
backend/server_arguments.md
.. toctree::
@@ -2,9 +2,9 @@
Welcome to **SGLang**! We appreciate your interest in contributing. This guide provides a concise overview of how to set up your environment, run tests, build documentation, and open a Pull Request (PR). Whether you’re fixing a small bug or developing a major feature, we encourage following these steps for a smooth contribution process.
## Setting Up & Building from Source
### Fork and Clone the Repository
**Note**: SGLang does **not** accept PRs on the main repo. Please fork the repository under your GitHub account, then clone your fork locally.
@@ -13,7 +13,7 @@ git clone https://github.com/<your_user_name>/sglang.git
cd sglang
```
### Install Dependencies & Build
Refer to [Install SGLang](https://sgl-project.github.io/start/install.html) documentation for more details on setting up the necessary dependencies.
@@ -32,7 +32,7 @@ cd sglang/python
pip install .
```
## Code Formatting with Pre-Commit
We use [pre-commit](https://pre-commit.com/) to maintain consistent code style checks. Before pushing your changes, please run:
@@ -45,11 +45,11 @@ pre-commit run --all-files
- **`pre-commit run --all-files`** manually runs all configured checks, applying fixes if possible. If it fails the first time, re-run it to ensure lint errors are fully resolved. Make sure your code passes all checks **before** creating a Pull Request.
- **Do not commit** directly to the `main` branch. Always create a new branch (e.g., `feature/my-new-feature`), push your changes, and open a PR from that branch.
## Writing Documentation & Running Docs CI
Most documentation files are located under the `docs/` folder. We prefer **Jupyter Notebooks** over Markdown so that all examples can be executed and validated by our docs CI pipeline.
### Docs Workflow
Add or update your Jupyter notebooks in the appropriate subdirectories under `docs/`. If you add new files, remember to update `index.rst` (or relevant `.rst` files) accordingly.
@@ -114,11 +114,11 @@ llm.shutdown()
```
## Running Unit Tests & Adding to CI
SGLang uses Python’s built-in [unittest](https://docs.python.org/3/library/unittest.html) framework. You can run tests either individually or in suites.
### Test Backend Runtime
```bash
cd sglang/test/srt
@@ -133,7 +133,7 @@ python3 -m unittest test_srt_endpoint.TestSRTEndpoint.test_simple_decode
python3 run_suite.py --suite minimal
```
### Test Frontend Language
```bash
cd sglang/test/lang
@@ -149,13 +149,13 @@ python3 -m unittest test_openai_backend.TestOpenAIBackend.test_few_shot_qa
python3 run_suite.py --suite minimal
```
### Adding or Updating Tests in CI
- Create new test files under `test/srt` or `test/lang` depending on the type of test.
- Ensure they are referenced in the respective `run_suite.py` (e.g., `test/srt/run_suite.py` or `test/lang/run_suite.py`) so they’re picked up in CI.
- In CI, all tests run automatically. You may modify the workflows in [`.github/workflows/`](https://github.com/sgl-project/sglang/tree/main/.github/workflows) to add custom test groups or extra checks.
### Writing Elegant Test Cases
- Examine existing tests in [sglang/test](https://github.com/sgl-project/sglang/tree/main/test) for practical examples.
- Keep each test function focused on a single scenario or piece of functionality.
@@ -164,7 +164,7 @@ python3 run_suite.py --suite minimal
- Clean up resources to avoid side effects and preserve test independence.
## Tips for Newcomers
If you want to contribute but don’t have a specific idea in mind, pick issues labeled [“good first issue” or “help wanted”](https://github.com/sgl-project/sglang/issues?q=is%3Aissue+label%3A%22good+first+issue%22%2C%22help+wanted%22). These tasks typically have lower complexity and provide an excellent introduction to the codebase. Also check out this [code walk-through](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/tree/main/sglang/code-walk-through) for a deeper look into SGLang’s workflow.