Unverified Commit fd34f2da authored by Adarsh Shirawalmath's avatar Adarsh Shirawalmath Committed by GitHub

[Docs] Add EBNF to sampling params docs (#2609)

parent 8ee9a850
@@ -220,14 +220,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Structured Outputs (JSON, Regex, EBNF)\n",
"You can specify a JSON schema, a regular expression, or an [EBNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form) grammar to constrain the model output. The model output is guaranteed to follow the given constraints.\n",
"\n",
"SGLang supports two grammar backends:\n",
"\n",
"- [Outlines](https://github.com/dottxt-ai/outlines) (default): Supports JSON schema and Regular Expression constraints.\n",
"- [XGrammar](https://github.com/mlc-ai/xgrammar): Supports JSON schema and EBNF constraints.\n",
" - XGrammar currently uses the [GGML BNF format](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md)\n",
"\n",
"> 🔔 Only one constraint parameter (`json_schema`, `regex`, or `ebnf`) can be specified at a time.\n",
"\n",
"Select the grammar backend with the `--grammar-backend` flag when launching the server:\n",
"```bash\n",
"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n",
"--port 30000 --host 0.0.0.0 --grammar-backend xgrammar  # or \"outlines\" (default)\n",
"```\n",
"\n",
"### JSON"
@@ -275,7 +282,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Regular expression (default \"outlines\" backend)"
]
},
{
@@ -297,6 +304,46 @@
"print_highlight(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### EBNF (use \"xgrammar\" backend)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Terminate the existing server (running the default Outlines backend) for this demo\n",
"terminate_process(server_process)\n",
"\n",
"# start new server with xgrammar backend\n",
"server_process = execute_shell_command(\n",
" \"python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000 --host 0.0.0.0 --grammar-backend xgrammar\"\n",
")\n",
"wait_for_server(\"http://localhost:30000\")\n",
"\n",
"# EBNF example\n",
"ebnf_grammar = r\"\"\"\n",
" root ::= \"Hello\" | \"Hi\" | \"Hey\"\n",
" \"\"\"\n",
"response = client.chat.completions.create(\n",
" model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful EBNF test bot.\"},\n",
" {\"role\": \"user\", \"content\": \"Say a greeting.\"},\n",
" ],\n",
" temperature=0,\n",
" max_tokens=32,\n",
" extra_body={\"ebnf\": ebnf_grammar},\n",
")\n",
"\n",
"print_highlight(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
...
@@ -58,13 +58,18 @@ ignore_eos: bool = False,
skip_special_tokens: bool = True,
# Whether to add spaces between special tokens during detokenization.
spaces_between_special_tokens: bool = True,
# Do parallel sampling and return `n` outputs.
n: int = 1,
## Structured Outputs
# Only one of the three below can be set at a time:
# Constrains the output to follow a given regular expression.
regex: Optional[str] = None,
# Constrains the output to follow a given JSON schema.
json_schema: Optional[str] = None,
# Constrains the output to follow a given EBNF grammar.
ebnf: Optional[str] = None,
## Penalties. See [Performance Implications on Penalties] section below for more information.
@@ -179,25 +184,37 @@ print(response.json())
The `image_data` can be a file name, a URL, or a base64 encoded string. See also `python/sglang/srt/utils.py:load_image`.
Streaming is supported in a similar manner as [above](#streaming).
### Structured Outputs (JSON, Regex, EBNF)
You can specify a JSON schema, a regular expression, or an [EBNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form) grammar to constrain the model output. The model output is guaranteed to follow the given constraints.
SGLang supports two grammar backends:
- [Outlines](https://github.com/dottxt-ai/outlines) (default): Supports JSON schema and Regular Expression constraints.
- [XGrammar](https://github.com/mlc-ai/xgrammar): Supports JSON schema and EBNF constraints.
  - XGrammar currently uses the [GGML BNF format](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md)
> 🔔 Only one constraint parameter (`json_schema`, `regex`, or `ebnf`) can be specified at a time.
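The exclusivity rule above can also be enforced client-side before a request is sent. The sketch below is a hypothetical helper for illustration only, not part of SGLang:

```python
# Hypothetical client-side check mirroring the rule that at most one of
# `regex`, `json_schema`, and `ebnf` may appear in sampling_params.
def check_one_constraint(sampling_params):
    set_keys = [
        k for k in ("regex", "json_schema", "ebnf")
        if sampling_params.get(k) is not None
    ]
    if len(set_keys) > 1:
        raise ValueError(f"Only one constraint may be set, got: {set_keys}")
    return sampling_params


check_one_constraint({"temperature": 0, "ebnf": 'root ::= "Hi"'})  # passes
```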
Select the grammar backend with the `--grammar-backend` flag when launching the server:
```bash
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
--port 30000 --host 0.0.0.0 --grammar-backend xgrammar  # or "outlines" (default)
```
```python
import json
import requests

json_schema = json.dumps({
    "type": "object",
    "properties": {
        "name": {"type": "string", "pattern": "^[\\w]+$"},
        "population": {"type": "integer"},
    },
    "required": ["name", "population"],
})

# JSON (works with both Outlines and XGrammar)
response = requests.post(
    "http://localhost:30000/generate",
    json={
@@ -211,7 +228,7 @@ response = requests.post(
)
print(response.json())

# Regular expression (Outlines backend only)
response = requests.post(
    "http://localhost:30000/generate",
    json={
@@ -224,4 +241,18 @@ response = requests.post(
    },
)
print(response.json())
# EBNF (XGrammar backend only)
response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Write a greeting.",
        "sampling_params": {
            "temperature": 0,
            "max_new_tokens": 64,
            "ebnf": 'root ::= "Hello" | "Hi" | "Hey"',
        },
    },
)
print(response.json())
```
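As a local sanity check, the guarantees described above can be spot-verified on returned text using only the standard library. The sample outputs below are illustrative stand-ins, not real server responses, and the regex pattern is assumed to be `[\w]+` as in the schema example:

```python
import json
import re

# Illustrative outputs a constrained server might return.
json_out = '{"name": "Paris", "population": 2102650}'
regex_out = "Paris"
ebnf_out = "Hello"

# JSON schema: output parses, and required keys have the expected types.
obj = json.loads(json_out)
assert isinstance(obj["name"], str) and isinstance(obj["population"], int)

# Regex: the entire output matches the pattern.
assert re.fullmatch(r"[\w]+", regex_out) is not None

# EBNF: the output is one of the grammar's alternatives.
assert ebnf_out in {"Hello", "Hi", "Hey"}
```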