Unverified Commit 9c6e25d2 authored by ybyang, committed by GitHub

doc for logit_bias (#12188)

parent 2a3763c3
......@@ -164,6 +164,48 @@
"**Note:** Setting `\"enable_thinking\": False` (or omitting it) will result in `reasoning_content` being `None`. Qwen3-Thinking models always generate reasoning content and don't support the `enable_thinking` parameter.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Logit Bias Support\n",
"\n",
"SGLang supports the `logit_bias` parameter for both chat completions and completions APIs. This parameter allows you to modify the likelihood of specific tokens being generated by adding bias values to their logits. The bias values can range from -100 to 100, where:\n",
"\n",
"- **Positive values** (greater than 0, up to 100) increase the likelihood of the token being selected\n",
"- **Negative values** (down to -100) decrease the likelihood of the token being selected\n",
"- **-100** effectively bans the token from being generated\n",
"\n",
"The `logit_bias` parameter accepts a dictionary where keys are token IDs (as strings) and values are the bias amounts (as floats).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Getting Token IDs\n",
"\n",
"To use `logit_bias` effectively, you need to know the token IDs for the words you want to bias. Here's how to get token IDs:\n",
"\n",
"```python\n",
"# Get a tokenizer to look up token IDs\n",
"import tiktoken\n",
"\n",
"# For OpenAI models, use the encoding that matches the model\n",
"tokenizer = tiktoken.encoding_for_model(\"gpt-3.5-turbo\")  # or your model\n",
"\n",
"# Get token IDs for specific words\n",
"word = \"sunny\"\n",
"token_ids = tokenizer.encode(word)\n",
"print(f\"Token IDs for '{word}': {token_ids}\")\n",
"\n",
"# For models served by SGLang, load the model's own Hugging Face\n",
"# tokenizer instead, so the IDs match what the server actually uses\n",
"from transformers import AutoTokenizer\n",
"\n",
"hf_tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B-Instruct\")\n",
"token_ids = hf_tokenizer.encode(word, add_special_tokens=False)\n",
"print(f\"Token IDs for '{word}': {token_ids}\")\n",
"```\n",
"\n",
"**Important:** The `logit_bias` parameter uses token IDs as string keys, not the actual words, and the IDs must come from the tokenizer of the model you are serving; IDs taken from a different tokenizer will bias unrelated tokens.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
......@@ -225,6 +267,32 @@
"**Note:** DeepSeek-V3 models use the `thinking` parameter (not `enable_thinking`) to control reasoning output.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example with logit_bias parameter\n",
"# Note: You need to get the actual token IDs from your tokenizer\n",
"# For demonstration, we'll use some example token IDs\n",
"response = client.chat.completions.create(\n",
" model=\"qwen/qwen2.5-0.5b-instruct\",\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": \"Complete this sentence: The weather today is\"}\n",
" ],\n",
" temperature=0.7,\n",
" max_tokens=20,\n",
" logit_bias={\n",
" \"12345\": 50, # Increase likelihood of token ID 12345\n",
" \"67890\": -50, # Decrease likelihood of token ID 67890\n",
" \"11111\": 25, # Slightly increase likelihood of token ID 11111\n",
" },\n",
")\n",
"\n",
"print_highlight(f\"Response with logit bias: {response.choices[0].message.content}\")"
]
},
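{
"cell_type": "markdown",
"metadata": {},
"source": [
"The example above uses placeholder token IDs. As a sketch (assuming the `transformers` package is installed and its tokenizer matches the served model), the bias dictionary can instead be built from the model's own tokenizer so the IDs are meaningful:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build logit_bias from real token IDs (sketch; assumes `transformers`\n",
"# is installed and matches the model served above)\n",
"from transformers import AutoTokenizer\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B-Instruct\")\n",
"\n",
"# Look up IDs for the words to steer toward / away from.\n",
"# Note the leading space: mid-sentence tokens usually include it.\n",
"sunny_ids = tokenizer.encode(\" sunny\", add_special_tokens=False)\n",
"rainy_ids = tokenizer.encode(\" rainy\", add_special_tokens=False)\n",
"\n",
"# Keys must be token IDs as strings; values are the bias amounts\n",
"bias = {str(tid): 50 for tid in sunny_ids}\n",
"bias.update({str(tid): -100 for tid in rainy_ids})\n",
"\n",
"response = client.chat.completions.create(\n",
"    model=\"qwen/qwen2.5-0.5b-instruct\",\n",
"    messages=[\n",
"        {\"role\": \"user\", \"content\": \"Complete this sentence: The weather today is\"}\n",
"    ],\n",
"    max_tokens=20,\n",
"    logit_bias=bias,\n",
")\n",
"\n",
"print_highlight(response.choices[0].message.content)"
]
},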
{
"cell_type": "markdown",
"metadata": {},
......@@ -275,6 +343,15 @@
"Streaming mode is also supported."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Logit Bias Support\n",
"\n",
"The completions API also supports the `logit_bias` parameter with the same functionality as described in the chat completions section above.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
......@@ -291,6 +368,30 @@
" print(chunk.choices[0].delta.content, end=\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example with logit_bias parameter for completions API\n",
"# Note: You need to get the actual token IDs from your tokenizer\n",
"# For demonstration, we'll use some example token IDs\n",
"response = client.completions.create(\n",
" model=\"qwen/qwen2.5-0.5b-instruct\",\n",
" prompt=\"The best programming language for AI is\",\n",
" temperature=0.7,\n",
" max_tokens=20,\n",
" logit_bias={\n",
" \"12345\": 75, # Strongly favor token ID 12345\n",
" \"67890\": -100, # Completely avoid token ID 67890\n",
" \"11111\": -25, # Slightly discourage token ID 11111\n",
" },\n",
")\n",
"\n",
"print_highlight(f\"Response with logit bias: {response.choices[0].text}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
......