Unverified Commit 9c6e25d2 authored by ybyang, committed by GitHub

doc for logit_bias (#12188)

parent 2a3763c3
......@@ -164,6 +164,48 @@
"**Note:** Setting `\"enable_thinking\": False` (or omitting it) will result in `reasoning_content` being `None`. Qwen3-Thinking models always generate reasoning content and don't support the `enable_thinking` parameter.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Logit Bias Support\n",
"\n",
"SGLang supports the `logit_bias` parameter for both chat completions and completions APIs. This parameter allows you to modify the likelihood of specific tokens being generated by adding bias values to their logits. The bias values can range from -100 to 100, where:\n",
"\n",
"- **Positive values** (greater than 0, up to 100) increase the likelihood of the token being selected\n",
"- **Negative values** (down to -100) decrease the likelihood of the token being selected\n",
"- **-100** effectively bans the token from being generated\n",
"\n",
"The `logit_bias` parameter accepts a dictionary where keys are token IDs (as strings) and values are the bias amounts (as floats).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Getting Token IDs\n",
"\n",
"To use `logit_bias` effectively, you need to know the token IDs for the words you want to bias. Here's how to get token IDs:\n",
"\n",
"```python\n",
"# Get a tokenizer to look up token IDs\n",
"import tiktoken\n",
"\n",
"# For OpenAI models, use the encoding that matches the model\n",
"tokenizer = tiktoken.encoding_for_model(\"gpt-3.5-turbo\")  # or your model\n",
"\n",
"# Get token IDs for specific words\n",
"word = \"sunny\"\n",
"token_ids = tokenizer.encode(word)\n",
"print(f\"Token IDs for '{word}': {token_ids}\")\n",
"\n",
"# For models served by SGLang, load the model's own Hugging Face\n",
"# tokenizer instead, so the IDs match what the server actually uses\n",
"from transformers import AutoTokenizer\n",
"\n",
"hf_tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B-Instruct\")\n",
"token_ids = hf_tokenizer.encode(word, add_special_tokens=False)\n",
"print(f\"Token IDs for '{word}': {token_ids}\")\n",
"```\n",
"\n",
"**Important:** The `logit_bias` parameter uses token IDs as string keys, not the actual words, and the IDs must come from the tokenizer of the model you are serving; IDs taken from a different tokenizer will bias unrelated tokens.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
......@@ -225,6 +267,32 @@
"**Note:** DeepSeek-V3 models use the `thinking` parameter (not `enable_thinking`) to control reasoning output.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example with logit_bias parameter\n",
"# Note: You need to get the actual token IDs from your tokenizer\n",
"# For demonstration, we'll use some example token IDs\n",
"response = client.chat.completions.create(\n",
" model=\"qwen/qwen2.5-0.5b-instruct\",\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": \"Complete this sentence: The weather today is\"}\n",
" ],\n",
" temperature=0.7,\n",
" max_tokens=20,\n",
" logit_bias={\n",
" \"12345\": 50, # Increase likelihood of token ID 12345\n",
" \"67890\": -50, # Decrease likelihood of token ID 67890\n",
" \"11111\": 25, # Slightly increase likelihood of token ID 11111\n",
" },\n",
")\n",
"\n",
"print_highlight(f\"Response with logit bias: {response.choices[0].message.content}\")"
]
},
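{
"cell_type": "markdown",
"metadata": {},
"source": [
"The example above uses placeholder token IDs. As a sketch (assuming the `transformers` package is installed and its tokenizer matches the served model), the bias dictionary can instead be built from the model's own tokenizer so the IDs are meaningful:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build logit_bias from real token IDs (sketch; assumes `transformers`\n",
"# is installed and matches the model served above)\n",
"from transformers import AutoTokenizer\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"Qwen/Qwen2.5-0.5B-Instruct\")\n",
"\n",
"# Look up IDs for the words to steer toward / away from.\n",
"# Note the leading space: mid-sentence tokens usually include it.\n",
"sunny_ids = tokenizer.encode(\" sunny\", add_special_tokens=False)\n",
"rainy_ids = tokenizer.encode(\" rainy\", add_special_tokens=False)\n",
"\n",
"# Keys must be token IDs as strings; values are the bias amounts\n",
"bias = {str(tid): 50 for tid in sunny_ids}\n",
"bias.update({str(tid): -100 for tid in rainy_ids})\n",
"\n",
"response = client.chat.completions.create(\n",
"    model=\"qwen/qwen2.5-0.5b-instruct\",\n",
"    messages=[\n",
"        {\"role\": \"user\", \"content\": \"Complete this sentence: The weather today is\"}\n",
"    ],\n",
"    max_tokens=20,\n",
"    logit_bias=bias,\n",
")\n",
"\n",
"print_highlight(response.choices[0].message.content)"
]
},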
{
"cell_type": "markdown",
"metadata": {},
......@@ -275,6 +343,15 @@
"Streaming mode is also supported."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Logit Bias Support\n",
"\n",
"The completions API also supports the `logit_bias` parameter with the same functionality as described in the chat completions section above.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
......@@ -291,6 +368,30 @@
" print(chunk.choices[0].delta.content, end=\"\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example with logit_bias parameter for completions API\n",
"# Note: You need to get the actual token IDs from your tokenizer\n",
"# For demonstration, we'll use some example token IDs\n",
"response = client.completions.create(\n",
" model=\"qwen/qwen2.5-0.5b-instruct\",\n",
" prompt=\"The best programming language for AI is\",\n",
" temperature=0.7,\n",
" max_tokens=20,\n",
" logit_bias={\n",
" \"12345\": 75, # Strongly favor token ID 12345\n",
" \"67890\": -100, # Completely avoid token ID 67890\n",
" \"11111\": -25, # Slightly discourage token ID 11111\n",
" },\n",
")\n",
"\n",
"print_highlight(f\"Response with logit bias: {response.choices[0].text}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
......