{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quick Start: Sending Requests\n", "This notebook provides a quick-start guide for using SGLang after installation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch a server\n", "This code block is equivalent to executing\n", "\n", "```bash\n", "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n", "--port 30000 --host 0.0.0.0\n", "```\n", "\n", "in your terminal and waiting for the server to be ready. Once the server is running, you can send test requests using cURL or Python requests. The server implements the [OpenAI-compatible APIs](https://platform.openai.com/docs/api-reference/chat)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T02:46:13.611212Z", "iopub.status.busy": "2024-11-01T02:46:13.611093Z", "iopub.status.idle": "2024-11-01T02:46:42.810261Z", "shell.execute_reply": "2024-11-01T02:46:42.809147Z" } }, "outputs": [], "source": [ "from sglang.utils import (\n", "    execute_shell_command,\n", "    wait_for_server,\n", "    terminate_process,\n", "    print_highlight,\n", ")\n", "\n", "server_process = execute_shell_command(\n", "\"\"\"\n", "python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n", "--port 30000 --host 0.0.0.0\n", "\"\"\"\n", ")\n", "\n", "wait_for_server(\"http://localhost:30000\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using cURL\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import subprocess, json\n", "\n", "curl_command = \"\"\"\n", "curl -s http://localhost:30000/v1/chat/completions \\\n", "  -H \"Content-Type: application/json\" \\\n", "  -d '{\"model\": \"meta-llama/Meta-Llama-3.1-8B-Instruct\", \"messages\": [{\"role\": \"user\", \"content\": \"What is the capital of France?\"}]}'\n", "\"\"\"\n", "\n", "response = json.loads(subprocess.check_output(curl_command, shell=True))\n", 
"print_highlight(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Python Requests" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T02:46:42.813656Z", "iopub.status.busy": "2024-11-01T02:46:42.813354Z", "iopub.status.idle": "2024-11-01T02:46:51.436613Z", "shell.execute_reply": "2024-11-01T02:46:51.435965Z" } }, "outputs": [], "source": [ "import requests\n", "\n", "url = \"http://localhost:30000/v1/chat/completions\"\n", "\n", "data = {\n", " \"model\": \"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n", " \"messages\": [\n", " {\"role\": \"user\", \"content\": \"What is the capital of France?\"}\n", " ]\n", "}\n", "\n", "response = requests.post(url, json=data)\n", "print_highlight(response.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using OpenAI Python Client" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T02:46:51.439372Z", "iopub.status.busy": "2024-11-01T02:46:51.439178Z", "iopub.status.idle": "2024-11-01T02:46:52.895776Z", "shell.execute_reply": "2024-11-01T02:46:52.895318Z" } }, "outputs": [], "source": [ "import openai\n", "\n", "client = openai.Client(base_url=\"http://127.0.0.1:30000/v1\", api_key=\"None\")\n", "\n", "response = client.chat.completions.create(\n", " model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n", " messages=[\n", " {\"role\": \"user\", \"content\": \"List 3 countries and their capitals.\"},\n", " ],\n", " temperature=0,\n", " max_tokens=64,\n", ")\n", "print_highlight(response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Streaming" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import openai\n", "\n", "client = openai.Client(base_url=\"http://127.0.0.1:30000/v1\", api_key=\"None\")\n", "\n", "# Use stream=True for streaming responses\n", "response = 
client.chat.completions.create(\n", "    model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n", "    messages=[\n", "        {\"role\": \"user\", \"content\": \"List 3 countries and their capitals.\"},\n", "    ],\n", "    temperature=0,\n", "    max_tokens=64,\n", "    stream=True,\n", ")\n", "\n", "# Handle the streaming output: print each token delta as it arrives\n", "for chunk in response:\n", "    if chunk.choices[0].delta.content:\n", "        print(chunk.choices[0].delta.content, end='', flush=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Native Generation APIs\n", "\n", "You can also use the native `/generate` endpoint with requests, which provides more flexibility. An API reference is available at [Sampling Parameters](https://sgl-project.github.io/references/sampling_params.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "response = requests.post(\n", "    \"http://localhost:30000/generate\",\n", "    json={\n", "        \"text\": \"The capital of France is\",\n", "        \"sampling_params\": {\n", "            \"temperature\": 0,\n", "            \"max_new_tokens\": 32,\n", "        },\n", "    },\n", ")\n", "\n", "print_highlight(response.json())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Streaming" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests, json\n", "\n", "response = requests.post(\n", "    \"http://localhost:30000/generate\",\n", "    json={\n", "        \"text\": \"The capital of France is\",\n", "        \"sampling_params\": {\n", "            \"temperature\": 0,\n", "            \"max_new_tokens\": 32,\n", "        },\n", "        \"stream\": True,\n", "    },\n", "    stream=True,\n", ")\n", "\n", "# The server sends server-sent events; each line looks like \"data: {...}\"\n", "prev = 0\n", "for chunk in response.iter_lines(decode_unicode=False):\n", "    chunk = chunk.decode(\"utf-8\")\n", "    if chunk and chunk.startswith(\"data:\"):\n", "        if chunk == \"data: [DONE]\":\n", "            break\n", "        # Strip the \"data:\" prefix and parse the JSON payload\n", "        data = json.loads(chunk[5:].strip(\"\\n\"))\n", "        output = data[\"text\"]\n", "        # Each event carries the full text so far; print only the new part\n", "        print(output[prev:], end=\"\", 
flush=True)\n", "        prev = len(output)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T02:46:52.898411Z", "iopub.status.busy": "2024-11-01T02:46:52.898149Z", "iopub.status.idle": "2024-11-01T02:46:54.398382Z", "shell.execute_reply": "2024-11-01T02:46:54.397564Z" } }, "outputs": [], "source": [ "terminate_process(server_process)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 2 }