{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Function Calling\n", "\n", "This notebook is a quick-start guide to function calling (tool use) with the SGLang chat completions API.\n", "\n", "## Supported Models\n", "\n", "Tool calling is currently supported for the following models:\n", " - Llama 3.2 models\n", " - Llama 3.1 models\n", " - Qwen 2.5 models\n", " - InternLM models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Usage\n", "\n", "### Launch a server\n", "\n", "The code block below is equivalent to running\n", "\n", "`python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \\\n", "--port 30000 --host 0.0.0.0`\n", "\n", "in your terminal and waiting for the server to be ready. Once the server is running, you can send test requests with curl or the `requests` library. The server implements OpenAI-compatible APIs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sglang.utils import (\n", "    execute_shell_command,\n", "    wait_for_server,\n", "    terminate_process,\n", "    print_highlight,\n", ")\n", "\n", "\n", "# Start the server in a background process.\n", "server_process = execute_shell_command(\n", "    \"\"\"\n", "    python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000 --host 0.0.0.0\n", "\"\"\"\n", ")\n", "\n", "# Block until the server is ready to accept requests.\n", "wait_for_server(\"http://localhost:30000\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Single Round Invocation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from openai import OpenAI\n", "\n", "# Tool definition in the OpenAI function-calling schema.\n", "tools = [\n", "    {\n", "        \"type\": \"function\",\n", "        \"function\": {\n", "            \"name\": \"get_current_weather\",\n", "            \"description\": \"Get the current weather in a given location\",\n", "            \"parameters\": {\n", "                \"type\": \"object\",\n", "                \"properties\": {\n", "                    \"location\": {\n", "                        \"type\": \"string\",\n", "                        \"description\": \"The city and state, e.g. 
San Francisco, CA\",\n", "                    },\n", "                    \"unit\": {\"type\": \"string\", \"enum\": [\"celsius\", \"fahrenheit\"]},\n", "                },\n", "                \"required\": [\"location\"],\n", "            },\n", "        },\n", "    }\n", "]\n", "messages = [{\"role\": \"user\", \"content\": \"What's the weather like in Boston today?\"}]\n", "\n", "# Point the OpenAI client at the local SGLang server.\n", "client = OpenAI(api_key=\"YOUR_API_KEY\", base_url=\"http://localhost:30000/v1\")\n", "# Use the ID of the model that the server reports.\n", "model_name = client.models.list().data[0].id\n", "response = client.chat.completions.create(\n", "    model=model_name,\n", "    messages=messages,\n", "    temperature=0.8,\n", "    top_p=0.8,\n", "    stream=False,\n", "    tools=tools,\n", ")\n", "\n", "print(response)\n", "\n", "\"\"\"\n", "Example of the response structure. This sample was captured from a different request\n", "(an `add` tool served by Llama-3.2-1B-Instruct), so your output will differ:\n", "\n", "ChatCompletion(id='d6f620e1767e490d85b5ce45c15151cf', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, \n", "role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments='{\"a\": \"3\", \"b\": \"5\"}', name='add'), type='function')]), \n", "matched_stop=128008)], created=1735411703, model='meta-llama/Llama-3.2-1B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, \n", "usage=CompletionUsage(completion_tokens=23, prompt_tokens=198, total_tokens=221, completion_tokens_details=None, prompt_tokens_details=None))\n", "\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "terminate_process(server_process)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to support a new model?\n", "\n", "To add support for another model:\n", " 1. Update `TOOLS_TAG_LIST` in `sglang/srt/utils.py` with the tool tag used by the model.\n", " 2. Extend the `parse_tool_response` function in `sglang/srt/utils.py` so that it converts the model's output into tool calls.\n" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }