"Apart from the OpenAI compatible API, the SGLang Runtime also provides its native server API. We introduce these following API:\n",
"Apart from the OpenAI compatible APIs, the SGLang Runtime also provides its native server APIs. We introduce these following APIs:\n",
"\n",
"- `/generate`\n",
"- `/update_weights`\n",
"- `/get_server_args`\n",
"- `/get_model_info`\n",
"- `/health`\n",
"- `/health_generate`\n",
"- `/flush_cache`\n",
"- `/get_memory_pool_size`\n",
"- `/update_weights`\n",
"- `/encode`\n",
"\n",
"We mainly use `requests` to test these APIs in the following examples. You can also use `curl`."
]
...
...
@@ -68,7 +69,7 @@
"import requests\n",
"\n",
"url = \"http://localhost:30010/generate\"\n",
"data = {\"text\": \"List 3 countries and their capitals.\"}\n",
"data = {\"text\": \"What is the capital of France?\"}\n",
"\n",
"response = requests.post(url, json=data)\n",
"print_highlight(response.text)"
...
...
@@ -78,7 +79,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get Server Args\n",
"## Get Server Args\n",
"\n",
"Used to get the serving args when the server is launched."
]
...
...
@@ -252,13 +253,57 @@
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Encode\n",
"\n",
"Used to encode text into embeddings. Note that this API is only available for [embedding models](./openai_embedding_api.ipynb) and will raise an error for generation models.\n",
"Therefore, we launch a new server to server an embedding model.\n"