{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# OpenAI APIs - Embedding\n", "\n", "SGLang provides OpenAI-compatible APIs to enable a smooth transition from OpenAI services to self-hosted local models.\n", "A complete reference for the API is available in the [OpenAI Embeddings guide](https://platform.openai.com/docs/guides/embeddings).\n", "\n", "This tutorial covers the embedding APIs for embedding models such as:\n", "\n", "- [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct)\n", "- [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch a Server\n", "\n", "The following code is equivalent to running this in the shell:\n", "\n", "```bash\n", "python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct \\\n", " --port 30010 --host 0.0.0.0 --is-embedding\n", "```\n", "\n", "Remember to add `--is-embedding` to the command; it tells SGLang to serve the model for embedding rather than text generation."
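, "\n", "\n", "Once the server is up, you can optionally verify that it is ready before sending requests. This sketch assumes SGLang's `/health` endpoint, which is expected to return an HTTP 200 response when the server is healthy:\n", "\n", "```bash\n", "curl -i http://localhost:30010/health\n", "```"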
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T02:47:32.337369Z", "iopub.status.busy": "2024-11-01T02:47:32.337032Z", "iopub.status.idle": "2024-11-01T02:47:59.540926Z", "shell.execute_reply": "2024-11-01T02:47:59.539861Z" } }, "outputs": [], "source": [ "from sglang.utils import (\n", " execute_shell_command,\n", " wait_for_server,\n", " terminate_process,\n", " print_highlight,\n", ")\n", "\n", "embedding_process = execute_shell_command(\n", " \"\"\"\n", "python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct \\\n", " --port 30010 --host 0.0.0.0 --is-embedding\n", "\"\"\"\n", ")\n", "\n", "wait_for_server(\"http://localhost:30010\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using cURL" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T02:47:59.543958Z", "iopub.status.busy": "2024-11-01T02:47:59.543670Z", "iopub.status.idle": "2024-11-01T02:47:59.591699Z", "shell.execute_reply": "2024-11-01T02:47:59.590809Z" } }, "outputs": [], "source": [ "import json\n", "import subprocess\n", "\n", "text = \"Once upon a time\"\n", "\n", "# Send the request body as JSON; the Content-Type header makes this explicit.\n", "curl_text = f\"\"\"curl -s http://localhost:30010/v1/embeddings \\\n", " -H 'Content-Type: application/json' \\\n", " -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": \"{text}\"}}'\"\"\"\n", "\n", "text_embedding = json.loads(subprocess.check_output(curl_text, shell=True))[\"data\"][0][\n", " \"embedding\"\n", "]\n", "\n", "print_highlight(f\"Text embedding (first 10): {text_embedding[:10]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Python Requests" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "text = \"Once upon a time\"\n", "\n", "response = requests.post(\n", " \"http://localhost:30010/v1/embeddings\",\n", " json={\n", " \"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\",\n", " \"input\": 
text\n", " }\n", ")\n", "\n", "text_embedding = response.json()[\"data\"][0][\"embedding\"]\n", "\n", "print_highlight(f\"Text embedding (first 10): {text_embedding[:10]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using OpenAI Python Client" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T02:47:59.594229Z", "iopub.status.busy": "2024-11-01T02:47:59.594049Z", "iopub.status.idle": "2024-11-01T02:48:00.006233Z", "shell.execute_reply": "2024-11-01T02:48:00.005255Z" } }, "outputs": [], "source": [ "import openai\n", "\n", "client = openai.Client(base_url=\"http://127.0.0.1:30010/v1\", api_key=\"None\")\n", "\n", "# Text embedding example\n", "response = client.embeddings.create(\n", " model=\"Alibaba-NLP/gte-Qwen2-7B-instruct\",\n", " input=text,\n", ")\n", "\n", "embedding = response.data[0].embedding[:10]\n", "print_highlight(f\"Text embedding (first 10): {embedding}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Input IDs\n", "\n", "SGLang also supports `input_ids` as input to get the embedding." 
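, "\n", "\n", "For example, the same request can be made programmatically with `requests` (a sketch; it assumes the `input` field of `/v1/embeddings` accepts a list of token IDs, as the cURL example below demonstrates):\n", "\n", "```python\n", "import requests\n", "from transformers import AutoTokenizer\n", "\n", "# Tokenize the text locally, then send the token IDs instead of the raw string.\n", "tokenizer = AutoTokenizer.from_pretrained(\"Alibaba-NLP/gte-Qwen2-7B-instruct\")\n", "input_ids = tokenizer.encode(\"Once upon a time\")\n", "\n", "response = requests.post(\n", "    \"http://localhost:30010/v1/embeddings\",\n", "    json={\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": input_ids},\n", ")\n", "embedding = response.json()[\"data\"][0][\"embedding\"]\n", "```"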
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T02:48:00.008858Z", "iopub.status.busy": "2024-11-01T02:48:00.008689Z", "iopub.status.idle": "2024-11-01T02:48:01.872542Z", "shell.execute_reply": "2024-11-01T02:48:01.871573Z" } }, "outputs": [], "source": [ "import json\n", "import os\n", "from transformers import AutoTokenizer\n", "\n", "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(\"Alibaba-NLP/gte-Qwen2-7B-instruct\")\n", "input_ids = tokenizer.encode(text)\n", "\n", "curl_ids = f\"\"\"curl -s http://localhost:30010/v1/embeddings \\\n", " -H 'Content-Type: application/json' \\\n", " -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": {json.dumps(input_ids)}}}'\"\"\"\n", "\n", "input_ids_embedding = json.loads(subprocess.check_output(curl_ids, shell=True))[\"data\"][\n", " 0\n", "][\"embedding\"]\n", "\n", "print_highlight(f\"Input IDs embedding (first 10): {input_ids_embedding[:10]}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2024-11-01T02:48:01.875204Z", "iopub.status.busy": "2024-11-01T02:48:01.874915Z", "iopub.status.idle": "2024-11-01T02:48:02.193734Z", "shell.execute_reply": "2024-11-01T02:48:02.192158Z" } }, "outputs": [], "source": [ "terminate_process(embedding_process)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 2 }