{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Embedding Model\n", "\n", "SGLang supports embedding models in the same way as completion models. Here are some example models:\n", "\n", "- [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct)\n", "- [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch A Server\n", "\n", "The following code is equivalent to running this in the shell:\n", "```bash\n", "python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct \\\n", " --port 30010 --host 0.0.0.0 --is-embedding --log-level error\n", "```\n", "\n", "Remember to add `--is-embedding` to the command." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Embedding server is ready. Proceeding with the next steps.\n" ] } ], "source": [ "from sglang.utils import execute_shell_command, wait_for_server, terminate_process\n", "\n", "embedding_process = execute_shell_command(\n", " \"\"\"\n", "python -m sglang.launch_server --model-path Alibaba-NLP/gte-Qwen2-7B-instruct \\\n", " --port 30010 --host 0.0.0.0 --is-embedding --log-level error\n", "\"\"\"\n", ")\n", "\n", "wait_for_server(\"http://localhost:30010\")\n", "\n", "print(\"Embedding server is ready. Proceeding with the next steps.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use Curl" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Text embedding (first 10): [0.0083160400390625, 0.0006804466247558594, -0.00809478759765625, -0.0006995201110839844, 0.0143890380859375, -0.0090179443359375, 0.01238250732421875, 0.00209808349609375, 0.0062103271484375, -0.003047943115234375]\n" ] } ], "source": [ "import subprocess, json\n", "\n", "text = \"Once upon a time\"\n", "\n", "curl_text = f\"\"\"curl -s http://localhost:30010/v1/embeddings \\\n", " -H \"Content-Type: application/json\" \\\n", " -H \"Authorization: Bearer None\" \\\n", " -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": \"{text}\"}}'\"\"\"\n", "\n", "text_embedding = json.loads(subprocess.check_output(curl_text, shell=True))[\"data\"][0][\n", " \"embedding\"\n", "]\n", "\n", "print(f\"Text embedding (first 10): {text_embedding[:10]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using OpenAI Compatible API" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Text embedding (first 10): [0.00829315185546875, 0.0007004737854003906, -0.00809478759765625, -0.0006799697875976562, 0.01438140869140625, -0.00897979736328125, 0.0123748779296875, 0.0020923614501953125, 0.006195068359375, -0.0030498504638671875]\n" ] } ], "source": [ "import openai\n", "\n", "client = openai.Client(base_url=\"http://127.0.0.1:30010/v1\", api_key=\"None\")\n", "\n", "# Text embedding example\n", "response = client.embeddings.create(\n", " model=\"Alibaba-NLP/gte-Qwen2-7B-instruct\",\n", " input=text,\n", ")\n", "\n", "embedding = response.data[0].embedding[:10]\n", "print(f\"Text embedding (first 10): {embedding}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Input IDs\n", "\n", "SGLang also supports `input_ids` as input to get the embedding." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Input IDs embedding (first 10): [0.00829315185546875, 0.0007004737854003906, -0.00809478759765625, -0.0006799697875976562, 0.01438140869140625, -0.00897979736328125, 0.0123748779296875, 0.0020923614501953125, 0.006195068359375, -0.0030498504638671875]\n" ] } ], "source": [ "import json\n", "import os\n", "from transformers import AutoTokenizer\n", "\n", "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", "\n", "tokenizer = AutoTokenizer.from_pretrained(\"Alibaba-NLP/gte-Qwen2-7B-instruct\")\n", "input_ids = tokenizer.encode(text)\n", "\n", "curl_ids = f\"\"\"curl -s http://localhost:30010/v1/embeddings \\\n", " -H \"Content-Type: application/json\" \\\n", " -H \"Authorization: Bearer None\" \\\n", " -d '{{\"model\": \"Alibaba-NLP/gte-Qwen2-7B-instruct\", \"input\": {json.dumps(input_ids)}}}'\"\"\"\n", "\n", "input_ids_embedding = json.loads(subprocess.check_output(curl_ids, shell=True))[\"data\"][\n", " 0\n", "][\"embedding\"]\n", "\n", "print(f\"Input IDs embedding (first 10): {input_ids_embedding[:10]}\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "terminate_process(embedding_process)" ] } ], "metadata": { "kernelspec": { "display_name": "AlphaMeemory", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 2 }