{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Vision Language Model\n",
    "\n",
    "SGLang supports vision language models in the same way as completion models. Here are some example models:\n",
    "\n",
    "- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)\n",
    "- [lmms-lab/llava-onevision-qwen2-7b-ov](https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Launch A Server\n",
    "\n",
    "The following code is equivalent to running this in the shell:\n",
    "\n",
    "```bash\n",
    "python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-11B-Vision-Instruct \\\n",
    " --port=30010 --chat-template=llama_3_vision\n",
    "```\n",
    "\n",
    "Remember to add `--chat-template=llama_3_vision` to specify the vision chat template, otherwise the server only supports text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/chenyang/miniconda3/envs/AlphaMeemory/lib/python3.11/site-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n",
      "  warnings.warn(\n",
      "[2024-10-31 23:10:49] server_args=ServerArgs(model_path='meta-llama/Llama-3.2-11B-Vision-Instruct', tokenizer_path='meta-llama/Llama-3.2-11B-Vision-Instruct', tokenizer_mode='auto', skip_tokenizer_init=False, load_format='auto', trust_remote_code=False, dtype='auto', kv_cache_dtype='auto', quantization=None, context_length=None, device='cuda', served_model_name='meta-llama/Llama-3.2-11B-Vision-Instruct', chat_template='llama_3_vision', is_embedding=False, host='127.0.0.1', port=30010, mem_fraction_static=0.88, max_running_requests=None, max_total_tokens=None, chunked_prefill_size=8192, max_prefill_tokens=16384, schedule_policy='lpm', schedule_conservativeness=1.0, tp_size=1, stream_interval=1, random_seed=178735948, constrained_json_whitespace_pattern=None, decode_log_interval=40, log_level='info', log_level_http=None, log_requests=False, show_time_cost=False, api_key=None, file_storage_pth='SGLang_storage', enable_cache_report=False, watchdog_timeout=600, dp_size=1, load_balance_method='round_robin', dist_init_addr=None, nnodes=1, node_rank=0, json_model_override_args='{}', enable_double_sparsity=False, ds_channel_config_path=None, ds_heavy_channel_num=32, ds_heavy_token_num=256, ds_heavy_channel_type='qk', ds_sparse_decode_threshold=4096, lora_paths=None, max_loras_per_batch=8, attention_backend='flashinfer', sampling_backend='flashinfer', grammar_backend='outlines', disable_flashinfer=False, disable_flashinfer_sampling=False, disable_radix_cache=False, disable_regex_jump_forward=False, disable_cuda_graph=False, disable_cuda_graph_padding=False, disable_disk_cache=False, disable_custom_all_reduce=False, disable_mla=False, disable_penalizer=False, disable_nan_detection=False, enable_overlap_schedule=False, enable_mixed_chunk=False, enable_torch_compile=False, torch_compile_max_bs=32, cuda_graph_max_bs=160, torchao_config='', enable_p2p_check=False, triton_attention_reduce_in_fp32=False, num_continuous_decode_steps=1)\n",
      "/home/chenyang/miniconda3/envs/AlphaMeemory/lib/python3.11/site-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n",
      "  warnings.warn(\n",
      "/home/chenyang/miniconda3/envs/AlphaMeemory/lib/python3.11/site-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.\n",
      "  warnings.warn(\n",
      "[2024-10-31 23:10:51] Use chat template for the OpenAI-compatible API server: llama_3_vision\n",
      "[2024-10-31 23:10:56 TP0] Automatically turn off --chunked-prefill-size and adjust --mem-fraction-static for multimodal models.\n",
      "[2024-10-31 23:10:56 TP0] Init torch distributed begin.\n",
      "[2024-10-31 23:10:56 TP0] Load weight begin. avail mem=47.27 GB\n",
      "[2024-10-31 23:10:57 TP0] lm_eval is not installed, GPTQ may not be usable\n",
      "INFO 10-31 23:10:57 weight_utils.py:243] Using model weights format ['*.safetensors']\n",
      "Loading safetensors checkpoint shards:   0% Completed | 0/5 [00:00<?, ?it/s]\n",
      "Loading safetensors checkpoint shards:  20% Completed | 1/5 [00:00<00:01,  2.31it/s]\n",
      "Loading safetensors checkpoint shards:  40% Completed | 2/5 [00:00<00:01,  2.24it/s]\n",
      "Loading safetensors checkpoint shards:  60% Completed | 3/5 [00:01<00:00,  2.23it/s]\n",
      "Loading safetensors checkpoint shards:  80% Completed | 4/5 [00:01<00:00,  2.21it/s]\n",
      "Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:01<00:00,  2.85it/s]\n",
      "Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:01<00:00,  2.54it/s]\n",
      "\n",
      "[2024-10-31 23:10:59 TP0] Load weight end. type=MllamaForConditionalGeneration, dtype=torch.bfloat16, avail mem=27.15 GB\n",
      "[2024-10-31 23:10:59 TP0] Memory pool end. avail mem=6.62 GB\n",
      "[2024-10-31 23:10:59 TP0] Capture cuda graph begin. This can take up to several minutes.\n",
      "[2024-10-31 23:11:09 TP0] max_total_num_tokens=127149, max_prefill_tokens=16384, max_running_requests=2049, context_len=131072\n",
      "[2024-10-31 23:11:10] INFO:     Started server process [1796287]\n",
      "[2024-10-31 23:11:10] INFO:     Waiting for application startup.\n",
      "[2024-10-31 23:11:10] INFO:     Application startup complete.\n",
      "[2024-10-31 23:11:10] INFO:     Uvicorn running on http://127.0.0.1:30010 (Press CTRL+C to quit)\n",
      "[2024-10-31 23:11:10] INFO:     127.0.0.1:60770 - \"GET /v1/models HTTP/1.1\" 200 OK\n",
      "[2024-10-31 23:11:11] INFO:     127.0.0.1:60780 - \"GET /get_model_info HTTP/1.1\" 200 OK\n",
      "[2024-10-31 23:11:11 TP0] Prefill batch. #new-seq: 1, #new-token: 7, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0\n",
      "[2024-10-31 23:11:11] INFO:     127.0.0.1:60796 - \"POST /generate HTTP/1.1\" 200 OK\n",
      "[2024-10-31 23:11:11] The server is fired up and ready to roll!\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<strong style='color: #00008B;'><br><br>                    NOTE: Typically, the server runs in a separate terminal.<br>                    In this notebook, we run the server and notebook code together, so their outputs are combined.<br>                    To improve clarity, the server logs are displayed in the original black color, while the notebook outputs are highlighted in blue.<br>                    </strong>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sglang.utils import (\n",
    "    execute_shell_command,\n",
    "    wait_for_server,\n",
    "    terminate_process,\n",
    "    print_highlight,\n",
    ")\n",
    "\n",
    "embedding_process = execute_shell_command(\n",
    "    \"\"\"\n",
    "    python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-11B-Vision-Instruct \\\n",
    "        --port=30010 --chat-template=llama_3_vision\n",
    "\n",
    "\"\"\"\n",
    ")\n",
    "\n",
    "wait_for_server(\"http://localhost:30010\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use Curl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100   559    0     0  100   559      0    253  0:00:02  0:00:02 --:--:--   253"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/chenyang/miniconda3/envs/AlphaMeemory/lib/python3.11/site-packages/torch/storage.py:414: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.\n",
      "  return torch.load(io.BytesIO(b))\n",
      "[2024-10-31 23:11:18 TP0] Prefill batch. #new-seq: 1, #new-token: 6463, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100   559    0     0  100   559      0    174  0:00:03  0:00:03 --:--:--   174"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2024-10-31 23:11:20 TP0] Decode batch. #running-req: 1, #token: 6496, token usage: 0.05, gen throughput (token/s): 3.90, #queue-req: 0\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100   559    0     0  100   559      0    107  0:00:05  0:00:05 --:--:--   107"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2024-10-31 23:11:21 TP0] Decode batch. #running-req: 1, #token: 6536, token usage: 0.05, gen throughput (token/s): 33.67, #queue-req: 0\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100   559    0     0  100   559      0     90  0:00:06  0:00:06 --:--:--     0"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2024-10-31 23:11:22 TP0] Decode batch. #running-req: 1, #token: 6576, token usage: 0.05, gen throughput (token/s): 33.60, #queue-req: 0\n",
      "[2024-10-31 23:11:22] INFO:     127.0.0.1:54224 - \"POST /v1/chat/completions HTTP/1.1\" 200 OK\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100  1544  100   985  100   559    142     80  0:00:06  0:00:06 --:--:--   265\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<strong style='color: #00008B;'>{'id': 'f618453e8f3e4408b893a958f2868a44', 'object': 'chat.completion', 'created': 1730441482, 'model': 'meta-llama/Llama-3.2-11B-Vision-Instruct', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'The image depicts a serene and peaceful landscape featuring a wooden boardwalk that meanders through a lush grassy field, set against a backdrop of trees and a bright blue sky with wispy clouds. The boardwalk is made of weathered wooden planks and is surrounded by tall grass on either side, creating a sense of depth and texture. The surrounding trees add a touch of natural beauty to the scene, while the blue sky with wispy clouds provides a sense of calmness and serenity. The overall atmosphere of the image is one of tranquility and relaxation, inviting the viewer to step into the peaceful world depicted.'}, 'logprobs': None, 'finish_reason': 'stop', 'matched_stop': 128009}], 'usage': {'prompt_tokens': 6463, 'total_tokens': 6588, 'completion_tokens': 125, 'prompt_tokens_details': None}}</strong>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import subprocess, json, os\n",
    "\n",
    "curl_command = \"\"\"\n",
    "curl http://localhost:30010/v1/chat/completions \\\n",
    "  -H \"Content-Type: application/json\" \\\n",
    "  -H \"Authorization: Bearer None\" \\\n",
    "  -d '{\n",
    "    \"model\": \"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n",
    "    \"messages\": [\n",
    "      {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": [\n",
    "          {\n",
    "            \"type\": \"text\",\n",
    "            \"text\": \"What’s in this image?\"\n",
    "          },\n",
    "          {\n",
    "            \"type\": \"image_url\",\n",
    "            \"image_url\": {\n",
    "              \"url\": \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\"\n",
    "            }\n",
    "          }\n",
    "        ]\n",
    "      }\n",
    "    ],\n",
    "    \"max_tokens\": 300\n",
    "  }'\n",
    "\"\"\"\n",
    "\n",
    "response = json.loads(subprocess.check_output(curl_command, shell=True))\n",
    "print_highlight(response)"
   ]
  },
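  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same request can be sent from Python without shelling out to curl. The following is a minimal sketch using `requests`; it assumes the server launched above is still running on port 30010."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "# Same payload as the curl command above, sent with requests.post.\n",
    "payload = {\n",
    "    \"model\": \"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n",
    "    \"messages\": [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\"type\": \"text\", \"text\": \"What's in this image?\"},\n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\n",
    "                        \"url\": \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\"\n",
    "                    },\n",
    "                },\n",
    "            ],\n",
    "        }\n",
    "    ],\n",
    "    \"max_tokens\": 300,\n",
    "}\n",
    "\n",
    "response = requests.post(\"http://localhost:30010/v1/chat/completions\", json=payload)\n",
    "print_highlight(response.json())"
   ]
  },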
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using OpenAI Compatible API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2024-10-31 23:11:23 TP0] Prefill batch. #new-seq: 1, #new-token: 6463, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0\n",
      "[2024-10-31 23:11:24 TP0] Decode batch. #running-req: 1, #token: 6492, token usage: 0.05, gen throughput (token/s): 20.07, #queue-req: 0\n",
      "[2024-10-31 23:11:25 TP0] Decode batch. #running-req: 1, #token: 6532, token usage: 0.05, gen throughput (token/s): 33.68, #queue-req: 0\n",
      "[2024-10-31 23:11:26 TP0] Decode batch. #running-req: 1, #token: 6572, token usage: 0.05, gen throughput (token/s): 33.62, #queue-req: 0\n",
      "[2024-10-31 23:11:27 TP0] Decode batch. #running-req: 1, #token: 6612, token usage: 0.05, gen throughput (token/s): 33.62, #queue-req: 0\n",
      "[2024-10-31 23:11:28] INFO:     127.0.0.1:54228 - \"POST /v1/chat/completions HTTP/1.1\" 200 OK\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<strong style='color: #00008B;'>The image depicts a serene and peaceful scene of a wooden boardwalk leading through a lush field of tall grass, set against a backdrop of trees and a blue sky with clouds. The boardwalk is made of light-colored wood and has a simple design, with the wooden planks running parallel to each other. It stretches out into the distance, disappearing into the horizon.<br><br>The field is filled with tall, vibrant green grass that sways gently in the breeze, creating a sense of movement and life. The trees in the background are also lush and green, adding depth and texture to the scene. The blue sky above is dotted with white clouds, which are scattered across the horizon. The overall atmosphere of the image is one of tranquility and serenity, inviting the viewer to step into the peaceful world depicted.</strong>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import base64, requests\n",
    "from openai import OpenAI\n",
    "\n",
    "client = OpenAI(base_url=\"http://localhost:30010/v1\", api_key=\"None\")\n",
    "\n",
    "\n",
    "def encode_image(image_path):\n",
    "    with open(image_path, \"rb\") as image_file:\n",
    "        return base64.b64encode(image_file.read()).decode(\"utf-8\")\n",
    "\n",
    "\n",
    "def download_image(image_url, image_path):\n",
    "    response = requests.get(image_url)\n",
    "    response.raise_for_status()\n",
    "    with open(image_path, \"wb\") as f:\n",
    "        f.write(response.content)\n",
    "\n",
    "\n",
    "image_url = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\"\n",
    "image_path = \"boardwalk.jpeg\"\n",
    "download_image(image_url, image_path)\n",
    "\n",
    "base64_image = encode_image(image_path)\n",
    "\n",
    "response = client.chat.completions.create(\n",
    "    model=\"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n",
    "    messages=[\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": \"What is in this image?\",\n",
    "                },\n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\"url\": f\"data:image/jpeg;base64,{base64_image}\"},\n",
    "                },\n",
    "            ],\n",
    "        }\n",
    "    ],\n",
    "    max_tokens=300,\n",
    ")\n",
    "\n",
    "print_highlight(response.choices[0].message.content)"
   ]
  },
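  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The OpenAI-compatible endpoint also supports streaming. Below is a minimal sketch that reuses the `client` and `base64_image` from the previous cell and prints tokens as they arrive."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Stream the response token by token (assumes `client` and `base64_image`\n",
    "# from the previous cell are still in scope).\n",
    "stream = client.chat.completions.create(\n",
    "    model=\"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n",
    "    messages=[\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\"type\": \"text\", \"text\": \"What is in this image?\"},\n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\"url\": f\"data:image/jpeg;base64,{base64_image}\"},\n",
    "                },\n",
    "            ],\n",
    "        }\n",
    "    ],\n",
    "    max_tokens=300,\n",
    "    stream=True,\n",
    ")\n",
    "\n",
    "for chunk in stream:\n",
    "    # Some chunks (e.g., the final one) may carry no content delta.\n",
    "    if chunk.choices and chunk.choices[0].delta.content:\n",
    "        print(chunk.choices[0].delta.content, end=\"\", flush=True)"
   ]
  },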
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multiple Images Input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2024-10-31 23:11:28 TP0] Prefill batch. #new-seq: 1, #new-token: 12871, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0\n",
      "[2024-10-31 23:11:30 TP0] Decode batch. #running-req: 1, #token: 12899, token usage: 0.10, gen throughput (token/s): 15.36, #queue-req: 0\n",
      "[2024-10-31 23:11:31 TP0] Decode batch. #running-req: 1, #token: 12939, token usage: 0.10, gen throughput (token/s): 33.33, #queue-req: 0\n",
      "[2024-10-31 23:11:32 TP0] Decode batch. #running-req: 1, #token: 12979, token usage: 0.10, gen throughput (token/s): 33.28, #queue-req: 0\n",
      "[2024-10-31 23:11:33] INFO:     127.0.0.1:50966 - \"POST /v1/chat/completions HTTP/1.1\" 200 OK\n",
      "Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The two images depict a serene and idyllic scene, with the first image showing a well-trodden wooden path through a field, while the second image shows an overgrown, less-traveled path through the same field. The first image features a clear and well-maintained wooden path, whereas the second image shows a more neglected and overgrown path that is not as well-defined. The first image has a more vibrant and inviting atmosphere, while the second image appears more peaceful and serene. Overall, both images evoke a sense of tranquility and connection to nature.', refusal=None, role='assistant', function_call=None, tool_calls=None), matched_stop=128009)\n"
     ]
    }
   ],
   "source": [
    "from openai import OpenAI\n",
    "\n",
    "client = OpenAI(base_url=\"http://localhost:30010/v1\", api_key=\"None\")\n",
    "\n",
    "response = client.chat.completions.create(\n",
    "    model=\"meta-llama/Llama-3.2-11B-Vision-Instruct\",\n",
    "    messages=[\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": \"Are there any differences between these two images?\",\n",
    "                },\n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\n",
    "                        \"url\": \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\",\n",
    "                    },\n",
    "                },\n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\n",
    "                        \"url\": \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\",\n",
    "                    },\n",
    "                },\n",
    "            ],\n",
    "        }\n",
    "    ],\n",
    "    max_tokens=300,\n",
    ")\n",
    "print(response.choices[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "terminate_process(embedding_process)\n",
    "os.remove(image_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chat Template\n",
    "\n",
    "As mentioned before, if you do not specify a vision model's `chat-template`, the server uses Hugging Face's default template, which only supports text.\n",
    "\n",
    "You can add your custom chat template by referring to the [custom chat template](../references/custom_chat_template.md).\n",
    "\n",
    "We list popular vision models with their chat templates:\n",
    "\n",
    "- [meta-llama/Llama-3.2-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) uses `llama_3_vision`.\n",
    "- [LLaVA-NeXT](https://huggingface.co/collections/lmms-lab/llava-next-6623288e2d61edba3ddbf5ff) uses `chatml-llava`.\n",
    "- [llama3-llava-next](https://huggingface.co/lmms-lab/llama3-llava-next-8b) uses `llava_llama_3`.\n",
    "- [llava-onevision](https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov) uses `chatml-llava`.\n",
    "- [liuhaotian/llava-v1.5 / 1.6](https://huggingface.co/liuhaotian/llava-v1.5-13b) uses `vicuna_v1.1`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "AlphaMeemory",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}