Commit 44a2c4bd authored by simveit, committed by GitHub

Docs: improve link to docs (#3860)


Co-authored-by: Chayenne <zhaochen20@outlook.com>
parent c9fc4a9d
...@@ -26,7 +26,7 @@ ...@@ -26,7 +26,7 @@
"\n", "\n",
"Launch the server in your terminal and wait for it to initialize.\n", "Launch the server in your terminal and wait for it to initialize.\n",
"\n", "\n",
"**Remember to add `--chat-template llama_3_vision` to specify the vision chat template, otherwise the server only supports text, and performance degradation may occur.**\n", "**Remember to add** `--chat-template llama_3_vision` **to specify the vision chat template, otherwise the server only supports text, and performance degradation may occur.**\n",
"\n", "\n",
"We need to specify `--chat-template` for vision language models because the chat template provided in Hugging Face tokenizer only supports text." "We need to specify `--chat-template` for vision language models because the chat template provided in Hugging Face tokenizer only supports text."
] ]
...@@ -46,7 +46,7 @@ ...@@ -46,7 +46,7 @@
"\n", "\n",
"from sglang.utils import wait_for_server, print_highlight, terminate_process\n", "from sglang.utils import wait_for_server, print_highlight, terminate_process\n",
"\n", "\n",
"embedding_process, port = launch_server_cmd(\n", "vision_process, port = launch_server_cmd(\n",
" \"\"\"\n", " \"\"\"\n",
"python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-11B-Vision-Instruct \\\n", "python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-11B-Vision-Instruct \\\n",
" --chat-template=llama_3_vision\n", " --chat-template=llama_3_vision\n",
...@@ -245,7 +245,7 @@ ...@@ -245,7 +245,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"terminate_process(embedding_process)" "terminate_process(vision_process)"
] ]
}, },
{ {
......
...@@ -52,7 +52,7 @@ Please consult the documentation below to learn more about the parameters you ma ...@@ -52,7 +52,7 @@ Please consult the documentation below to learn more about the parameters you ma
* `chat_template`: The chat template to use. Deviating from the default might lead to unexpected responses. For multi-modal chat templates, refer to [here](https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template). * `chat_template`: The chat template to use. Deviating from the default might lead to unexpected responses. For multi-modal chat templates, refer to [here](https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template).
* `is_embedding`: Set to true to perform [embedding](https://docs.sglang.ai/backend/openai_api_embeddings.html) / [encode](https://docs.sglang.ai/backend/native_api.html#Encode-(embedding-model)) and [reward](https://docs.sglang.ai/backend/native_api.html#Classify-(reward-model)) tasks. * `is_embedding`: Set to true to perform [embedding](https://docs.sglang.ai/backend/openai_api_embeddings.html) / [encode](https://docs.sglang.ai/backend/native_api.html#Encode-(embedding-model)) and [reward](https://docs.sglang.ai/backend/native_api.html#Classify-(reward-model)) tasks.
* `revision`: Adjust if a specific version of the model should be used. * `revision`: Adjust if a specific version of the model should be used.
* `skip_tokenizer_init`: Set to true to provide the tokens to the engine and get the output tokens directly, typically used in RLHF. * `skip_tokenizer_init`: Set to true to provide the tokens to the engine and get the output tokens directly, typically used in RLHF. Please see this [example for reference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/input_ids.py).
* `json_model_override_args`: Override model config with the provided JSON. * `json_model_override_args`: Override model config with the provided JSON.
* `delete_ckpt_after_loading`: Delete the model checkpoint after loading the model. * `delete_ckpt_after_loading`: Delete the model checkpoint after loading the model.
......
...@@ -27,11 +27,13 @@ The router supports two working modes: ...@@ -27,11 +27,13 @@ The router supports two working modes:
This will be a drop-in replacement for the existing `--dp-size` argument of SGLang Runtime. Under the hood, it uses multi-processes to launch multiple workers, wait for them to be ready, then connect the router to all workers. This will be a drop-in replacement for the existing `--dp-size` argument of SGLang Runtime. Under the hood, it uses multi-processes to launch multiple workers, wait for them to be ready, then connect the router to all workers.
```bash ```bash
$ python -m sglang_router.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --dp-size 1 python -m sglang_router.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --dp-size 4
``` ```
After the server is ready, you can send requests to the router in the same way as you would send them to a single worker. After the server is ready, you can send requests to the router in the same way as you would send them to a single worker.
Please adjust the batch size accordingly to achieve maximum throughput.
```python ```python
import requests import requests
...@@ -47,7 +49,7 @@ print(response.json()) ...@@ -47,7 +49,7 @@ print(response.json())
This is useful for multi-node DP. First, launch workers on multiple nodes, then launch a router on the main node, and connect the router to all workers. This is useful for multi-node DP. First, launch workers on multiple nodes, then launch a router on the main node, and connect the router to all workers.
```bash ```bash
$ python -m sglang_router.launch_router --worker-urls http://worker_url_1 http://worker_url_2 python -m sglang_router.launch_router --worker-urls http://worker_url_1 http://worker_url_2
``` ```
## Dynamic Scaling APIs ## Dynamic Scaling APIs
...@@ -59,15 +61,17 @@ We offer `/add_worker` and `/remove_worker` APIs to dynamically add or remove wo ...@@ -59,15 +61,17 @@ We offer `/add_worker` and `/remove_worker` APIs to dynamically add or remove wo
Usage: Usage:
```bash ```bash
$ curl -X POST http://localhost:30000/add_worker?url=http://worker_url_1 curl -X POST http://localhost:30000/add_worker?url=http://worker_url_1
``` ```
Example: Example:
```bash ```bash
$ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30001 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30001
$ curl -X POST http://localhost:30000/add_worker?url=http://127.0.0.1:30001
Successfully added worker: http://127.0.0.1:30001 curl -X POST http://localhost:30000/add_worker?url=http://127.0.0.1:30001
# Successfully added worker: http://127.0.0.1:30001
``` ```
- `/remove_worker` - `/remove_worker`
...@@ -75,14 +79,15 @@ Successfully added worker: http://127.0.0.1:30001 ...@@ -75,14 +79,15 @@ Successfully added worker: http://127.0.0.1:30001
Usage: Usage:
```bash ```bash
$ curl -X POST http://localhost:30000/remove_worker?url=http://worker_url_1 curl -X POST http://localhost:30000/remove_worker?url=http://worker_url_1
``` ```
Example: Example:
```bash ```bash
$ curl -X POST http://localhost:30000/remove_worker?url=http://127.0.0.1:30001 curl -X POST http://localhost:30000/remove_worker?url=http://127.0.0.1:30001
Successfully removed worker: http://127.0.0.1:30001
# Successfully removed worker: http://127.0.0.1:30001
``` ```
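The `curl` calls above can also be issued from Python. The following is a minimal sketch, not part of the commit: it assumes the router listens on `http://localhost:30000` as in the examples, and the helper names `worker_endpoint` and `call_router` are hypothetical. It uses only the standard library and returns whatever text the router sends back.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed router address, taken from the curl examples above.
ROUTER_URL = "http://localhost:30000"


def worker_endpoint(action: str, worker_url: str, router_url: str = ROUTER_URL) -> str:
    """Build the /add_worker or /remove_worker URL for a given worker (hypothetical helper)."""
    if action not in ("add_worker", "remove_worker"):
        raise ValueError(f"unsupported action: {action}")
    # Keep ':' and '/' unescaped so the result matches the curl examples.
    return f"{router_url}/{action}?url={quote(worker_url, safe=':/')}"


def call_router(action: str, worker_url: str, router_url: str = ROUTER_URL) -> str:
    """POST to the router and return the response body as text (hypothetical helper)."""
    request = Request(worker_endpoint(action, worker_url, router_url), method="POST")
    with urlopen(request, timeout=30) as response:
        return response.read().decode()


# Usage, with a router and a worker already running:
#   print(call_router("add_worker", "http://127.0.0.1:30001"))
#   print(call_router("remove_worker", "http://127.0.0.1:30001"))
```

Building the URL in a separate function makes it possible to check the request target without a running router.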
Note: Note:
......