Commit a69efbd1 authored by Yan Ru Pei, committed by GitHub

docs: some router-related doc fixes for 0.5.1 release (#3232)


Signed-off-by: PeaBrane <yanrpei@gmail.com>
parent cb3dc244
@@ -51,7 +51,7 @@ This will start both etcd and NATS with the required configurations in the backg
### Step 1: Launch vLLM Workers
-First, start the vLLM worker engines in a terminal.
+Make sure you have 8 GPUs for these examples, unless you are using mockers (see below). First, start the vLLM worker engines in a terminal.
```bash
# Default: 8 vLLM workers with DeepSeek model (explicitly sets --block-size 64)
@@ -60,6 +60,7 @@ First, start the vLLM worker engines in a terminal.
--model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B
# Example: 4 vLLM workers with larger model using tensor parallelism (2 GPUs per worker)
+# NOTE: this would likely require each GPU having 80GB of VRAM
./run_engines.sh \
--num-workers 4 \
--model-path openai/gpt-oss-120b \
......
@@ -318,7 +318,7 @@ Instead of launching the KV Router via command line, you can create a `KvPushRou
First, launch your backend engines:
```bash
-python -m dynamo.vllm --model meta-llama/Llama-2-7b-hf --endpoint dyn://inference.vllm.generate
+python -m dynamo.vllm --model meta-llama/Llama-2-7b-hf
```
### Example Script
......