docs: some router-related doc fixes for 0.5.1 release (#3232)

Signed-off-by: PeaBrane <yanrpei@gmail.com>

Commit a69efbd1, authored Sep 25, 2025 by Yan Ru Pei, committed by GitHub Sep 25, 2025

Showing with 3 additions and 2 deletions

benchmarks/router/README.md +2 -1

docs/architecture/kv_cache_routing.md +1 -1

--- a/benchmarks/router/README.md
+++ b/benchmarks/router/README.md
@@ -51,7 +51,7 @@ This will start both etcd and NATS with the required configurations in the backg

 ### Step 1: Launch vLLM Workers

-First, start the vLLM worker engines in a terminal.
+Make sure you have 8 GPUs for these examples, unless you are using mockers (see below). First, start the vLLM worker engines in a terminal.

 ```bash
 # Default: 8 vLLM workers with DeepSeek model (explicitly sets --block-size 64)
@@ -60,6 +60,7 @@ First, start the vLLM worker engines in a terminal.
    --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B

 # Example: 4 vLLM workers with larger model using tensor parallelism (2 GPUs per worker)
+# NOTE: this would likely require each GPU having 80GB of VRAM
 ./run_engines.sh \
    --num-workers 4 \
    --model-path openai/gpt-oss-120b \

--- a/docs/architecture/kv_cache_routing.md
+++ b/docs/architecture/kv_cache_routing.md
@@ -318,7 +318,7 @@ Instead of launching the KV Router via command line, you can create a `KvPushRou

 First, launch your backend engines:
 ```bash
-python -m dynamo.vllm --model meta-llama/Llama-2-7b-hf --endpoint dyn://inference.vllm.generate
+python -m dynamo.vllm --model meta-llama/Llama-2-7b-hf
 ```

 ### Example Script