docs: add host and bootstrap port to disagg multinode example (#5309)

Signed-off-by: Dan Gil <dagil@nvidia.com>

Commit c29f78c1, authored Jan 09, 2026 by dagil-nvidia, committed by GitHub Jan 09, 2026

Showing with 14 additions and 0 deletions

examples/basics/multinode/README.md +14 -0

--- a/examples/basics/multinode/README.md
+++ b/examples/basics/multinode/README.md
@@ -97,6 +97,7 @@ Ensure the following ports are accessible between nodes:
 - **2379**: etcd client port
 - **4222**: NATS client port
 - **8000**: Frontend HTTP port (only needed on frontend node)
+- **${DISAGG_BOOTSTRAP_PORT}**: SGLang disaggregation bootstrap port (set in Step 1; must be reachable across nodes)
 - **High-speed interconnect**: For optimal NIXL performance (InfiniBand, RoCE, or high-bandwidth Ethernet)
 ### 4. Hardware Setup
@@ -128,6 +129,9 @@ export INFRA_NODE_IP=<INFRA_NODE_IP>
 export ETCD_ENDPOINTS=http://${INFRA_NODE_IP}:2379
 export NATS_SERVER=nats://${INFRA_NODE_IP}:4222
 export DYN_LOG=debug  # Enable debug logging to see routing decisions
+# Use a fixed, reachable port for the disaggregation bootstrap server
+# Pick any free port and ensure it's open between nodes
+export DISAGG_BOOTSTRAP_PORT=32963
 ```
 ### Step 2: Launch Replica 1 (Node 1)
@@ -141,8 +145,10 @@ CUDA_VISIBLE_DEVICES=0 python3 -m dynamo.sglang \
    --served-model-name Qwen/Qwen3-0.6B \
    --page-size 16 \
    --tp 1 \
+    --host 0.0.0.0 \
    --trust-remote-code \
    --skip-tokenizer-init \
+    --disaggregation-bootstrap-port ${DISAGG_BOOTSTRAP_PORT} \
    --disaggregation-mode prefill \
    --disaggregation-transfer-backend nixl &
@@ -151,8 +157,10 @@ CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.sglang \
    --served-model-name Qwen/Qwen3-0.6B \
    --page-size 16 \
    --tp 1 \
+    --host 0.0.0.0 \
    --trust-remote-code \
    --skip-tokenizer-init \
+    --disaggregation-bootstrap-port ${DISAGG_BOOTSTRAP_PORT} \
    --disaggregation-mode decode \
    --disaggregation-transfer-backend nixl
 ```
@@ -161,6 +169,8 @@ CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.sglang \
 >
 > - `CUDA_VISIBLE_DEVICES`: Controls which GPU each worker uses (0 and 1 for different GPUs)
 > - `--page-size 16`: Sets the KV cache block size - must be identical across all workers
+> - `--host 0.0.0.0`: Exposes the SGLang bootstrap server on all interfaces so other nodes can reach it
+> - `--disaggregation-bootstrap-port`: Uses the fixed port you set in `DISAGG_BOOTSTRAP_PORT`; ensure this port is open between nodes
 > - `--disaggregation-mode`: Separates prefill (prompt processing) from decode (token generation)
 > - `--disaggregation-transfer-backend nixl`: Enables high-speed GPU-to-GPU transfers
 > - `--skip-tokenizer-init`: Avoids duplicate tokenizer loading since the frontend handles tokenization
@@ -176,8 +186,10 @@ CUDA_VISIBLE_DEVICES=0 python3 -m dynamo.sglang \
    --served-model-name Qwen/Qwen3-0.6B \
    --page-size 16 \
    --tp 1 \
+    --host 0.0.0.0 \
    --trust-remote-code \
    --skip-tokenizer-init \
+    --disaggregation-bootstrap-port ${DISAGG_BOOTSTRAP_PORT} \
    --disaggregation-mode prefill \
    --disaggregation-transfer-backend nixl &
@@ -187,8 +199,10 @@ CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.sglang \
    --served-model-name Qwen/Qwen3-0.6B \
    --page-size 16 \
    --tp 1 \
+    --host 0.0.0.0 \
    --trust-remote-code \
    --skip-tokenizer-init \
+    --disaggregation-bootstrap-port ${DISAGG_BOOTSTRAP_PORT} \
    --disaggregation-mode decode \
    --disaggregation-transfer-backend nixl
 ```
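
The change asks readers to pick a free port and ensure it is open between nodes, but does not show how to check that. The following is a minimal sketch of such a check, not part of the README: `valid_port`, `port_reachable`, and `NODE1_IP` are hypothetical names introduced here for illustration, and the script assumes `bash` plus the coreutils `timeout` command are available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the README or of SGLang/Dynamo).

# Returns 0 if the argument is an integer in the TCP port range 1..65535.
valid_port() {
    [[ "$1" =~ ^[0-9]+$ ]] && (( 10#$1 >= 1 && 10#$1 <= 65535 ))
}

# Returns 0 if a TCP connection to host:port succeeds within the timeout
# (default 3 seconds), non-zero otherwise.
port_reachable() {
    local host="$1" port="$2" timeout_s="${3:-3}"
    # bash's /dev/tcp pseudo-device opens a TCP connection;
    # `timeout` bounds the wait if packets are silently dropped.
    timeout "${timeout_s}" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example usage, run from one node against the other once its workers are up.
# NODE1_IP is an assumed variable; the README does not define it.
#
#   valid_port "${DISAGG_BOOTSTRAP_PORT}" || echo "invalid port value"
#   port_reachable "${NODE1_IP}" "${DISAGG_BOOTSTRAP_PORT}" \
#       && echo "bootstrap port reachable" \
#       || echo "bootstrap port NOT reachable; check firewall rules"
```

A failed check only shows that nothing accepted a connection on that port from the current node. It does not distinguish a firewall block from a worker that has not started listening yet, so it is most useful after the workers have been launched.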