docs: fixes distributed executor backend config for multi-node vllm (#29173)

Signed-off-by: Michael Act <michael.a.c.tulenan@gdplabs.id>
Co-authored-by: Michael Goin <mgoin64@gmail.com>

Commit 3ed767ec authored Nov 23, 2025 by Michael Act; committed by GitHub Nov 23, 2025

Showing 1 changed file with 4 additions and 2 deletions

docs/serving/parallelism_scaling.md +4 -2

--- a/docs/serving/parallelism_scaling.md
+++ b/docs/serving/parallelism_scaling.md
@@ -118,14 +118,16 @@ The common practice is to set the tensor parallel size to the number of GPUs in
 ```bash
 vllm serve /path/to/the/model/in/the/container \
    --tensor-parallel-size 8 \
-    --pipeline-parallel-size 2
+    --pipeline-parallel-size 2 \
+    --distributed-executor-backend ray
 ```

 Alternatively, you can set `tensor_parallel_size` to the total number of GPUs in the cluster:

 ```bash
 vllm serve /path/to/the/model/in/the/container \
-     --tensor-parallel-size 16
+     --tensor-parallel-size 16 \
+     --distributed-executor-backend ray
 ```

 ## Optimizing network communication for tensor parallelism