Commit 3ed767ec (unverified), authored by Michael Act, committed by GitHub

docs: fixes distributed executor backend config for multi-node vllm (#29173)


Signed-off-by: Michael Act <michael.a.c.tulenan@gdplabs.id>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
parent 5f96c00c
@@ -118,14 +118,16 @@ The common practice is to set the tensor parallel size to the number of GPUs in
 ```bash
 vllm serve /path/to/the/model/in/the/container \
     --tensor-parallel-size 8 \
-    --pipeline-parallel-size 2
+    --pipeline-parallel-size 2 \
+    --distributed-executor-backend ray
 ```
 Alternatively, you can set `tensor_parallel_size` to the total number of GPUs in the cluster:
 ```bash
 vllm serve /path/to/the/model/in/the/container \
-    --tensor-parallel-size 16
+    --tensor-parallel-size 16 \
+    --distributed-executor-backend ray
 ```
 ## Optimizing network communication for tensor parallelism
......
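As a rough cross-check of the two invocations in this diff, the sketch below builds the first command from a cluster shape. It assumes tensor parallelism spans the GPUs of a single node and pipeline parallelism spans the nodes. The helper names `build_vllm_cmd` and `total_gpus` and their arguments are hypothetical and not part of vLLM; only the three flags come from the diff.

```bash
#!/usr/bin/env bash
# Sketch only: num_nodes and gpus_per_node are hypothetical inputs, not vLLM options.

# Print a `vllm serve` command line for a cluster of the given shape.
# Assumption: tensor parallel size = GPUs per node, pipeline parallel size = nodes.
build_vllm_cmd() {
    local model_path="$1" num_nodes="$2" gpus_per_node="$3"
    printf 'vllm serve %s --tensor-parallel-size %d --pipeline-parallel-size %d --distributed-executor-backend ray\n' \
        "$model_path" "$gpus_per_node" "$num_nodes"
}

# Total number of GPUs the resulting command expects the cluster to provide.
total_gpus() {
    local num_nodes="$1" gpus_per_node="$2"
    echo $(( num_nodes * gpus_per_node ))
}

build_vllm_cmd /path/to/the/model/in/the/container 2 8
total_gpus 2 8
```

For two nodes with eight GPUs each, this reproduces the flags of the first command, and the total of 16 GPUs matches the `--tensor-parallel-size 16` alternative.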