Unverified commit d4ff6f0a, authored by Ryan McCormick, committed by GitHub

docs: Fix typos and remove incomplete section from vllm multinode doc (#3588)


Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
parent 8dd104d4
@@ -73,7 +73,7 @@ python -m dynamo.vllm \
 Deploy prefill and decode workers on separate nodes for optimized resource utilization:
-**Node 1**: Run ingress and prefill workers
+**Node 1**: Run ingress and decode worker
 ```bash
 # Start ingress
 python -m dynamo.frontend --router-mode kv &
@@ -85,7 +85,7 @@ python -m dynamo.vllm \
 --enforce-eager
 ```
-**Node 2**: Run decode workers
+**Node 2**: Run prefill worker
 ```bash
 # Start decode worker
 python -m dynamo.vllm \
@@ -94,14 +94,3 @@ python -m dynamo.vllm \
 --enforce-eager \
 --is-prefill-worker
 ```
-## Large Model Deployment
-For models requiring more GPUs than available on a single node such as tensor-parallel-size 16:
-**Node 1**: First part of tensor-parallel model
-```bash
-# Start ingress
-python -m dynamo.frontend --router-mode kv &
-```
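
The role split after this change (decode worker on Node 1 next to the ingress, prefill worker on Node 2) can be sketched as a small dry-run helper. This is an illustration under assumptions, not part of the commit: `worker_cmd` is a hypothetical function name, it only prints a command line rather than launching anything, and it uses only the flags visible in the hunks above (`--enforce-eager`, `--is-prefill-worker`). The arguments elided from the diff are not guessed; whatever you pass after the role is forwarded unchanged.

```bash
#!/usr/bin/env sh
# Hypothetical helper: print (do not run) the dynamo.vllm command for a role.
# Usage: worker_cmd <decode|prefill> [extra args forwarded as-is]
worker_cmd() {
  role="${1:-}"
  if [ "$#" -gt 0 ]; then shift; fi

  case "$role" in
    decode|prefill) ;;
    *) printf 'unknown role: %s\n' "$role" >&2; return 1 ;;
  esac

  # Both roles share the base command and --enforce-eager, as in the hunks above.
  set -- python -m dynamo.vllm "$@" --enforce-eager

  # Only the prefill worker (Node 2 after this fix) gets the extra flag.
  if [ "$role" = prefill ]; then
    set -- "$@" --is-prefill-worker
  fi

  printf '%s\n' "$*"
}

worker_cmd decode    # Node 1, alongside the ingress
worker_cmd prefill   # Node 2
```

Printing instead of executing keeps the sketch safe to run anywhere; the only difference between the two roles in the visible hunks is the presence of `--is-prefill-worker`.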