<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

# Multinode Examples

## Multi-node sized models

SGLang allows you to deploy multi-node sized models by adding the `dist-init-addr`, `nnodes`, and `node-rank` arguments. Below is an example of deploying DeepSeek R1 for disaggregated serving across 4 nodes. This example requires 4 nodes, each with 8 H100 GPUs.
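
As a rough illustration of how `nnodes` and `node-rank` divide the work, the sketch below computes which tensor-parallel shards a given node would host. It assumes shards are split evenly and contiguously across nodes, which matches the 8-shards-per-node split used in this example but is an assumption about the layout rather than a documented guarantee. `shards_for_node` is a hypothetical helper, not part of SGLang or Dynamo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first and last shard index hosted by a node.
# Assumes shards are divided evenly and contiguously across nodes.
shards_for_node() {
  local tp_size=$1 nnodes=$2 node_rank=$3
  local per_node=$(( tp_size / nnodes ))
  local first=$(( node_rank * per_node ))
  local last=$(( first + per_node - 1 ))
  echo "${first}-${last}"
}

# Values from this example: --tp 16 --nnodes 2
shards_for_node 16 2 0   # prints 0-7
shards_for_node 16 2 1   # prints 8-15
```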

**Prerequisite**: Building the Dynamo container.

```bash
cd $DYNAMO_ROOT
docker build -f container/Dockerfile.sglang-wideep . -t dynamo-wideep --no-cache
```

You can use a specific tag from the [lmsys dockerhub](https://hub.docker.com/r/lmsysorg/sglang/tags) by adding `--build-arg SGLANG_IMAGE_TAG=<tag>` to the build command.

**Step 1**: Use the provided helper script to generate the commands that start NATS/ETCD on your head prefill node. The script also prints the environment variables to export on every other node. You will need the IP addresses of your head prefill node and head decode node to run it.
```bash
./utils/gen_env_vars.sh
```
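
Before launching anything on a node, it can help to confirm that the variables used by the commands in this guide are actually exported. The following is a minimal sketch of such a check; `require_env` is a hypothetical helper, and the only variable names it is shown with are the two that appear in this guide (`HEAD_PREFILL_NODE_IP` and `HEAD_DECODE_NODE_IP`). The helper script may print other variables that are not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check: fail if a required variable is not exported
# (or is empty). Uses printenv so that unexported shell variables also count
# as missing, since child processes would not see them.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on a node, before starting a worker:
#   require_env HEAD_PREFILL_NODE_IP HEAD_DECODE_NODE_IP || exit 1
```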

**Step 2**: Ensure that each launch command includes the required arguments. The following example runs both the prefill worker and the decode worker with TP16, each spanning 2 nodes:

Node 1: Run HTTP ingress, processor, and 8 shards of the prefill worker
```bash
# run ingress
python3 -m dynamo.frontend --http-port=8000 &
# run prefill worker
python3 -m dynamo.sglang \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --tp 16 \
  --dp-size 16 \
  --dist-init-addr ${HEAD_PREFILL_NODE_IP}:29500 \
  --nnodes 2 \
  --node-rank 0 \
  --enable-dp-attention \
  --trust-remote-code \
  --skip-tokenizer-init \
  --disaggregation-mode prefill \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --mem-fraction-static 0.82
```

Node 2: Run the remaining 8 shards of the prefill worker
```bash
python3 -m dynamo.sglang \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --tp 16 \
  --dp-size 16 \
  --dist-init-addr ${HEAD_PREFILL_NODE_IP}:29500 \
  --nnodes 2 \
  --node-rank 1 \
  --enable-dp-attention \
  --trust-remote-code \
  --skip-tokenizer-init \
  --disaggregation-mode prefill \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --mem-fraction-static 0.82
```

Node 3: Run the first 8 shards of the decode worker
```bash
python3 -m dynamo.sglang \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --tp 16 \
  --dp-size 16 \
  --dist-init-addr ${HEAD_DECODE_NODE_IP}:29500 \
  --nnodes 2 \
  --node-rank 0 \
  --enable-dp-attention \
  --trust-remote-code \
  --skip-tokenizer-init \
  --disaggregation-mode decode \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --mem-fraction-static 0.82
```

Node 4: Run the remaining 8 shards of the decode worker
```bash
python3 -m dynamo.sglang \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --tp 16 \
  --dp-size 16 \
  --dist-init-addr ${HEAD_DECODE_NODE_IP}:29500 \
  --nnodes 2 \
  --node-rank 1 \
  --enable-dp-attention \
  --trust-remote-code \
  --skip-tokenizer-init \
  --disaggregation-mode decode \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --mem-fraction-static 0.82
```
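
The four worker commands above differ only in the disaggregation mode, the node rank, and the head node IP used for `--dist-init-addr`. As a sketch of how that could be factored out, the hypothetical `worker_cmd` function below prints the command for a given role and rank instead of running it, so the output can be inspected first. The function name and the placeholder IP are illustrative; every flag is copied from the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the worker command for a given role and node rank.
# role is "prefill" or "decode"; head_ip is the IP of that role's head node.
worker_cmd() {
  local role=$1 node_rank=$2 head_ip=$3
  printf '%s ' python3 -m dynamo.sglang \
    --model-path /model/ \
    --served-model-name deepseek-ai/DeepSeek-R1 \
    --tp 16 --dp-size 16 \
    --dist-init-addr "${head_ip}:29500" \
    --nnodes 2 --node-rank "${node_rank}" \
    --enable-dp-attention --trust-remote-code --skip-tokenizer-init \
    --disaggregation-mode "${role}" \
    --disaggregation-transfer-backend nixl \
    --disaggregation-bootstrap-port 30001 \
    --mem-fraction-static 0.82
  printf '\n'
}

# Example: the command for Node 4 (decode, rank 1), using a placeholder IP.
worker_cmd decode 1 10.0.0.3
```

Printing rather than executing keeps the sketch side-effect free; on a real node you would pass `${HEAD_PREFILL_NODE_IP}` or `${HEAD_DECODE_NODE_IP}` as the third argument.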

**Step 3**: Run inference

SGLang typically requires a warmup period before the DeepGEMM kernels are loaded. We recommend sending a few warmup requests first and confirming that the kernels have loaded.

```bash
curl ${HEAD_PREFILL_NODE_IP}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
    {
        "role": "user",
        "content": "In the heart of the tennis world, where champions rise and fall with each Grand Slam, lies the legend of the Golden Racket of Wimbledon. Once wielded by the greatest players of antiquity, this mythical racket is said to bestow unparalleled precision, grace, and longevity upon its rightful owner. For centuries, it remained hidden, its location lost to all but the most dedicated scholars of the sport. You are Roger Federer, the Swiss maestro whose elegant play and sportsmanship have already cemented your place among the legends, but whose quest for perfection remains unquenched even as time marches on. Recent dreams have brought you visions of this ancient artifact, along with fragments of a map that seems to lead to its resting place. Your journey will take you through the hallowed grounds of tennis history, from the clay courts of Roland Garros to the hidden training grounds of forgotten champions, and finally to a secret chamber beneath Centre Court itself. Character Background: Develop a detailed background for Roger Federer in this quest. Describe his motivations for seeking the Golden Racket, his tennis skills and personal weaknesses, and any connections to the legends of the sport that came before him. Is he driven by a desire to extend his career, to secure his legacy as the greatest of all time, or perhaps by something more personal? What price might he be willing to pay to claim this artifact, and what challenges from rivals past and present might stand in his way?"
    }
    ],
    "stream":false,
    "max_tokens": 30
  }'
```
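
The warmup requests mentioned above could be scripted along the following lines. This is a minimal sketch, not a prescribed procedure: `send_warmup_requests` is a hypothetical helper, the short prompt and `max_tokens` value are arbitrary, and it assumes `HEAD_PREFILL_NODE_IP` is exported and the frontend is listening on port 8000 as in Step 2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: send N short chat requests to the frontend.
# Assumes HEAD_PREFILL_NODE_IP is exported and the frontend listens on port 8000.
send_warmup_requests() {
  local count=$1 i
  for i in $(seq 1 "$count"); do
    # Discard the response body; stop at the first request that curl fails to send.
    curl -s "${HEAD_PREFILL_NODE_IP}:8000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d '{"model": "deepseek-ai/DeepSeek-R1", "messages": [{"role": "user", "content": "Hello"}], "stream": false, "max_tokens": 8}' \
      > /dev/null || return 1
    echo "warmup request ${i} done"
  done
}
```

For example, `send_warmup_requests 3` sends three requests. How many requests are needed before the DeepGEMM kernels are loaded is not specified here, so check the worker logs rather than relying on a fixed count.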