<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

# Multinode Examples

## Multi-node sized models

SGLang lets you deploy models that span multiple nodes by adding the `dist-init-addr`, `nnodes`, and `node-rank` arguments. Below is an example of deploying DeepSeek R1 with disaggregated serving across 4 nodes. The example requires 4 nodes with 8 H100 GPUs each.
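The commands in this example reference `HEAD_PREFILL_NODE_IP` and `HEAD_DECODE_NODE_IP` without defining them. A minimal sketch of how they might be set, and of how the `--nnodes` and `--tp` values follow from the 4-node, 8-GPU layout, is shown here. The IP addresses are hypothetical placeholders, and the even split of nodes between prefill and decode is taken from the commands in this example rather than from any SGLang requirement.

```bash
# Hypothetical placeholder addresses: replace with the IPs of the first
# prefill node (node 1) and the first decode node (node 3).
export HEAD_PREFILL_NODE_IP="10.0.0.1"
export HEAD_DECODE_NODE_IP="10.0.0.3"

# Layout used in this example: 4 nodes with 8 GPUs each,
# half of the nodes for prefill and half for decode.
TOTAL_NODES=4
GPUS_PER_NODE=8
NODES_PER_WORKER=$(( TOTAL_NODES / 2 ))           # passed as --nnodes
TP_SIZE=$(( NODES_PER_WORKER * GPUS_PER_NODE ))   # passed as --tp

echo "nnodes=${NODES_PER_WORKER} tp=${TP_SIZE}"   # prints "nnodes=2 tp=16"
```

Run the exports on every node so that each launch command resolves the same head-node address.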

**Prerequisite**: Building the Dynamo container.

```bash
cd $DYNAMO_ROOT
docker build -f container/Dockerfile.sglang-wideep . -t dynamo-wideep --no-cache
```

You can use a specific tag from the [lmsys dockerhub](https://hub.docker.com/r/lmsysorg/sglang/tags) by adding `--build-arg SGLANG_IMAGE_TAG=<tag>` to the build command.

**Step 1**: Ensure that each worker is launched with the required arguments. The following example configuration runs both the prefill worker and the decode worker in TP16:

Node 1: Run HTTP ingress, processor, and 8 shards of the prefill worker
```bash
# run ingress
python3 -m dynamo.frontend --http-port=8000 &
# run prefill worker
python3 -m dynamo.sglang \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --tp 16 \
  --dp-size 16 \
  --dist-init-addr ${HEAD_PREFILL_NODE_IP}:29500 \
  --nnodes 2 \
  --node-rank 0 \
  --enable-dp-attention \
  --trust-remote-code \
  --skip-tokenizer-init \
  --disaggregation-mode prefill \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --load-balance-method round_robin \
  --host 0.0.0.0 \
  --mem-fraction-static 0.82
```

Node 2: Run the remaining 8 shards of the prefill worker
```bash
python3 -m dynamo.sglang \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --tp 16 \
  --dp-size 16 \
  --dist-init-addr ${HEAD_PREFILL_NODE_IP}:29500 \
  --nnodes 2 \
  --node-rank 1 \
  --enable-dp-attention \
  --trust-remote-code \
  --skip-tokenizer-init \
  --disaggregation-mode prefill \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --host 0.0.0.0 \
  --load-balance-method round_robin \
  --mem-fraction-static 0.82
```

Node 3: Run the first 8 shards of the decode worker
```bash
python3 -m dynamo.sglang \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --tp 16 \
  --dp-size 16 \
  --dist-init-addr ${HEAD_DECODE_NODE_IP}:29500 \
  --nnodes 2 \
  --node-rank 0 \
  --enable-dp-attention \
  --trust-remote-code \
  --skip-tokenizer-init \
  --disaggregation-mode decode \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --host 0.0.0.0 \
  --prefill-round-robin-balance \
  --mem-fraction-static 0.82
```

Node 4: Run the remaining 8 shards of the decode worker
```bash
python3 -m dynamo.sglang \
  --model-path /model/ \
  --served-model-name deepseek-ai/DeepSeek-R1 \
  --tp 16 \
  --dp-size 16 \
  --dist-init-addr ${HEAD_DECODE_NODE_IP}:29500 \
  --nnodes 2 \
  --node-rank 1 \
  --enable-dp-attention \
  --trust-remote-code \
  --skip-tokenizer-init \
  --disaggregation-mode decode \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --host 0.0.0.0 \
  --prefill-round-robin-balance \
  --mem-fraction-static 0.82
```
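The four worker commands above differ only in the head-node address, the node rank, the disaggregation mode, and the load-balancing flag (`--load-balance-method round_robin` for prefill, `--prefill-round-robin-balance` for decode). As a rough sketch, a small shell helper can generate the argument list so the shared flags live in one place. The function name `worker_args` is hypothetical and not part of Dynamo or SGLang; every flag it prints is copied from the commands above.

```bash
# Hypothetical helper (not part of Dynamo or SGLang): print the dynamo.sglang
# arguments for one node of a two-node worker.
# Usage: worker_args <prefill|decode> <node-rank> <head-node-ip>
worker_args() (
  mode="$1"
  rank="$2"
  head_ip="$3"

  # The load-balancing flag is the only mode-specific flag in this example.
  if [ "$mode" = "prefill" ]; then
    balance="--load-balance-method round_robin"
  else
    balance="--prefill-round-robin-balance"
  fi

  args="--model-path /model/"
  args="$args --served-model-name deepseek-ai/DeepSeek-R1"
  args="$args --tp 16 --dp-size 16"
  args="$args --dist-init-addr ${head_ip}:29500"
  args="$args --nnodes 2 --node-rank ${rank}"
  args="$args --enable-dp-attention --trust-remote-code --skip-tokenizer-init"
  args="$args --disaggregation-mode ${mode}"
  args="$args --disaggregation-transfer-backend nixl"
  args="$args --disaggregation-bootstrap-port 30001"
  args="$args ${balance}"
  args="$args --host 0.0.0.0 --mem-fraction-static 0.82"
  printf '%s\n' "$args"
)

# Dry run: show the arguments for node 4 (decode, rank 1) with a placeholder IP.
worker_args decode 1 10.0.0.3
```

On node 4, for instance, the worker could then be started with `python3 -m dynamo.sglang $(worker_args decode 1 "${HEAD_DECODE_NODE_IP}")`. The command substitution is left unquoted on purpose so the shell splits the output into separate arguments.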

**Step 2**: Run inference
SGLang typically needs a warmup period while the DeepGEMM kernels load. We recommend sending a few warmup requests first and confirming that the DeepGEMM kernels have loaded.

```bash
curl ${HEAD_PREFILL_NODE_IP}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "messages": [
    {
        "role": "user",
        "content": "In the heart of the tennis world, where champions rise and fall with each Grand Slam, lies the legend of the Golden Racket of Wimbledon. Once wielded by the greatest players of antiquity, this mythical racket is said to bestow unparalleled precision, grace, and longevity upon its rightful owner. For centuries, it remained hidden, its location lost to all but the most dedicated scholars of the sport. You are Roger Federer, the Swiss maestro whose elegant play and sportsmanship have already cemented your place among the legends, but whose quest for perfection remains unquenched even as time marches on. Recent dreams have brought you visions of this ancient artifact, along with fragments of a map that seems to lead to its resting place. Your journey will take you through the hallowed grounds of tennis history, from the clay courts of Roland Garros to the hidden training grounds of forgotten champions, and finally to a secret chamber beneath Centre Court itself. Character Background: Develop a detailed background for Roger Federer in this quest. Describe his motivations for seeking the Golden Racket, his tennis skills and personal weaknesses, and any connections to the legends of the sport that came before him. Is he driven by a desire to extend his career, to secure his legacy as the greatest of all time, or perhaps by something more personal? What price might he be willing to pay to claim this artifact, and what challenges from rivals past and present might stand in his way?"
    }
    ],
    "stream":false,
    "max_tokens": 30
  }'
```
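The warmup recommended above can be scripted. The following is a minimal sketch, assuming a short prompt is enough to exercise the kernels; the helper names `warmup_payload` and `send_warmup_requests`, the default of 3 requests, and the `max_tokens` value of 8 are all illustrative assumptions, not values prescribed by SGLang or Dynamo.

```bash
# Hypothetical helper: emit a small chat-completions payload for warmup.
# The prompt and max_tokens value are arbitrary choices for illustration.
warmup_payload() {
  printf '%s' '{"model": "deepseek-ai/DeepSeek-R1", "messages": [{"role": "user", "content": "Hello"}], "stream": false, "max_tokens": 8}'
}

# Hypothetical helper: send <count> warmup requests to <url> and print the
# HTTP status code of each response.
# Usage: send_warmup_requests <url> [count]
send_warmup_requests() {
  url="$1"
  count="${2:-3}"
  i=1
  while [ "$i" -le "$count" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$url" \
      -H "Content-Type: application/json" \
      -d "$(warmup_payload)"
    i=$(( i + 1 ))
  done
}
```

Once all four nodes are up, call it with `send_warmup_requests "${HEAD_PREFILL_NODE_IP}:8000/v1/chat/completions" 3`. The status codes only show that requests completed, so check the worker logs as well to confirm that the DeepGEMM kernels have loaded.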