Skip to content
GitLab
Menu
Projects
Groups
Snippets
Loading...
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
Menu
Open sidebar
OpenDAS
dynamo
Commits
e06bfd55
"git@developer.sourcefind.cn:OpenDAS/openpcdet.git" did not exist on "a1bd2d7bc3c093dc8a934f43b5f7a2993068d8e4"
Unverified
Commit
e06bfd55
authored
Apr 22, 2025
by
GuanLuo
Committed by
GitHub
Apr 22, 2025
Browse files
docs: R1 disaggregation guide (#720)
parent
cce0c0f0
Changes
2
Hide whitespace changes
Inline
Side-by-side
Showing
2 changed files
with
97 additions
and
1 deletion
+97
-1
examples/llm/configs/mutinode_disagg_r1.yaml
examples/llm/configs/mutinode_disagg_r1.yaml
+46
-0
examples/llm/multinode-examples.md
examples/llm/multinode-examples.md
+51
-1
No files found.
examples/llm/configs/mutinode_disagg_r1.yaml
0 → 100644
View file @
e06bfd55
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Common
:
model
:
deepseek-ai/DeepSeek-R1
block-size
:
64
max-model-len
:
16384
kv-transfer-config
:
'
{"kv_connector":"DynamoNixlConnector"}'
tensor-parallel-size
:
16
Frontend
:
served_model_name
:
deepseek-ai/DeepSeek-R1
endpoint
:
dynamo.Processor.chat/completions
port
:
8000
Processor
:
router
:
round-robin
common-configs
:
[
model
,
block-size
]
VllmWorker
:
remote-prefill
:
true
conditional-disagg
:
false
ServiceArgs
:
workers
:
1
resources
:
gpu
:
16
common-configs
:
[
model
,
block-size
,
max-model-len
,
kv-transfer-config
,
tensor-parallel-size
]
PrefillWorker
:
max-num-batched-tokens
:
16384
ServiceArgs
:
workers
:
1
resources
:
gpu
:
16
common-configs
:
[
model
,
block-size
,
max-model-len
,
kv-transfer-config
,
tensor-parallel-size
]
examples/llm/multinode-examples.md
View file @
e06bfd55
...
...
@@ -163,4 +163,54 @@ curl <node1-ip>:8000/v1/chat/completions \
"stream": true,
"max_tokens": 300
}'
```
\ No newline at end of file
```
##### Disaggregated Deployment
In this example, we will be deploying two replicas of the model (one prefill worker
and one decode worker). We will be using 4 H100x8 nodes and group every two of them
into one Ray cluster in the same way as described in aggregated deployment.
However, for etcd and nats server, we will only run them in
one node and let's consider that node to be the head node of the whole deployment.
Note that if you are starting etcd server directly instead of using
`docker compose`
,
you should add additional arguments to be discoverable in other node.
```
bash
etcd
--advertise-client-urls
http://<head-node-ip>:2379
--listen-client-urls
http://<head-node-ip>:2379,http://127.0.0.1:2379
```
**Step 1**
: On every two nodes, set up Ray cluster as described in
[
aggregated deployment
](
#aggregated-deployment
)
. After that, you should have
two independent Ray cluster, each has access to 16 GPUs.
**Step 2**
start the deployment by running different flavors of
`dynamo serve`
on one of the node for each Ray cluster, using the configuration file,
`configs/mutinode_disagg_r1.yaml`
.
For decode, below command will be used and the node will be the entry point of
the whole deployment. In other words, the ip of the node should be used to send
requests to.
```
bash
# if not head node
export
NATS_SERVER
=
'nats://<nats-server-ip>:4222'
export
ETCD_ENDPOINTS
=
'<etcd-endpoints-ip>:2379'
cd
$DYNAMO_HOME
/examples/llm
dynamo serve graphs.agg:Frontend
-f
./configs/mutinode_disagg_r1.yaml
```
For prefill:
```
bash
# if not head node
export
NATS_SERVER
=
'nats://<nats-server-ip>:4222'
export
ETCD_ENDPOINTS
=
'<etcd-endpoints-ip>:2379'
cd
$DYNAMO_HOME
/examples/llm
dynamo serve components.prefill_worker:PrefillWorker
-f
./configs/mutinode_disagg_r1.yaml
```
### Client
In another terminal, you can send the same curl request as described in
[
aggregated deployment
](
#aggregated-deployment
)
, addressing to the ip of
the decode node.
\ No newline at end of file
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment