Unverified commit 64445c0a authored by ishandhanani, committed by GitHub

feat(sglang): deepep sglang support in dynamo (#1120)


Co-authored-by: kkranen <kyle.kranen@gmail.com>
parent e2716073
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Note: this image is built on top of a locally built `deepep:latest` image.
# See the instructions in examples/sglang/README.md
FROM deepep:latest
# Add NIXL build dependencies
RUN apt-get update -y && \
    apt-get install -y \
        cmake \
        meson \
        ninja-build \
        pybind11-dev \
        patchelf
# Install Python build dependencies
RUN pip install --break-system-packages meson-python wheel build
# Add architecture args for NIXL build
ARG ARCH=amd64
ARG ARCH_ALT=x86_64
WORKDIR /sgl-workspace
# Pinning to NIXL 0.2.1 right now
# TODO: investigate pip install failure with 0.3.0 release
ARG NIXL_COMMIT="5e4c179ee850d482a83cb2a211e0947e46281060"
RUN git clone https://github.com/ai-dynamo/nixl.git && \
    cd nixl && \
    git checkout ${NIXL_COMMIT} && \
    pip install --break-system-packages . --config-settings=setup-args="-Ducx_path=/opt/hpcx/ucx"
WORKDIR /sgl-workspace
RUN pip uninstall --break-system-packages -y sglang
RUN rm -rf sglang
# Pin SGLang to 0.4.7
RUN pip install --break-system-packages "sglang==0.4.7"
WORKDIR /sgl-workspace
# https://github.com/ai-dynamo/dynamo/pull/1510
ARG DYNAMO_COMMIT="382e3aedc421b3b3abc338062b332b54b5aa8529"
RUN git clone https://github.com/ai-dynamo/dynamo.git && cd dynamo && git checkout ${DYNAMO_COMMIT}
# install dynamo in editable mode
WORKDIR /sgl-workspace/dynamo
# Rust build/dev dependencies
RUN apt-get update -y && \
    apt-get install --no-install-recommends -y \
        build-essential \
        protobuf-compiler \
        cmake \
        libssl-dev \
        pkg-config \
        clang \
        libclang-dev \
        git
# Define Rust target based on ARCH_ALT ARG
ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
ENV RUSTUP_HOME=/usr/local/rustup \
    CARGO_HOME=/usr/local/cargo \
    PATH=/usr/local/cargo/bin:$PATH \
    RUST_VERSION=1.86.0
# Install Rust using RUSTARCH derived from ARCH_ALT
RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
    # TODO: Add SHA check back based on RUSTARCH
    chmod +x rustup-init && \
    ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
    rm rustup-init && \
    chmod -R a+w $RUSTUP_HOME $CARGO_HOME
ARG CARGO_BUILD_JOBS
# Set CARGO_BUILD_JOBS to 16 if not provided
# This prevents cargo from running $(nproc) jobs in parallel,
# which might exceed the open files limit.
ENV CARGO_BUILD_JOBS=${CARGO_BUILD_JOBS:-16}
RUN cargo build --release
RUN mkdir -p deploy/sdk/src/dynamo/sdk/cli/bin
RUN cp target/release/http deploy/sdk/src/dynamo/sdk/cli/bin
RUN cp target/release/llmctl deploy/sdk/src/dynamo/sdk/cli/bin
RUN cp target/release/dynamo-run deploy/sdk/src/dynamo/sdk/cli/bin
RUN cd lib/bindings/python && pip install --break-system-packages -e .
RUN pip install --break-system-packages -e .
ENV PYTHONPATH=/sgl-workspace/dynamo/components/planner/src
RUN wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/v2.10.24/nats-server-v2.10.24-${ARCH}.deb && \
    dpkg -i nats-server-v2.10.24-${ARCH}.deb && rm nats-server-v2.10.24-${ARCH}.deb
ENV ETCD_VERSION="v3.5.18"
RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
    mkdir -p /usr/local/bin/etcd && \
    tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
    rm /tmp/etcd.tar.gz
ENV PATH=/usr/local/bin/etcd/:$PATH
COPY examples/sglang/configs/deepep/* /sgl-workspace/dynamo/examples/sglang/configs/
WORKDIR /sgl-workspace/dynamo/examples/sglang
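The Rust install step above still carries a TODO to restore the checksum check on `rustup-init`. A minimal shell sketch of such a check follows. `verify_sha256` is a hypothetical helper, not part of the Dockerfile, and the commented usage assumes a `rustup-init.sha256` file is published next to the binary in the rustup archive; confirm that for your rustup version and target before relying on it.

```bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value
# and fail on mismatch. The expected digest must come from a trusted source.
verify_sha256() {
  local file=$1 expected=$2 actual
  actual=$(sha256sum "$file" | cut -d' ' -f1) || return 1
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for ${file}: expected ${expected}, got ${actual}" >&2
    return 1
  fi
  echo "${file}: checksum OK"
}

# Possible use between the wget and chmod steps of the Rust install
# (assumes the .sha256 file exists at this URL):
#   wget "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init.sha256"
#   verify_sha256 rustup-init "$(cut -d' ' -f1 rustup-init.sha256)"
```

Taking only the first field of the `.sha256` file keeps the check independent of how the file name is written after the digest.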
#### Disaggregated
SGLang uses a mini load balancer to route requests for disaggregated serving. The load balancer works as follows:
1. The load balancer receives a request from the client
2. A random `(prefill, decode)` pair is selected from the pool of available workers
SGLang also supports DP attention for MoE models.
```bash
cd /workspace/examples/sglang
dynamo serve graphs.disagg:Frontend -f ./configs/disagg-dp-attention.yaml
```
##### Disaggregated with WideEP
Dynamo supports SGLang's implementation of wide expert parallelism (WideEP) and large-scale prefill/decode (P/D) disaggregation for DeepSeek-R1. You can read their blog post [here](https://www.nvidia.com/en-us/technologies/ai/deepseek-r1-large-scale-p-d-with-wide-expert-parallelism/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy it at scale. In this example, we run 1 prefill worker on 2 H100 nodes and 1 decode worker on 4 H100 nodes (6 nodes with 8 GPUs each, 48 GPUs in total). You can scale this to 96 GPUs or more by changing the configuration files (for example `nnodes`, `tp-size`, and `dp-size`).
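The node and GPU counts above follow from simple arithmetic. The sketch below assumes 8 GPUs per node and, as in `configs/dsr1.yaml`, that each worker's `tp-size` and `dp-size` equal its node count times the GPUs per node. `layout` is a hypothetical helper that only does this GPU arithmetic.

```bash
# Hypothetical helper: derive per-worker parallelism and the total GPU count,
# assuming every worker uses all GPUs on its nodes
# (tp-size = dp-size = nnodes * GPUs per node, as in configs/dsr1.yaml).
layout() {
  local prefill_nodes=$1 decode_nodes=$2 gpus_per_node=${3:-8}
  local prefill_tp=$(( prefill_nodes * gpus_per_node ))
  local decode_tp=$(( decode_nodes * gpus_per_node ))
  echo "prefill: nnodes=${prefill_nodes} tp-size=${prefill_tp} dp-size=${prefill_tp}"
  echo "decode: nnodes=${decode_nodes} tp-size=${decode_tp} dp-size=${decode_tp}"
  echo "total GPUs: $(( prefill_tp + decode_tp ))"
}

layout 2 4
# prefill: nnodes=2 tp-size=16 dp-size=16
# decode: nnodes=4 tp-size=32 dp-size=32
# total GPUs: 48
```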
Steps to run:
1. Build the SGLang DeepEP container
```bash
git clone https://github.com/sgl-project/sglang.git
cd sglang/docker
docker build -f Dockerfile.deepep -t deepep .
```
You will now have a `deepep:latest` image.
2. Build the Dynamo container
```bash
cd $DYNAMO_ROOT
docker build -f container/Dockerfile.sglang-deepep . -t dynamo-deepep --no-cache
```
3. You can run this container on each 8xH100 node using the following command.
> [!IMPORTANT]
> We recommend downloading DeepSeek-R1 and then mounting it into the container. You can find the model [here](https://huggingface.co/deepseek-ai/DeepSeek-R1).
```bash
docker run \
--gpus all \
-it \
--rm \
--network host \
--volume /PATH_TO_DSR1_MODEL/:/model/ \
--shm-size=10G \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--ulimit nofile=65536:65536 \
--cap-add CAP_SYS_PTRACE \
--ipc host \
dynamo-deepep:latest
```
In each container, you should be in the `/sgl-workspace/dynamo/examples/sglang` directory.
4. On the head prefill node, start `nats-server` and `etcd` using the following commands:
```bash
nats-server -js &
etcd --listen-client-urls http://0.0.0.0:2379 \
     --advertise-client-urls http://0.0.0.0:2379 \
     --listen-peer-urls http://0.0.0.0:2380 \
     --initial-advertise-peer-urls http://HEAD_PREFILL_NODE_IP:2380 \
     --initial-cluster default=http://HEAD_PREFILL_NODE_IP:2380 &
```
5. On every other node, export the `NATS_SERVER` and `ETCD_ENDPOINTS` environment variables so that they point at the head prefill node:
> [!IMPORTANT]
> You will need the IP addresses of your head prefill node and head decode node for the configuration files.
```bash
# run this on every other node
export NATS_SERVER=nats://HEAD_PREFILL_NODE_IP:4222
export ETCD_ENDPOINTS=http://HEAD_PREFILL_NODE_IP:2379
```
6. Set the correct `dist-init-addr` and `node-rank` in each node's configuration file.
Each container contains the configuration file in `configs/dsr1.yaml`. For our example, we will make the following changes:
On the prefill head node, open `configs/dsr1.yaml` (for example with `vim`) and change the following section of `SGLangWorker`:
```yaml
SGLangWorker:
  ...
  dist-init-addr: HEAD_PREFILL_NODE_IP:29500
  nnodes: 2
  node-rank: 0
  ...
```
On the other prefill node (since this example has 2 prefill nodes), change the following section of the `SGLangWorker`:
```yaml
SGLangWorker:
  ...
  dist-init-addr: HEAD_PREFILL_NODE_IP:29500
  nnodes: 2
  node-rank: 1
  ...
```
On the decode head node, open `configs/dsr1.yaml` and change the following section of `SGLangDecodeWorker`:
```yaml
SGLangDecodeWorker:
  ...
  dist-init-addr: HEAD_DECODE_NODE_IP:29500
  nnodes: 4
  node-rank: 0
  ...
```
On the other decode nodes (this example has 4 decode nodes), change the following section of the `SGLangDecodeWorker`:
```yaml
SGLangDecodeWorker:
  ...
  dist-init-addr: HEAD_DECODE_NODE_IP:29500
  nnodes: 4
  # 1, 2, or 3, depending on the node
  node-rank: 1
```
7. Start the workers using the following commands:
On the prefill head node:
```bash
dynamo serve graphs.agg:Frontend -f configs/dsr1.yaml
```
On the other prefill node:
```bash
dynamo serve graphs.agg:Frontend -f configs/dsr1.yaml --service-name SGLangWorker
```
On every decode node:
```bash
dynamo serve graphs.disagg:Frontend -f configs/dsr1.yaml --service-name SGLangDecodeWorker
```
8. Run the warmup script to warm up the model
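DeepGEMM kernels can take a while to warm up, so we provide a small helper script that sends a series of chat completion requests to the frontend. You can run it as many times as you like before starting inference or benchmarking. The script only needs `curl`, so you can run it standalone from the head node without a container. It targets port 8000 by default, which matches the `Frontend` port in `configs/dsr1.yaml`; pass a port as the second argument to override it.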
```bash
./warmup.sh HEAD_PREFILL_NODE_IP
```
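Step 6 edits `configs/dsr1.yaml` by hand on every node. If you would rather script that edit, a minimal sketch is below. `set_section_key` is a hypothetical helper, not part of Dynamo, and it assumes section names such as `SGLangWorker:` start at column 0 with their settings indented beneath them, so the same key in another section is left untouched.

```bash
# Hypothetical helper (not part of Dynamo): set "key: value" inside one
# top-level section of a YAML file, leaving other sections untouched.
# Assumes section names start at column 0 and their keys are indented below.
set_section_key() {
  local file=$1 section=$2 key=$3 value=$4 tmp
  tmp=$(mktemp) || return 1
  awk -v section="$section" -v key="$key" -v value="$value" '
    # A column-0 line ending in ":" starts a new section.
    /^[^ \t#][^:]*:[ \t]*$/ { current = $0; sub(/:[ \t]*$/, "", current) }
    # Inside the target section, rewrite the value but keep the indentation.
    current == section && $0 ~ ("^[ \t]+" key ":") {
      match($0, /^[ \t]+/)
      $0 = substr($0, 1, RLENGTH) key ": " value
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Example for the second prefill node (substitute the real head node IP):
#   set_section_key configs/dsr1.yaml SGLangWorker dist-init-addr HEAD_PREFILL_NODE_IP:29500
#   set_section_key configs/dsr1.yaml SGLangWorker node-rank 1
```

Writing back through `cat` rather than `mv` keeps the original file's permissions.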
{
  "normal_dispatch": {
    "num_sms": 24,
    "num_max_nvl_chunked_send_tokens": 12,
    "num_max_nvl_chunked_recv_tokens": 512,
    "num_max_rdma_chunked_send_tokens": 8,
    "num_max_rdma_chunked_recv_tokens": 128
  },
  "normal_combine": {
    "num_sms": 24,
    "num_max_nvl_chunked_send_tokens": 1,
    "num_max_nvl_chunked_recv_tokens": 512,
    "num_max_rdma_chunked_send_tokens": 8,
    "num_max_rdma_chunked_recv_tokens": 128
  }
}
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Frontend:
  served_model_name: deepseek-ai/DeepSeek-R1
  endpoint: dynamo.SGLangWorker.generate
  port: 8000

SGLangWorker:
  model-path: /model/
  served-model-name: deepseek-ai/DeepSeek-R1
  skip-tokenizer-init: true
  disaggregation-mode: prefill
  disaggregation-transfer-backend: nixl
  disaggregation-bootstrap-port: 30001
  dist-init-addr: HEAD_PREFILL_NODE_IP:29500
  nnodes: 2
  node-rank: 0
  tp-size: 16
  dp-size: 16
  enable-dp-attention: true
  decode-log-interval: 1
  # when MoE is enabled ep-size == tp-size
  enable-deepep-moe: true
  page-size: 1
  trust-remote-code: true
  moe-dense-tp-size: 1
  enable-dp-lm-head: true
  disable-radix-cache: true
  watchdog-timeout: 1000000
  enable-two-batch-overlap: true
  deepep-mode: normal
  mem-fraction-static: 0.85
  # SGLang's instructions for benchmarking include these flags
  #max-running-requests: 8192
  #max-total-tokens: 131072
  #context-length: 8192
  #init-expert-location: /configs/prefill_in4096.json
  #deepep-config: /configs/deepep.json
  chunked-prefill-size: 524288
  ep-num-redundant-experts: 32
  ep-dispatch-algorithm: dynamic
  eplb-algorithm: deepseek
  ServiceArgs:
    workers: 1
    resources:
      gpu: 8
    envs:
      MC_TE_METRIC: true
      SGLANG_TBO_DEBUG: 1

SGLangDecodeWorker:
  model-path: /model/
  served-model-name: deepseek-ai/DeepSeek-R1
  skip-tokenizer-init: true
  disaggregation-mode: decode
  disaggregation-transfer-backend: nixl
  disaggregation-bootstrap-port: 30001
  dist-init-addr: HEAD_DECODE_NODE_IP:29500
  nnodes: 4
  node-rank: 0
  tp-size: 32
  dp-size: 32
  enable-dp-attention: true
  decode-log-interval: 1
  # when MoE is enabled ep-size == tp-size
  enable-deepep-moe: true
  page-size: 1
  trust-remote-code: true
  moe-dense-tp-size: 1
  enable-dp-lm-head: true
  disable-radix-cache: true
  watchdog-timeout: 1000000
  enable-two-batch-overlap: true
  deepep-mode: low_latency
  mem-fraction-static: 0.835
  # SGLang's instructions for benchmarking include these flags
  #max-running-requests: 18432
  #context-length: 4500
  #init-expert-location: /configs/decode_in2000out100.json
  ep-num-redundant-experts: 32
  cuda-graph-bs: 256
  ServiceArgs:
    workers: 1
    resources:
      gpu: 8
    envs:
      MC_TE_METRIC: true
      SGLANG_TBO_DEBUG: 1
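Because each worker in this config spans all GPUs on its nodes, `tp-size` equals `nnodes` times the per-node `gpu` count (2 × 8 = 16 for prefill, 4 × 8 = 32 for decode). The sketch below checks that relationship after you edit the file. `get_section_key` and `check_layout` are hypothetical helpers, not part of Dynamo, and they assume section names start at column 0 with their keys indented beneath them.

```bash
# Hypothetical helpers (not part of Dynamo): read a value from one section of
# the YAML config and check that tp-size == nnodes * GPUs per node.
# Assumes section names start at column 0 and their keys are indented below.
get_section_key() {
  local file=$1 section=$2 key=$3
  awk -v section="$section" -v key="$key" '
    # A column-0 line ending in ":" starts a new section.
    /^[^ \t#][^:]*:[ \t]*$/ { current = $0; sub(/:[ \t]*$/, "", current); next }
    # Print the value of the first matching indented key in that section.
    current == section && $0 ~ ("^[ \t]+" key ":") {
      sub("^[ \t]+" key ":[ \t]*", "")
      print
      exit
    }
  ' "$file"
}

check_layout() {
  local file=$1 section=$2 nnodes gpus tp
  nnodes=$(get_section_key "$file" "$section" nnodes)
  gpus=$(get_section_key "$file" "$section" gpu)
  tp=$(get_section_key "$file" "$section" tp-size)
  if [ -z "$nnodes" ] || [ -z "$gpus" ] || [ -z "$tp" ]; then
    echo "$section: missing nnodes, gpu or tp-size" >&2
    return 1
  fi
  if [ "$tp" -ne $(( nnodes * gpus )) ]; then
    echo "$section: tp-size=$tp but nnodes*gpu=$(( nnodes * gpus ))" >&2
    return 1
  fi
  echo "$section: OK (nnodes=$nnodes, gpu=$gpus, tp-size=$tp)"
}

# Example:
#   check_layout configs/dsr1.yaml SGLangWorker
#   check_layout configs/dsr1.yaml SGLangDecodeWorker
```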
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
if [ $# -lt 1 ]; then
    echo "Usage: $0 <ip> [port]"
    echo "port defaults to 8000 if not specified"
    exit 1
fi
IP=$1
PORT=${2:-8000}
echo "Running initial warmup 5 times with 5 seconds between each request"
for i in {1..5}; do
echo "Running iteration $i..."
curl ${IP}:${PORT}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden.In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the worldIn the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. 
Are they driven by a quest for knowledge, a search for lost familt clue is hidden.Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden.Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for"
}
],
"stream":true,
"max_tokens": 100
}'
echo "Sleeping for 5 seconds..."
sleep 5
done
echo "Increasing output length to 500 tokens and running same request 10 times"
for i in {1..10}; do
echo "Running iteration $i..."
curl ${IP}:${PORT}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden.In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the worldIn the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. 
Are they driven by a quest for knowledge, a search for lost familt clue is hidden.Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden.Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for"
}
],
"stream":true,
"max_tokens": 500
}'
echo "Sleeping for 5 seconds..."
sleep 5
done
echo "Running 5 parallel requests with 1000 tokens each"
for i in {1..5}; do
curl ${IP}:${PORT}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-ai/DeepSeek-R1",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden.In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the worldIn the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. 
Are they driven by a quest for knowledge, a search for lost familt clue is hidden.Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden.Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for"
}
],
"stream":true,
"max_tokens": 1000
}' &
done
wait
echo "Parallel requests complete"
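The warmup script assumes the frontend is already listening. Before running it, you could wait for the port to open with a sketch like the one below. `wait_for_port` is a hypothetical helper that relies on bash's `/dev/tcp` redirection and coreutils `timeout`; an open port only shows that the frontend is up, not that every worker has finished loading.

```bash
# Hypothetical helper: poll host:port every 5 seconds until it accepts a TCP
# connection or the timeout (in seconds) expires. Uses bash's /dev/tcp.
wait_for_port() {
  local host=$1 port=${2:-8000} timeout_s=${3:-600} waited=0
  # Each attempt is capped at 5 seconds so an unreachable host cannot hang it.
  until timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "timed out after ${timeout_s}s waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  echo "${host}:${port} is accepting connections"
}

# Example:
#   wait_for_port HEAD_PREFILL_NODE_IP 8000 && ./warmup.sh HEAD_PREFILL_NODE_IP
```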