Unverified commit 48c340f4, authored by ishandhanani and committed by GitHub

fix: point to new sglang container and rm references to old (#4383)

parent cf64ca7c
......@@ -35,7 +35,6 @@ vllm: &vllm
sglang: &sglang
- 'container/Dockerfile.sglang'
- 'container/Dockerfile.sglang-wideep'
- 'examples/backends/sglang/**'
- 'components/src/dynamo/sglang/**'
- 'container/build.sh'
......
......@@ -382,6 +382,7 @@ COPY --chown=dynamo: examples /workspace/examples
COPY --chown=dynamo: benchmarks /workspace/benchmarks
COPY --chown=dynamo: deploy /workspace/deploy
COPY --chown=dynamo: components/ /workspace/components/
COPY --chown=dynamo: recipes/ /workspace/recipes/
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
......
......@@ -135,11 +135,9 @@ We are in the process of shipping pre-built docker containers that contain insta
```bash
cd $DYNAMO_ROOT
docker build \
-f container/Dockerfile.sglang-wideep \
-t dynamo-sglang \
--no-cache \
.
./container/build.sh \
--framework SGLANG \
--tag dynamo-sglang:latest
```
And then run it using
......
......@@ -5,26 +5,21 @@ SPDX-License-Identifier: Apache-2.0
# Running DeepSeek-R1 Disaggregated with WideEP on GB200s
Dynamo supports SGLang's GB200 implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-06-16-gb200-part-1/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` and a sample configuration that demonstrates WideEP and P/D disaggregation. To run the exact configuration shown in the blog post, you can view the commands created by the SGLang team [here](https://github.com/sgl-project/sglang/issues/7227). In this example, we will run 1 prefill worker on 2 GB200 nodes (4 GPUs each) and 1 decode worker on 2 GB200 nodes (total 8 GPUs).
Dynamo supports SGLang's GB200 implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-06-16-gb200-part-1/) for more details. We provide a sample configuration that demonstrates WideEP and P/D disaggregation. To run the exact configuration shown in the blog post, you can view the commands created by the SGLang team [here](https://github.com/sgl-project/sglang/issues/7227). In this example, we will run 1 prefill worker on 2 GB200 nodes (4 GPUs each) and 1 decode worker on 2 GB200 nodes (total 8 GPUs).
## Instructions
1. Build the Dynamo container using the latest published dynamo version and stable sglang version. If you want to build from a local dynamo repo, you can add `--build-arg BRANCH_TYPE=local` to the build command. If you want to build from a remote dynamo repo, you can add `--build-arg BRANCH_TYPE=remote` to the build command. If you want to use a specific tag for the default sglang version, you can add `--build-arg SGLANG_IMAGE_TAG=<tag>` to the build command.
1. Build the Dynamo container for ARM64 (GB200) using the `build.sh` script.
> [!Note]
> Please ensure that you are building this on an ARM64 machine. The correct SGLang image will be selected automatically via the multi-arch manifest.
> [!Note]
> Please use `--build-arg SGLANG_IMAGE_TAG=nightly-dev-20251019-fda0cb2a` to build the container due to a bug that we found with the DeepEP version being installed. This was fixed in [PR 11773](https://github.com/sgl-project/sglang/pull/11773). When SGLang releases a version > `0.5.3.post3` we will update these instructions.
> Please ensure that you are building this on an ARM64 machine. The build script will automatically configure the correct platform and build arguments for SGLang on ARM64/GB200.
```bash
cd $DYNAMO_ROOT
docker build \
-f container/Dockerfile.sglang-wideep \
-t dynamo-wideep-gb200 \
--build-arg SGLANG_IMAGE_TAG=nightly-dev-20251019-fda0cb2a \
--no-cache \
.
./container/build.sh \
--framework SGLANG \
--platform linux/arm64 \
--tag dynamo-wideep-gb200:latest
```
2. You can run this container on each 4xGB200 node using the following command.
......@@ -177,4 +172,4 @@ python3 -m dynamo.sglang \
--disaggregation-transfer-backend nixl
```
On the other decode nodes (this example has 2 total decode nodes), run the same command but change `--node-rank` to 1.
\ No newline at end of file
On the other decode nodes (this example has 2 total decode nodes), run the same command but change `--node-rank` to 1.
......@@ -5,22 +5,20 @@ SPDX-License-Identifier: Apache-2.0
# Running DeepSeek-R1 Disaggregated with WideEP on H100s
Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` and a sample configuration that demonstrates WideEP and P/D disaggregation. To run the exact configuration shown in the blog post, you can view the commands created by the SGLang team [here](https://github.com/sgl-project/sglang/issues/6017). In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 4 H100 nodes (64 total GPUs).
Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a sample configuration that demonstrates WideEP and P/D disaggregation. To run the exact configuration shown in the blog post, you can view the commands created by the SGLang team [here](https://github.com/sgl-project/sglang/issues/6017). In this example, we will run 1 prefill worker on 4 H100 nodes (32 GPUs) and 1 decode worker on 4 H100 nodes (32 GPUs), for 64 GPUs in total.
## Instructions
1. Build the Dynamo container using the latest published dynamo version and stable sglang version. If you want to build from a local dynamo repo, you can add `--build-arg BRANCH_TYPE=local` to the build command. If you want to build from a remote dynamo repo, you can add `--build-arg BRANCH_TYPE=remote` to the build command. If you want to use a specific tag for the default sglang version, you can add `--build-arg SGLANG_IMAGE_TAG=<tag>` to the build command.
1. Build the Dynamo container for AMD64/x86_64 (H100) using the `build.sh` script.
> [!Note]
> Please ensure that you are building this on an AMD64 (x86_64) machine. The correct SGLang image will be selected automatically via the multi-arch manifest.
> Please ensure that you are building this on an AMD64 (x86_64) machine. The build script will automatically configure the correct platform for SGLang.
```bash
cd $DYNAMO_ROOT
docker build \
-f container/Dockerfile.sglang-wideep \
-t dynamo-wideep \
--no-cache \
.
./container/build.sh \
--framework SGLANG \
--tag dynamo-wideep:latest
```
2. You can run this container on each 8xH100 node using the following command.
......
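The two WideEP guides changed above each ask that the container be built on a host whose architecture matches the target (ARM64 for GB200, AMD64/x86_64 for H100). The following is a minimal sketch of a pre-build check, not part of this commit: `platform_for_arch` is a hypothetical helper name, and the mapping assumes the usual `uname -m` outputs and Docker platform strings. Only `--platform linux/arm64` appears in the diff itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the Dynamo repo): map a machine
# architecture string, as reported by `uname -m`, to a Docker platform string.
platform_for_arch() {
  case "$1" in
    aarch64|arm64) echo "linux/arm64" ;;   # GB200 hosts
    x86_64|amd64)  echo "linux/amd64" ;;   # H100 hosts
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Report the platform of the current host before starting a build.
host_platform="$(platform_for_arch "$(uname -m)")" || host_platform="unknown"
echo "Host platform: ${host_platform}"
```

If the reported platform does not match the guide being followed, the notes in the diff suggest switching to a matching build machine rather than cross-building.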
......@@ -13,7 +13,9 @@ SGLang allows you to deploy multi-node sized models by adding in the `dist-init-
```bash
cd $DYNAMO_ROOT
docker build -f container/Dockerfile.sglang-wideep . -t dynamo-wideep --no-cache
./container/build.sh \
--framework SGLANG \
--tag dynamo-wideep:latest
```
You can use a specific tag from the [lmsys dockerhub](https://hub.docker.com/r/lmsysorg/sglang/tags) by adding `--build-arg SGLANG_IMAGE_TAG=<tag>` to the build command.
......
......@@ -4,10 +4,10 @@ This recipe is for running DeepSeek R1 with SGLang in disaggregated mode. It is
## Container
Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the container, or
Build the container using the `build.sh` script:
```bash
./container/build.sh --framework sglang-wideep
./container/build.sh --framework SGLANG
```
Dynamo commits after `1b3eed4b6a0e735d4ecec6681f4c0b89f2112167` (Sep 18, 2025) are required.
......
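The recipe's requirement that the checkout include commit `1b3eed4b6a0e735d4ecec6681f4c0b89f2112167` can be checked with plain git before building. This is a sketch under assumptions: `includes_commit` is a hypothetical helper name, and `DYNAMO_ROOT` is assumed to point at a git checkout of the repository, as in the build snippets above. Note that `git merge-base --is-ancestor` also succeeds when `HEAD` is that commit itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if commit $2 is an ancestor of (or equal to)
# HEAD in the git repository at path $1.
includes_commit() {
  git -C "$1" merge-base --is-ancestor "$2" HEAD
}

# Only run the check when DYNAMO_ROOT is set (assumed to be a git checkout).
if [ -n "${DYNAMO_ROOT:-}" ]; then
  if includes_commit "$DYNAMO_ROOT" 1b3eed4b6a0e735d4ecec6681f4c0b89f2112167; then
    echo "Checkout includes the required commit."
  else
    echo "Checkout does not include the required commit (or it is unknown)." >&2
  fi
fi
```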