fix(recipes): correct GPU counts in DeepSeek-R1 READMEs (#5953)

Commit 03eb296e (unverified), authored Feb 06, 2026 by Ben Hamm, committed by GitHub Feb 06, 2026

Showing with 8 additions and 8 deletions

recipes/deepseek-r1/sglang/README.md +1 -1

recipes/deepseek-r1/vllm/disagg/README.md +7 -7
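
The corrected counts in this commit follow from the rule stated in the updated sglang README: each recipe runs separate prefill and decode workers, and the folder name gives the GPUs per worker type. A minimal sketch of that arithmetic is below. It assumes 8 GPUs per node, which matches the vLLM README's "32 GPUs across four nodes" but is not stated for the sglang recipes. `WORKER_TYPES` and `total_gpus` are hypothetical names for illustration, not part of the repository.

```python
# Hypothetical helper for illustration only; not part of the Dynamo repository.
WORKER_TYPES = ("prefill", "decode")  # disaggregated serving uses both


def total_gpus(gpus_per_worker_type: int, gpus_per_node: int = 8) -> tuple[int, int]:
    """Return (total GPUs, node count) for a disaggregated recipe.

    gpus_per_node=8 is an assumption, not a value taken from the manifests.
    """
    total = gpus_per_worker_type * len(WORKER_TYPES)
    nodes = -(-total // gpus_per_node)  # ceiling division
    return total, nodes


for recipe, per_type in {"disagg-8gpu": 8, "disagg-16gpu": 16}.items():
    total, nodes = total_gpus(per_type)
    print(f"{recipe}: {total} GPUs across {nodes} node(s)")
```

Under these assumptions, the 16-GPU-per-worker-type layout comes out to 32 GPUs on four nodes, which is the figure the vLLM README now reports.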

--- a/recipes/deepseek-r1/sglang/README.md
+++ b/recipes/deepseek-r1/sglang/README.md
@@ -14,7 +14,7 @@ Dynamo commits after `1b3eed4b6a0e735d4ecec6681f4c0b89f2112167` (Sep 18, 2025) a

 ## Hardware

-The two deployment recipes are for 8xH200 and 16xH200. It should also work for other GPU SKUs. Change the TDP and DEP size accordingly to match the GPU capacity.
+The two deployment recipes are for 16x H200 (disagg-8gpu) and 32x H200 (disagg-16gpu). The folder names refer to GPUs per worker type (8 or 16), with separate prefill and decode workers each using that many GPUs. It should also work for other GPU SKUs. Change the TP and EP size accordingly to match the GPU capacity.

 If you see NCCL errors when sending requests to the engines, it is usually caused by OOM error. Try to reduce `--mem-fraction-static` in both prefill and decode engines.

--- a/recipes/deepseek-r1/vllm/disagg/README.md
+++ b/recipes/deepseek-r1/vllm/disagg/README.md
@@ -3,19 +3,19 @@ SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES.
 SPDX-License-Identifier: Apache-2.0
 -->

-### DeepSeek-R1 with vLLM — Disaggregated on 8x Hopper
+### DeepSeek-R1 with vLLM — Disaggregated on 32x Hopper

-This recipe deploys DeepSeek-R1 using vLLM in a disaggregated prefill/decode setup on a single Hopper node with 8 GPUs.
+This recipe deploys DeepSeek-R1 using vLLM in a disaggregated prefill/decode setup across four Hopper nodes (32 GPUs total: 16 for prefill, 16 for decode).

 - Model cache PVC + download job: `recipes/deepseek-r1/model-cache/`
-- Deployment manifest: `recipes/deepseek-r1/vllm/disagg/deploy_hopper_8gpu.yaml`
+- Deployment manifest: `recipes/deepseek-r1/vllm/disagg/deploy_hopper_16gpu.yaml`

 ### 0) Prerequisites: Install the platform

 Follow the Kubernetes deployment guide to install the Dynamo platform and prerequisites (CRDs/operator, etc.):
 - `docs/kubernetes/README.md`

-Ensure you have a GPU-enabled cluster with sufficient capacity (8x H100/H200 “Hopper”), and that the NVIDIA GPU Operator is healthy.
+Ensure you have a GPU-enabled cluster with sufficient capacity (32x H100/H200 "Hopper" across 4 nodes), and that the NVIDIA GPU Operator is healthy.

 ### 1) Set namespace

@@ -58,15 +58,15 @@ This will populate:
 - `/model-cache/deepseek-r1`
 - `/model-cache/deepseek-r1-fp4`

-### 4) Deploy vLLM (Disaggregated, Prefill DEP16, Decode DEP16)
+### 4) Deploy vLLM (Disaggregated, 16-way Data-Expert Parallel)

-Apply the single-node disaggregated deployment:
+Apply the multi-node disaggregated deployment:

 ```bash
 kubectl apply -f ./deploy_hopper_16gpu.yaml -n ${NAMESPACE}
 ```

-The manifest runs separate prefill and decode workers, each mounting the shared model cache, with settings tuned for Hopper.
+The manifest runs separate prefill and decode workers across multiple nodes, each mounting the shared model cache, with settings tuned for Hopper GPUs.

 Test the deployment locally by port-forwarding and sending a request: