fix: sglang dsr1 recipe pvc path (#5119)

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Commit a294dbe8 (unverified), authored Dec 31, 2025 by Hongkuan Zhou, committed by GitHub Dec 31, 2025
5 changed files
--- a/benchmarks/profiler/deploy/profile_sla_moe_dgdr.yaml
+++ b/benchmarks/profiler/deploy/profile_sla_moe_dgdr.yaml
@@ -29,7 +29,7 @@ spec:
    # Reference to ConfigMap containing the DGD base config
    # For MoE models, this should point to the appropriate disagg config
-    # Original path: /sgl-workspace/dynamo/recipes/deepseek-r1/sglang/disagg-16gpu.yaml
+    # Original path: /sgl-workspace/dynamo/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
    configMapRef:
      name: deepseek-r1-config
      key: tep16p-dep16d-disagg.yaml

--- a/recipes/README.md
+++ b/recipes/README.md
@@ -16,10 +16,12 @@ Production-tested Kubernetes deployment recipes for LLM inference using NVIDIA D
 | **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | ✅ | ✅ | Prefill + Decode separation | ❌ |
 | **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | ✅ | ✅ | Blackwell only, WideEP | ❌ |
 | **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest | ❌ |
-| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅ | ❌ | Benchmark recipe pending | ❌ |
+| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅*1 | ❌ | Benchmark recipe pending | ❌ |
-| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅ | ❌ | Benchmark recipe pending | ❌ |
+| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅*1 | ❌ | Benchmark recipe pending | ❌ |
 | **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | ✅ | ✅ |Multi-node: 8 decode + 1 prefill nodes | ❌ |
+*1: Please use `deepseek-r1/model-cache/model-download-sglang.yaml` to download the model into the PVC.
 **Legend:**
 - **Deployment**: ✅ = Complete `deploy.yaml` manifest available | ❌ = Missing or incomplete
 - **Benchmark Recipe**: ✅ = Includes `perf.yaml` for running AIPerf benchmarks | ❌ = No benchmark recipe provided

--- a/recipes/deepseek-r1/model-cache/model-download-sglang.yaml
+++ b/recipes/deepseek-r1/model-cache/model-download-sglang.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: model-download
+spec:
+  backoffLimit: 3
+  completions: 1
+  parallelism: 1
+  template:
+    metadata:
+      labels:
+        app: model-download
+    spec:
+      restartPolicy: Never
+      tolerations: []
+      containers:
+        - name: model-download
+          image: python:3.10-slim
+          command: ["sh", "-c"]
+          env:
+            - name: HF_HUB_ENABLE_HF_TRANSFER
+              value: "1"
+            - name: HF_HOME
+              value: /opt/model-cache
+          args:
+            - |
+              set -eux
+              pip install --no-cache-dir huggingface_hub hf_transfer
+              hf download deepseek-ai/DeepSeek-R1
+          volumeMounts:
+            - name: model-cache
+              mountPath: /opt/model-cache
+      volumes:
+      - name: model-cache
+        persistentVolumeClaim:
+          claimName: model-cache
\ No newline at end of file
--- a/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
+++ b/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
@@ -17,6 +17,9 @@ spec:
      dynamoNamespace: sgl-dsr1-16gpu
      componentType: frontend
      replicas: 1
+      volumeMounts:
+        - name: model-cache
+          mountPoint: /opt/model
      extraPodSpec:
        mainContainer:
          image: my-registry/sglang-runtime:my-tag

--- a/recipes/deepseek-r1/sglang/disagg-8gpu/deploy.yaml
+++ b/recipes/deepseek-r1/sglang/disagg-8gpu/deploy.yaml
@@ -17,6 +17,9 @@ spec:
      dynamoNamespace: sgl-dsr1-8gpu
      componentType: frontend
      replicas: 1
+      volumeMounts:
+        - name: model-cache
+          mountPoint: /opt/model
      extraPodSpec:
        mainContainer:
          image: my-registry/sglang-runtime:my-tag
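
Note: the download Job and both `deploy.yaml` manifests reference a PersistentVolumeClaim named `model-cache`, which this commit does not define. A minimal sketch of such a claim is below, in case it helps when reproducing the recipe. Only the claim name comes from the diff; the access mode, storage size, and storage class are assumptions and depend on the cluster.

```yaml
# Hypothetical PVC backing the `model-cache` claim; not part of this commit.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache            # matches claimName in model-download-sglang.yaml
spec:
  accessModes:
    - ReadWriteMany            # assumption: shared by the download Job and serving pods
  resources:
    requests:
      storage: 1Ti             # assumption: placeholder, size it for the model weights
  # storageClassName: <your-storage-class>   # assumption: cluster-specific
```

The claim would need to exist in the same namespace before the `model-download` Job is applied, since the Job mounts it at `/opt/model-cache`.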