Unverified commit a294dbe8, authored by Hongkuan Zhou and committed by GitHub

fix: sglang dsr1 recipe pvc path (#5119)


Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
parent 0b33c1df
@@ -29,7 +29,7 @@ spec:
 # Reference to ConfigMap containing the DGD base config
 # For MoE models, this should point to the appropriate disagg config
-# Original path: /sgl-workspace/dynamo/recipes/deepseek-r1/sglang/disagg-16gpu.yaml
+# Original path: /sgl-workspace/dynamo/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
 configMapRef:
   name: deepseek-r1-config
   key: tep16p-dep16d-disagg.yaml
......
@@ -16,10 +16,12 @@ Production-tested Kubernetes deployment recipes for LLM inference using NVIDIA D
 | **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | ✅ | ✅ | Prefill + Decode separation | ❌ |
 | **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | ✅ | ✅ | Blackwell only, WideEP | ❌ |
 | **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest | ❌ |
-| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅ | ❌ | Benchmark recipe pending | ❌ |
-| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅ | ❌ | Benchmark recipe pending | ❌ |
+| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅*1 | ❌ | Benchmark recipe pending | ❌ |
+| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅*1 | ❌ | Benchmark recipe pending | ❌ |
 | **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | ✅ | ✅ |Multi-node: 8 decode + 1 prefill nodes | ❌ |
+*1: Please use `deepseek-r1/model-cache/model-download-sglang.yaml` to download the model into the PVC.
 **Legend:**
 - **Deployment**: ✅ = Complete `deploy.yaml` manifest available | ❌ = Missing or incomplete
 - **Benchmark Recipe**: ✅ = Includes `perf.yaml` for running AIPerf benchmarks | ❌ = No benchmark recipe provided
......
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: batch/v1
kind: Job
metadata:
  name: model-download
spec:
  backoffLimit: 3
  completions: 1
  parallelism: 1
  template:
    metadata:
      labels:
        app: model-download
    spec:
      restartPolicy: Never
      tolerations: []
      containers:
        - name: model-download
          image: python:3.10-slim
          command: ["sh", "-c"]
          env:
            - name: HF_HUB_ENABLE_HF_TRANSFER
              value: "1"
            - name: HF_HOME
              value: /opt/model-cache
          args:
            - |
              set -eux
              pip install --no-cache-dir huggingface_hub hf_transfer
              hf download deepseek-ai/DeepSeek-R1
          volumeMounts:
            - name: model-cache
              mountPath: /opt/model-cache
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: model-cache
\ No newline at end of file
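Note: the Job above only populates the PVC; it has to be applied and allowed to finish before either SGLang deployment is created. A minimal Python sketch of that sequence follows. It assumes `kubectl` is on the PATH, that a PVC named `model-cache` already exists in the target namespace, and that the manifest path from the README footnote is resolved relative to the recipes directory. The namespace argument, the `6h` timeout, and the helper names are hypothetical and not part of this commit.

```python
import shlex
import subprocess

# Path taken from the README footnote; assumed to be relative to the recipes directory.
MANIFEST = "deepseek-r1/model-cache/model-download-sglang.yaml"
# metadata.name of the Job in the manifest.
JOB = "job/model-download"


def download_commands(namespace: str, timeout: str = "6h") -> list[list[str]]:
    """Build the kubectl invocations to start the download Job and wait for it."""
    ns = ["-n", namespace]
    return [
        ["kubectl", "apply", "-f", MANIFEST, *ns],
        ["kubectl", "wait", "--for=condition=complete", JOB, f"--timeout={timeout}", *ns],
        ["kubectl", "logs", JOB, "--tail=20", *ns],
    ]


def run_download(namespace: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in download_commands(namespace):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "my-namespace" is a placeholder; the default dry run only prints the commands.
    run_download("my-namespace")
```

Keeping `dry_run=True` as the default lets the commands be reviewed before anything touches the cluster.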
@@ -17,6 +17,9 @@ spec:
 dynamoNamespace: sgl-dsr1-16gpu
 componentType: frontend
 replicas: 1
+volumeMounts:
+  - name: model-cache
+    mountPoint: /opt/model
 extraPodSpec:
   mainContainer:
     image: my-registry/sglang-runtime:my-tag
......
@@ -17,6 +17,9 @@ spec:
 dynamoNamespace: sgl-dsr1-8gpu
 componentType: frontend
 replicas: 1
+volumeMounts:
+  - name: model-cache
+    mountPoint: /opt/model
 extraPodSpec:
   mainContainer:
     image: my-registry/sglang-runtime:my-tag
......
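Note: the download Job mounts the `model-cache` PVC at `/opt/model-cache` and sets `HF_HOME` to that path, while the frontend services in this commit mount the same PVC at `/opt/model`. The sketch below works out where the downloaded repository would sit under each mount. It relies on the standard `huggingface_hub` cache layout (`$HF_HOME/hub/models--<org>--<name>`) and assumes the Job does not override `HF_HUB_CACHE`; the helper names are hypothetical, and how the serving containers actually locate the model is not shown in this diff.

```python
from pathlib import PurePosixPath


def hub_repo_dir(hf_home: str, repo_id: str) -> PurePosixPath:
    """Directory huggingface_hub uses for a model repo when only HF_HOME is set."""
    folder = "models--" + repo_id.replace("/", "--")
    return PurePosixPath(hf_home) / "hub" / folder


def remount(path: PurePosixPath, old_mount: str, new_mount: str) -> PurePosixPath:
    """Translate a path on the PVC from one mount point to another."""
    return PurePosixPath(new_mount) / path.relative_to(old_mount)


# Values from the Job manifest: HF_HOME and the repo passed to `hf download`.
job_dir = hub_repo_dir("/opt/model-cache", "deepseek-ai/DeepSeek-R1")
# Mount point added to the frontend services in this commit.
serving_dir = remount(job_dir, "/opt/model-cache", "/opt/model")

print(job_dir)      # /opt/model-cache/hub/models--deepseek-ai--DeepSeek-R1
print(serving_dir)  # /opt/model/hub/models--deepseek-ai--DeepSeek-R1
```

The model files themselves live one level further down, under `snapshots/<revision>/`, where the revision hash depends on the downloaded commit.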