fix: consistent model recipes and update simplified doc (#3858)

Commit 0757ecf1 authored Oct 27, 2025 by Biswa Panda; committed by GitHub Oct 27, 2025
15 changed files
--- a/benchmarks/profiler/deploy/profile_sla_moe_job.yaml
+++ b/benchmarks/profiler/deploy/profile_sla_moe_job.yaml
@@ -31,7 +31,7 @@ spec:
        command: ["python", "-m", "benchmarks.profiler.profile_sla"]
        args:
          - --config
-          - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
+          - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
          - --output-dir
          - /data/profiling_results
          - --namespace

--- a/recipes/CONTRIBUTING.md
+++ b/recipes/CONTRIBUTING.md
+# Recipes Contributing Guide
+When adding new model recipes, ensure they follow the standard structure:
+```text
+<model-name>/
+├── model-cache/
+│   ├── model-cache.yaml
+│   └── model-download.yaml
+├── <framework>/
+│   └── <deployment-mode>/
+│       ├── deploy.yaml
+│       └── perf.yaml (optional)
+└── README.md (optional)
+```
+## Validation
+The `run.sh` script expects this exact directory structure and will validate that the directories and files exist before deployment:
+- Model directory exists in `recipes/<model>/`
+- Framework is one of the supported frameworks (vllm, sglang, trtllm)
+- Framework directory exists in `recipes/<model>/<framework>/`
+- Deployment directory exists in `recipes/<model>/<framework>/<deployment>/`
+- Required files (`deploy.yaml`) exist in the deployment directory
+- If present, performance benchmarks (`perf.yaml`) will be automatically executed
\ No newline at end of file
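The validation rules listed in `recipes/CONTRIBUTING.md` can be sketched as a small standalone function. This is a minimal illustration, not the code in `run.sh`: the function name `validate_recipe`, its argument order, and its messages are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of run.sh):
#   validate_recipe <recipes_dir> <model> <framework> <deployment>
# Returns non-zero on the first failed check and reports whether perf.yaml exists.
validate_recipe() {
    local recipes_dir="$1" model="$2" framework="$3" deployment="$4"

    # Framework directories are lowercase, so normalize the input first.
    framework="$(printf '%s' "$framework" | tr '[:upper:]' '[:lower:]')"
    local model_dir="$recipes_dir/$model"
    local deploy_dir="$model_dir/$framework/$deployment"

    [ -d "$model_dir" ] || { echo "missing model directory: $model_dir" >&2; return 1; }

    case "$framework" in
        vllm|sglang|trtllm) ;;
        *) echo "unsupported framework: $framework" >&2; return 1 ;;
    esac

    [ -d "$model_dir/$framework" ] || { echo "missing framework directory: $model_dir/$framework" >&2; return 1; }
    [ -d "$deploy_dir" ] || { echo "missing deployment directory: $deploy_dir" >&2; return 1; }
    [ -f "$deploy_dir/deploy.yaml" ] || { echo "missing required file: $deploy_dir/deploy.yaml" >&2; return 1; }

    # perf.yaml is optional; only report whether a benchmark would run.
    if [ -f "$deploy_dir/perf.yaml" ]; then
        echo "benchmark: yes"
    else
        echo "benchmark: no"
    fi
}
```

Calling `validate_recipe recipes llama-3-70b vllm agg` from the repository root would exercise the same directory layout that the guide describes.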
--- a/recipes/README.md
+++ b/recipes/README.md
-# Dynamo model serving recipes
+# Dynamo Model Serving Recipes
-| Model family  | Backend | Mode                | GPU   | Deployment | Benchmark |
+This repository contains production-ready recipes for deploying large language models using the Dynamo platform. Each recipe includes deployment configurations, performance benchmarking, and model caching setup.
-|---------------|---------|---------------------|-------|------------|-----------|
-| llama-3-70b   | vllm    | agg                 | H100, H200  |     ✓      |     ✓     |
+## Contents
-| llama-3-70b   | vllm    | disagg-multi-node   | H100, H200  |     ✓      |     ✓     |
+- [Available Models](#available-models)
-| llama-3-70b   | vllm    | disagg-single-node  | H100, H200  |     ✓      |     ✓     |
+- [Quick Start](#quick-start)
-| DeepSeek-R1   | sglang  | disaggregated       | H200  |     ✓      |    🚧     |
+- [Prerequisites](#prerequisites)
-| oss-gpt       | trtllm  | aggregated          | GB200 |     ✓      |     ✓     |
+- Deployment Methods
+   - [Option 1: Automated Deployment](#option-1-automated-deployment)
+   - [Option 2: Manual Deployment](#option-2-manual-deployment)
+## Available Models
+| Model Family | Framework | Deployment Mode              | GPU Requirements | Status | Benchmark |
+|--------------|-----------|------------------------------|------------------|--------|-----------|
+| llama-3-70b  | vllm      | agg                          | 4x H100/H200     | ✅     | ✅        |
+| llama-3-70b  | vllm      | disagg (1 node)              | 8x H100/H200     | ✅     | ✅        |
+| llama-3-70b  | vllm      | disagg (multi-node)          | 16x H100/H200    | ✅     | ✅        |
+| deepseek-r1  | sglang    | disagg (1 node, wide-ep)     | 8x H200          | ✅     | 🚧        |
+| deepseek-r1  | sglang    | disagg (multi-node, wide-ep) | 16x H200         | ✅     | 🚧        |
+| gpt-oss-120b | trtllm    | agg                          | 4x GB200         | ✅     | ✅        |
+**Legend:**
+- ✅ Functional
+- 🚧 Under development
+**Recipe Directory Structure:**
+Recipes are organized into a directory structure that follows the pattern:
+```text
+<model-name>/
+├── model-cache/
+│   ├── model-cache.yaml         # PVC for model cache
+│   └── model-download.yaml      # Job for model download
+├── <framework>/
+│   └── <deployment-mode>/
+│       ├── deploy.yaml          # DynamoGraphDeployment CRD and optional configmap for custom configuration
+│       └── perf.yaml (optional) # Performance benchmark
+└── README.md (optional)         # Model documentation
+```
+## Quick Start
+Follow the instructions in the [Prerequisites](#prerequisites) section to set up your environment.
+Choose your preferred deployment method: using the `run.sh` script or manual deployment steps.
 ## Prerequisites
-1. Create a namespace and populate NAMESPACE environment variable
+### 1. Environment Setup
-This environment variable is used in later steps to deploy and perf-test the model.
+Create a Kubernetes namespace and set the `NAMESPACE` environment variable:
 ```bash
 export NAMESPACE=your-namespace
 kubectl create namespace ${NAMESPACE}
 ```
-2. **Dynamo Cloud Platform installed** - Follow [Quickstart Guide](../docs/kubernetes/README.md)
+### 2. Deploy Dynamo Platform
+Install the Dynamo Cloud Platform following the [Quickstart Guide](../docs/kubernetes/README.md).
+### 3. GPU Cluster
+Ensure your Kubernetes cluster has:
+- GPU nodes with appropriate GPU types (see model requirements above)
+- GPU operator installed
+- Sufficient GPU memory and compute resources
+### 4. Container Registry Access
-3. **Kubernetes cluster with GPU support**
+Ensure access to NVIDIA container registry for runtime images:
+- `nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z`
+- `nvcr.io/nvidia/ai-dynamo/trtllm-runtime:x.y.z`
+- `nvcr.io/nvidia/ai-dynamo/sglang-runtime:x.y.z`
-4. **Container registry access** for vLLM runtime images
+### 5. HuggingFace Access and Kubernetes Secret Creation
-5. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
+Set up a Kubernetes secret with your HuggingFace token for model download:
-Update the `hf-token-secret.yaml` file with your HuggingFace token.
 ```bash
+# Update the token in the secret file
+vim hf_hub_secret/hf_hub_secret.yaml
+# Apply the secret
 kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
 ```
-6. (Optional) Create a shared model cache pvc to store the model weights.
+### 6. Configure Storage Class
-Choose a storage class to create the model cache pvc. You'll need to use this storage class name to update the `storageClass` field in the model-cache/model-cache.yaml file.
+Configure persistent storage for model caching:
 ```bash
+# Check available storage classes
 kubectl get storageclass
 ```
-## Running the recipes
+Replace "your-storage-class-name" with your actual storage class in the file: `<model>/model-cache/model-cache.yaml`
+```yaml
+# In <model>/model-cache/model-cache.yaml
+spec:
+  storageClassName: "your-actual-storage-class"  # Replace this
+```
+## Option 1: Automated Deployment
-Run the recipe to deploy a model:
+Use the `run.sh` script for fully automated deployment:
+**Note:** The script automatically:
+- Creates the model cache PVC and downloads the model
+- Deploys the model service
+- Runs the performance benchmark if a `perf.yaml` file is present in the deployment directory
+#### Script Usage
 ```bash
-./run.sh --model <model> --framework <framework> <deployment-type>
+./run.sh [OPTIONS] --model <model> --framework <framework> --deployment <deployment-type>
 ```
-Arguments:
+**Required Options:**
-  <deployment-type>  Deployment type (e.g., agg, disagg-single-node, disagg-multi-node)
+- `--model <model>`: Model name matching the directory name in the recipes directory (e.g., llama-3-70b, gpt-oss-120b, deepseek-r1)
+- `--framework <framework>`: Backend framework (`vllm`, `trtllm`, `sglang`)
+- `--deployment <deployment-type>`: Deployment mode (e.g., agg, disagg, disagg-single-node, disagg-multi-node)
+**Optional:**
+- `--namespace <namespace>`: Kubernetes namespace (default: dynamo)
+- `--dry-run`: Show commands without executing them
+- `-h, --help`: Show help message
+**Environment Variables:**
+- `NAMESPACE`: Kubernetes namespace (default: dynamo)
+#### Example Usage
+```bash
+# Set up environment
+export NAMESPACE=your-namespace
+kubectl create namespace ${NAMESPACE}
+# Configure HuggingFace token
+kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
+# Use the run.sh script to deploy the model
+# Deploy Llama-3-70B with vLLM (aggregated mode)
+./run.sh --model llama-3-70b --framework vllm --deployment agg
+# Deploy GPT-OSS-120B with TensorRT-LLM
+./run.sh --model gpt-oss-120b --framework trtllm --deployment agg
+# Deploy DeepSeek-R1 with SGLang (disaggregated mode)
+./run.sh --model deepseek-r1 --framework sglang --deployment disagg
+# Deploy with custom namespace
+./run.sh --namespace my-namespace --model llama-3-70b --framework vllm --deployment agg
+# Dry run to see what would be executed
+./run.sh --dry-run --model llama-3-70b --framework vllm --deployment agg
+```
-Required Options:
-  --model <model>    Model name (e.g., llama-3-70b)
-  --framework <fw>   Framework one of VLLM TRTLLM SGLANG (default: VLLM)
-Optional:
+## Option 2: Manual Deployment
-  --skip-model-cache Skip model downloading (assumes model cache already exists)
-  -h, --help         Show this help message
-Environment Variables:
+For step-by-step manual deployment, follow these steps:
-  NAMESPACE          Kubernetes namespace (default: dynamo)
-Examples:
-  ./run.sh --model llama-3-70b --framework vllm agg
-  ./run.sh --skip-model-cache --model llama-3-70b --framework vllm agg
-  ./run.sh --model llama-3-70b --framework trtllm disagg-single-node
-Example:
 ```bash
-./run.sh --model llama-3-70b --framework vllm --deployment-type agg
+# 0. Set up environment (see Prerequisites section)
+export NAMESPACE=your-namespace
+kubectl create namespace ${NAMESPACE}
+kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
+# 1. Download model (see Step 1: Download Model)
+kubectl apply -n $NAMESPACE -f <model>/model-cache/
+# 2. Deploy model (see Step 2: Deploy Model Service)
+kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/deploy.yaml
+# 3. Run benchmarks (optional, if perf.yaml exists)
+kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/perf.yaml
+```
+### Step 1: Download Model
+```bash
+# Start the download job
+kubectl apply -n $NAMESPACE -f <model>/model-cache
+# Verify job creation
+kubectl get jobs -n $NAMESPACE | grep model-download
+```
+Monitor and wait for the model download to complete:
+```bash
+# Wait for job completion (timeout after 100 minutes)
+kubectl wait --for=condition=Complete job/model-download -n $NAMESPACE --timeout=6000s
+# Check job status
+kubectl get job model-download -n $NAMESPACE
+# View download logs
+kubectl logs job/model-download -n $NAMESPACE
+```
+### Step 2: Deploy Model Service
+```bash
+# Navigate to the specific deployment configuration
+cd <model>/<framework>/<deployment-mode>/
+# Deploy the model service
+kubectl apply -n $NAMESPACE -f deploy.yaml
+# Verify deployment creation
+kubectl get deployments -n $NAMESPACE
 ```
+#### Wait for Deployment Ready
-## Dry run mode
+```bash
+# Get deployment name from the deploy.yaml file
+DEPLOYMENT_NAME=$(grep "name:" deploy.yaml | head -1 | awk '{print $2}')
+# Wait for deployment to be ready (timeout after 20 minutes)
+kubectl wait --for=condition=available deployment/$DEPLOYMENT_NAME -n $NAMESPACE --timeout=1200s
+# Check deployment status
+kubectl get deployment $DEPLOYMENT_NAME -n $NAMESPACE
+# Check pod status
+kubectl get pods -n $NAMESPACE -l app=$DEPLOYMENT_NAME
+```
+#### Verify Model Service
+```bash
+# Check if service is running
+kubectl get services -n $NAMESPACE
+# Test model endpoint (port-forward to test locally)
+kubectl port-forward service/${DEPLOYMENT_NAME}-frontend 8000:8000 -n $NAMESPACE
+# Test the model API (in another terminal)
+curl http://localhost:8000/v1/models
-To dry run the recipe, add the `--dry-run` flag.
+# Stop port-forward when done
+pkill -f "kubectl port-forward"
+```
+### Step 3: Performance Benchmarking (Optional)
+Run the benchmark job to measure serving performance. Benchmarking is available only for recipes that include the optional `perf.yaml` file:
+#### Launch Benchmark Job
 ```bash
-./run.sh --dry-run --model llama-3-70b --framework vllm agg
+# From the deployment directory
+kubectl apply -n $NAMESPACE -f perf.yaml
+# Verify benchmark job creation
+kubectl get jobs -n $NAMESPACE
 ```
-## (Optional) Running the recipes with model cache
+#### Monitor Benchmark Progress
-You may need to cache the model weights on a PVC to avoid repeated downloads of the model weights.
- See the [Prerequisites](#prerequisites) section for more details.
 ```bash
-./run.sh --model llama-3-70b --framework vllm --deployment-type agg --skip-model-cache
+# Get benchmark job name
+PERF_JOB_NAME=$(grep "name:" perf.yaml | head -1 | awk '{print $2}')
+# Monitor benchmark logs in real-time
+kubectl logs -f job/$PERF_JOB_NAME -n $NAMESPACE
+# Wait for benchmark completion (timeout after 100 minutes)
+kubectl wait --for=condition=Complete job/$PERF_JOB_NAME -n $NAMESPACE --timeout=6000s
 ```
+#### View Benchmark Results
+```bash
+# Check final benchmark results
+kubectl logs job/$PERF_JOB_NAME -n $NAMESPACE | tail -50
+```
\ No newline at end of file
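The README commands above derive resource names with `grep "name:" <file> | head -1 | awk '{print $2}'`, which returns the first `name:` anywhere in the file. For a multi-document manifest such as the `gpt-oss-120b` `deploy.yaml` in this commit, where a ConfigMap now precedes the DynamoGraphDeployment, that first match is `llm-config`. A kind-aware lookup is one possible alternative. The sketch below is an assumption-laden illustration: the helper `resource_name` is hypothetical, and it only handles simple block-style YAML in which `kind:` appears before `metadata:` and `name:` is unquoted and indented by exactly two spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print metadata.name of the first YAML document whose
# top-level "kind:" matches the requested kind.
#   resource_name <kind> <file>
resource_name() {
    local kind="$1" file="$2"
    awk -v want="$kind" '
        /^---/      { kind = ""; in_meta = 0; next }   # start of a new YAML document
        /^kind:/    { kind = $2 }                      # remember the document kind
        /^metadata:/ { in_meta = 1; next }             # enter the metadata block
        /^[^ #]/    { in_meta = 0 }                    # any other top-level key leaves it
        in_meta && /^  name:/ && kind == want { print $2; exit }
    ' "$file"
}
```

With this helper, `resource_name DynamoGraphDeployment deploy.yaml` would select the deployment's name even when other resources come first in the file. A YAML-aware tool would be more robust than either approach if one is available in the environment.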
--- a/recipes/deepseek-r1/model_cache/model-cache.yaml
+++ b/recipes/deepseek-r1/model_cache/model-cache.yaml
--- a/recipes/deepseek-r1/model-cache/model-download.yaml
+++ b/recipes/deepseek-r1/model-cache/model-download.yaml
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: model-download
+spec:
+  backoffLimit: 3
+  completions: 1
+  parallelism: 1
+  template:
+    metadata:
+      labels:
+        app: model-download
+    spec:
+      restartPolicy: Never
+      containers:
+        - name: model-download
+          image: python:3.10-slim
+          command: ["sh", "-c"]
+          envFrom:
+            - secretRef:
+                name: hf-token-secret
+          env:
+            - name: MODEL_NAME
+              value: deepseek-ai/DeepSeek-R1
+            - name: HF_HOME
+              value: /model-store
+            - name: HF_HUB_ENABLE_HF_TRANSFER
+              value: "1"
+            - name: MODEL_REVISION
+              value: 56d4cbbb4d29f4355bab4b9a39ccb717a14ad5ad
+          args:
+            - |
+              set -eux
+              pip install --no-cache-dir huggingface_hub hf_transfer
+              hf download $MODEL_NAME --revision $MODEL_REVISION
+          volumeMounts:
+            - name: model-cache
+              mountPath: /model-store
+      volumes:
+      - name: model-cache
+        persistentVolumeClaim:
+          claimName: model-cache
\ No newline at end of file
--- a/recipes/deepseek-r1/model_cache/model-download.yaml
+++ b/recipes/deepseek-r1/model_cache/model-download.yaml
--- a/recipes/deepseek-r1/sglang-wideep/README.md
+++ b/recipes/deepseek-r1/sglang-wideep/README.md
-# Container
+# DeepSeek R1 SGLang Recipe
+This recipe is for running DeepSeek R1 with SGLang in disaggregated mode. It is based on the WideEP recipe from the SGLang team.
+## Container
 Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the container, or
@@ -8,7 +12,7 @@ Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the containe
 Dynamo commits after `1b3eed4b6a0e735d4ecec6681f4c0b89f2112167` (Sep 18, 2025) are required.
-# Hardware
+## Hardware
 The two deployment recipes are for 8xH200 and 16xH200. They should also work for other GPU SKUs; change the TEP and DEP sizes to match the GPU capacity.

--- a/recipes/deepseek-r1/sglang-wideep/deepep.json
+++ b/recipes/deepseek-r1/sglang-wideep/deepep.json
--- a/recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
+++ b/recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
--- a/recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml
+++ b/recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml
--- a/recipes/gpt-oss-120b/trtllm/agg/config.yaml
+++ b/recipes/gpt-oss-120b/trtllm/agg/config.yaml
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: llm-config
-data:
-  config.yaml: |
-    enable_attention_dp: true
-    cuda_graph_config:
-        max_batch_size: 800
-        enable_padding: true
-    kv_cache_config:
-      enable_block_reuse: false
-    stream_interval: 20
-    moe_config:
-        backend: CUTLASS
\ No newline at end of file
--- a/recipes/gpt-oss-120b/trtllm/agg/deploy.yaml
+++ b/recipes/gpt-oss-120b/trtllm/agg/deploy.yaml
 # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: llm-config
+data:
+  config.yaml: |
+    enable_attention_dp: true
+    cuda_graph_config:
+        max_batch_size: 800
+        enable_padding: true
+    kv_cache_config:
+      enable_block_reuse: false
+    stream_interval: 20
+    moe_config:
+        backend: CUTLASS
+---
 apiVersion: nvidia.com/v1alpha1
 kind: DynamoGraphDeployment
 metadata:
@@ -7,7 +23,7 @@ metadata:
 spec:
  backendFramework: trtllm
  pvcs:
-    - name: model-cache-oss-gpt120b
+    - name: model-cache
      create: false
  services:
    Frontend:
@@ -31,17 +47,13 @@ spec:
          - /bin/sh
          - -c
          image: my-registry/trtllm-runtime:my-tag
-      pvc:
-        create: false
-        mountPoint: /model-store
-        name: model-cache
      replicas: 1
    TrtllmWorker:
      componentType: main
      dynamoNamespace: gpt-oss-agg
      envFromSecret: hf-token-secret
      volumeMounts:
-        - name: model-cache-oss-gpt120b
+        - name: model-cache
          mountPoint: /root/.cache/huggingface
      sharedMemory:
        size: 80Gi
@@ -90,10 +102,6 @@ spec:
        - configMap:
            name: llm-config
          name: llm-config
-      pvc:
-        create: false
-        mountPoint: /model-store
-        name: model-cache
      replicas: 1
      resources:
        limits:

--- a/recipes/gpt-oss-120b/trtllm/agg/perf.yaml
+++ b/recipes/gpt-oss-120b/trtllm/agg/perf.yaml
@@ -3,7 +3,7 @@
 apiVersion: batch/v1
 kind: Job
 metadata:
-  name: oss-gpt120b-bench
+  name: gpt-oss-120b-bench
 spec:
  backoffLimit: 1
  completions: 1
@@ -11,7 +11,7 @@ spec:
  template:
    metadata:
      labels:
-        app: oss-gpt120b-bench
+        app: gpt-oss-120b-bench
    spec:
      affinity:
        podAntiAffinity:

--- a/recipes/run.sh
+++ b/recipes/run.sh
@@ -17,8 +17,7 @@
 RECIPES_DIR="$( cd "$( dirname "$0" )" && pwd )"
 # Default values
 NAMESPACE="${NAMESPACE:-dynamo}"
-DOWNLOAD_MODEL=true
+DEPLOYMENT=""
-DEPLOY_TYPE=""
 MODEL=""
 FRAMEWORK=""
 DRY_RUN=""
@@ -29,28 +28,25 @@ DEFAULT_FRAMEWORK=VLLM
 # Function to show usage
 usage() {
-    echo "Usage: $0 [OPTIONS] --model <model> --framework <framework> <deployment-type>"
+    echo "Usage: $0 [OPTIONS] --model <model> --framework <framework> --deployment <deployment-type>"
-    echo ""
-    echo "Arguments:"
-    echo "  <deployment-type>  Deployment type (e.g., agg, disagg-single-node, disagg-multi-node)"
    echo ""
    echo "Required Options:"
-    echo "  --model <model>    Model name (e.g., llama-3-70b)"
+    echo "  --model <model>       Model name (e.g., llama-3-70b)"
-    echo "  --framework <fw>   Framework one of ${!FRAMEWORKS[*]} (default: ${DEFAULT_FRAMEWORK})"
+    echo "  --framework <fw>      Framework one of ${!FRAMEWORKS[*]} (default: ${DEFAULT_FRAMEWORK})"
+    echo "  --deployment <type>   Deployment type (e.g., agg, disagg; see README.md for available deployment types)"
    echo ""
    echo "Optional:"
-    echo "  --namespace <ns>   Kubernetes namespace (default: dynamo)"
+    echo "  --namespace <ns>      Kubernetes namespace (default: dynamo)"
-    echo "  --skip-model-cache Skip model downloading (assumes model cache already exists)"
+    echo "  --dry-run             Print commands without executing them"
-    echo "  --dry-run          Print commands without executing them"
+    echo "  -h, --help            Show this help message"
-    echo "  -h, --help         Show this help message"
    echo ""
    echo "Environment Variables:"
-    echo "  NAMESPACE          Kubernetes namespace (default: dynamo)"
+    echo "  NAMESPACE             Kubernetes namespace (default: dynamo)"
    echo ""
    echo "Examples:"
-    echo "  $0 --model llama-3-70b --framework vllm agg"
+    echo "  $0 --model llama-3-70b --framework vllm --deployment agg"
-    echo "  $0 --skip-model-cache --model llama-3-70b --framework vllm agg"
+    echo "  $0 --model llama-3-70b --framework trtllm --deployment disagg-single-node"
-    echo "  $0 --namespace my-ns --model llama-3-70b --framework trtllm disagg-single-node"
+    echo "  $0 --namespace my-ns --model llama-3-70b --framework vllm --deployment disagg-multi-node"
    exit 1
 }
@@ -66,10 +62,6 @@ error() {
 while [[ $# -gt 0 ]]; do
    case $1 in
-        --skip-model-cache)
-            DOWNLOAD_MODEL=false
-            shift
-            ;;
        --dry-run)
            DRY_RUN="echo"
            shift
@@ -90,6 +82,14 @@ while [[ $# -gt 0 ]]; do
                missing_requirement "$1"
            fi
            ;;
+        --deployment)
+            if [ "$2" ]; then
+                DEPLOYMENT=$2
+                shift 2
+            else
+                missing_requirement "$1"
+            fi
+            ;;
        --namespace)
            if [ "$2" ]; then
                NAMESPACE=$2
@@ -105,12 +105,7 @@ while [[ $# -gt 0 ]]; do
            error 'ERROR: Unknown option: ' "$1"
            ;;
        *)
-            if [[ -z "$DEPLOY_TYPE" ]]; then
+            error "ERROR: Unknown argument: " "$1"
-                DEPLOY_TYPE="$1"
-            else
-                error "ERROR: Multiple deployment type arguments provided: " "$1"
-            fi
-            shift
            ;;
    esac
 done
@@ -127,12 +122,12 @@ if [ -n "$FRAMEWORK" ]; then
 fi
 # Validate required arguments
-if [[ -z "$MODEL" ]] || [[ -z "$DEPLOY_TYPE" ]]; then
+if [[ -z "$MODEL" ]] || [[ -z "$DEPLOYMENT" ]]; then
    if [[ -z "$MODEL" ]]; then
        echo "ERROR: --model argument is required"
    fi
-    if [[ -z "$DEPLOY_TYPE" ]]; then
+    if [[ -z "$DEPLOYMENT" ]]; then
-        echo "ERROR: deployment-type argument is required"
+        echo "ERROR: --deployment argument is required"
    fi
    echo ""
    usage
@@ -141,7 +136,7 @@ fi
 # Construct paths based on new structure: recipes/<model>/<framework>/<deployment-type>/
 MODEL_DIR="$RECIPES_DIR/$MODEL"
 FRAMEWORK_DIR="$MODEL_DIR/${FRAMEWORK,,}"
-DEPLOY_PATH="$FRAMEWORK_DIR/$DEPLOY_TYPE"
+DEPLOY_PATH="$FRAMEWORK_DIR/$DEPLOYMENT"
 # Check if model directory exists
 if [[ ! -d "$MODEL_DIR" ]]; then
@@ -161,7 +156,7 @@ fi
 # Check if deployment directory exists
 if [[ ! -d "$DEPLOY_PATH" ]]; then
-    echo "Error: Deployment type '$DEPLOY_TYPE' does not exist in $FRAMEWORK_DIR"
+    echo "Error: Deployment type '$DEPLOYMENT' does not exist in $FRAMEWORK_DIR"
    echo "Available deployment types for $MODEL/${FRAMEWORK,,}:"
    ls -1 "$FRAMEWORK_DIR" | grep -v "\.sh$\|\.md$" | sed 's/^/  /'
    exit 1
@@ -176,9 +171,13 @@ if [[ ! -f "$DEPLOY_FILE" ]]; then
    exit 1
 fi
-if [[ ! -f "$PERF_FILE" ]]; then
+# Check if perf file exists (optional)
-    echo "Error: Performance file '$PERF_FILE' not found"
+PERF_AVAILABLE=false
-    exit 1
+if [[ -f "$PERF_FILE" ]]; then
+    PERF_AVAILABLE=true
+    echo "Performance benchmark file found: $PERF_FILE"
+else
+    echo "Performance benchmark file not found: $PERF_FILE (skipping benchmarks)"
 fi
 # Show deployment information
@@ -187,42 +186,43 @@ echo "Dynamo Recipe Deployment"
 echo "======================================"
 echo "Model: $MODEL"
 echo "Framework: ${FRAMEWORK,,}"
-echo "Deployment Type: $DEPLOY_TYPE"
+echo "Deployment Type: $DEPLOYMENT"
 echo "Namespace: $NAMESPACE"
-echo "Model Download: $DOWNLOAD_MODEL"
 echo "======================================"
 # Handle model downloading
 MODEL_CACHE_DIR="$MODEL_DIR/model-cache"
-if [[ "$DOWNLOAD_MODEL" == "true" ]]; then
+echo "Creating PVC for model cache and downloading model..."
-    echo "Creating PVC for model cache and downloading model..."
+$DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
+$DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml
-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml
+# Wait for the model download to complete
-    # Wait for the model download to complete
+MODEL_DOWNLOAD_JOB_NAME=$(grep "name:" $MODEL_CACHE_DIR/model-download.yaml | head -1 | awk '{print $2}')
-    echo "Waiting for the model download to complete..."
+echo "Waiting for job '$MODEL_DOWNLOAD_JOB_NAME' to complete..."
-    $DRY_RUN kubectl wait --for=condition=Complete job/model-download-${MODEL} -n $NAMESPACE --timeout=6000s
+$DRY_RUN kubectl wait --for=condition=Complete job/$MODEL_DOWNLOAD_JOB_NAME -n $NAMESPACE --timeout=6000s
-else
-    echo "Skipping model download (using existing model cache)..."
-    # Still create the PVC in case it doesn't exist
-    $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
-fi
 # Deploy the specified configuration
-echo "Deploying $MODEL ${FRAMEWORK,,} $DEPLOY_TYPE configuration..."
+echo "Deploying $MODEL ${FRAMEWORK,,} $DEPLOYMENT configuration..."
 $DRY_RUN kubectl apply -n $NAMESPACE -f $DEPLOY_FILE
-# Launch the benchmark job
+# Launch the benchmark job (if available)
-echo "Launching benchmark job..."
+if [[ "$PERF_AVAILABLE" == "true" ]]; then
-$DRY_RUN kubectl apply -n $NAMESPACE -f $PERF_FILE
+    echo "Launching benchmark job..."
+    $DRY_RUN kubectl apply -n $NAMESPACE -f $PERF_FILE
-# Construct job name from the perf file
-JOB_NAME=$(grep "name:" $PERF_FILE | head -1 | awk '{print $2}')
+    # Construct job name from the perf file
-echo "Waiting for job '$JOB_NAME' to complete..."
+    JOB_NAME=$(grep "name:" $PERF_FILE | head -1 | awk '{print $2}')
-$DRY_RUN kubectl wait --for=condition=Complete job/$JOB_NAME -n $NAMESPACE --timeout=6000s
+    echo "Waiting for job '$JOB_NAME' to complete..."
+    $DRY_RUN kubectl wait --for=condition=Complete job/$JOB_NAME -n $NAMESPACE --timeout=6000s
-# Print logs from the benchmark job
-echo "======================================"
+    # Print logs from the benchmark job
-echo "Benchmark completed. Logs:"
+    echo "======================================"
-echo "======================================"
+    echo "Benchmark completed. Logs:"
-$DRY_RUN kubectl logs job/$JOB_NAME -n $NAMESPACE
+    echo "======================================"
\ No newline at end of file
+    $DRY_RUN kubectl logs job/$JOB_NAME -n $NAMESPACE
+else
+    echo "======================================"
+    echo "Deployment completed successfully!"
+    echo "No performance benchmark available for this configuration."
+    echo "======================================"
+fi
\ No newline at end of file
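The `--dry-run` behaviour in `run.sh` relies on prefixing each command with `$DRY_RUN`, which is either empty or `echo`. A minimal sketch of that pattern follows. The wrapper `apply_manifest` is hypothetical and does not exist in the script; it also quotes the namespace and file arguments, which the script itself leaves unquoted.

```bash
#!/usr/bin/env bash
# Sketch of the DRY_RUN prefix pattern; apply_manifest is a hypothetical wrapper.
# DRY_RUN is empty for real execution and "echo" for dry runs.
DRY_RUN="${DRY_RUN:-}"

apply_manifest() {
    local namespace="$1" file="$2"
    # $DRY_RUN is intentionally unquoted: an empty value vanishes from the
    # command line, while "echo" turns the command into a printed string.
    # The arguments are quoted so that paths containing spaces stay intact.
    $DRY_RUN kubectl apply -n "$namespace" -f "$file"
}
```

Setting `DRY_RUN=echo` before calling `apply_manifest dynamo deploy.yaml` prints the `kubectl` command instead of running it, so the sequence of operations can be reviewed without touching the cluster.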
--- a/tests/profiler/test_profile_sla_dryrun.py
+++ b/tests/profiler/test_profile_sla_dryrun.py
@@ -169,9 +169,7 @@ class TestProfileSLADryRun:
        class Args:
            def __init__(self):
                self.backend = "sglang"
-                self.config = (
+                self.config = "recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml"
-                    "recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml"
-                )
                self.output_dir = "/tmp/test_profiling_results"
                self.namespace = "test-namespace"
                self.min_num_gpus_per_engine = 8