docs: Clean up incomplete recipes and clarify Kubernetes-only focus (#4159)

Signed-off-by: Ben Hamm <ben.hamm@gmail.com>
Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Signed-off-by: atchernych <atchernych@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: tanmayv25 <tanmay2592@gmail.com>
Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: atchernych <atchernych@nvidia.com>

Unverified Commit 88dfd1b3 authored Nov 17, 2025 by Ben Hamm Committed by GitHub Nov 18, 2025
7 changed files
--- a/examples/basics/multinode/trtllm/srun_disaggregated.sh
+++ b/examples/basics/multinode/trtllm/srun_disaggregated.sh
@@ -17,11 +17,11 @@ NUM_GPUS_PER_NODE=${NUM_GPUS_PER_NODE:-4}
 NUM_PREFILL_NODES=${NUM_PREFILL_NODES:-4}
 NUM_PREFILL_WORKERS=${NUM_PREFILL_WORKERS:-1}
-PREFILL_ENGINE_CONFIG="${PREFILL_ENGINE_CONFIG:-/mnt/recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml}"
+PREFILL_ENGINE_CONFIG="${PREFILL_ENGINE_CONFIG:-/mnt/examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml}"
 NUM_DECODE_NODES=${NUM_DECODE_NODES:-4}
 NUM_DECODE_WORKERS=${NUM_DECODE_WORKERS:-1}
-DECODE_ENGINE_CONFIG="${DECODE_ENGINE_CONFIG:-/mnt/recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml}"
+DECODE_ENGINE_CONFIG="${DECODE_ENGINE_CONFIG:-/mnt/examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml}"
 # Automate settings of certain variables for convenience, but you are free
 # to manually set these for more control as well.

--- a/recipes/README.md
+++ b/recipes/README.md
-# Dynamo Model Serving Recipes
+# Dynamo Production-Ready Recipes
-This repository contains production-ready recipes for deploying large language models using the Dynamo platform. Each recipe includes deployment configurations, performance benchmarking, and model caching setup.
+Production-tested Kubernetes deployment recipes for LLM inference using NVIDIA Dynamo.
-## Contents
+> **Prerequisites:** This guide assumes you have already installed the Dynamo Kubernetes Platform.
- [Available Models](#available-models)
+> If not, follow the **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** first.
- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- Deployment Methods
-   - [Option 1: Automated Deployment](#option-1-automated-deployment)
-   - [Option 2: Manual Deployment](#option-2-manual-deployment)
+## Available Recipes
-## Available Models
+| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes | GAIE Integration |
+|-------|-----------|------|------|------------|------------------|-------|------------------|
-| Model Family    | Framework | Deployment Mode      | GPU Requirements | Status | Benchmark |GAIE-integration |
+| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | ✅ | ✅ | FP8 dynamic quantization | ✅ |
-|-----------------|-----------|---------------------|------------------|--------|-----------|------------------|
+| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation | ❌ |
-| llama-3-70b     | vllm      | agg                 | 4x H100/H200     | ✅     | ✅        |✅                |
+| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | ✅ | ✅ | 2 nodes, 8 GPUs each | ❌ |
-| llama-3-70b     | vllm      | disagg (1 node)      | 8x H100/H200    | ✅     | ✅        | 🚧               |
+| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU | ✅ | ✅ | FP8 quantization | ❌ |
-| llama-3-70b     | vllm      | disagg (multi-node)     | 16x H100/H200    | ✅     | ✅        |🚧               |
+| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | ✅ | ✅ | Prefill + Decode separation | ❌ |
-| deepseek-r1     | sglang    | disagg (1 node, wide-ep)     | 8x H200          | ✅     | 🚧        |🚧               |
+| **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | ✅ | ✅ | Blackwell only, WideEP | ❌ |
-| deepseek-r1     | sglang    | disagg (multi-node, wide-ep)     | 16x H200        | ✅     | 🚧        |🚧               |
+| **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest | ❌ |
-| gpt-oss-120b    | trtllm    | agg                 | 4x GB200         | ✅     | ✅        |🚧               |
+| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅ | ❌ | Benchmark recipe pending | ❌ |
+| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅ | ❌ | Benchmark recipe pending | ❌ |
+| **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | ✅ | ✅ | Multi-node: 8 decode + 1 prefill nodes | ❌ |
 **Legend:**
- ✅ Functional
+- **Deployment**: ✅ = Complete `deploy.yaml` manifest available | ❌ = Missing or incomplete
- 🚧 Under development
+- **Benchmark Recipe**: ✅ = Includes `perf.yaml` for running AIPerf benchmarks | ❌ = No benchmark recipe provided
+## Recipe Structure
+Each complete recipe follows this standard structure:
-**Recipe Directory Structure:**
+```text
-Recipes are organized into a directory structure that follows the pattern:
-```text
 <model-name>/
+├── README.md (optional)           # Model-specific deployment notes
 ├── model-cache/
-│   ├── model-cache.yaml         # PVC for model cache
+│   ├── model-cache.yaml          # PersistentVolumeClaim for model storage
-│   └── model-download.yaml      # Job for model download
+│   └── model-download.yaml       # Job to download model from HuggingFace
-├── <framework>/
+└── <framework>/                  # vllm, sglang, or trtllm
-│   └── <deployment-mode>/
+    └── <deployment-mode>/        # agg, disagg, disagg-single-node, etc.
-│       ├── deploy.yaml          # DynamoGraphDeployment CRD and optional configmap for custom configuration
+        ├── deploy.yaml           # Complete DynamoGraphDeployment manifest
-│       └── perf.yaml (optional) # Performance benchmark
+        └── perf.yaml (optional)  # AIPerf benchmark job
-└── README.md (optional)         # Model documentation
 ```
 ## Quick Start
-Follow the instructions in the [Prerequisites](#prerequisites) section to set up your environment.
+### Prerequisites
-Choose your preferred deployment method: using the `run.sh` script or manual deployment steps.
-## Prerequisites
+**1. Dynamo Platform Installed**
-### 1. Environment Setup
+The recipes require the Dynamo Kubernetes Platform to be installed. Follow the installation guide:
-Create a Kubernetes namespace and set environment variable:
+- **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** - Quickstart (~10 minutes)
+- **[Detailed Installation Guide](../docs/kubernetes/installation_guide.md)** - Advanced options
-```bash
+**2. GPU Cluster Requirements**
-export NAMESPACE=your-namespace
-kubectl create namespace ${NAMESPACE}
-```
-### 2. Deploy Dynamo Platform
+Ensure your cluster has:
+- GPU nodes matching recipe requirements (see table above)
-Install the Dynamo Cloud Platform following the [Quickstart Guide](../docs/kubernetes/README.md).
-### 3. GPU Cluster
-Ensure your Kubernetes cluster has:
- GPU nodes with appropriate GPU types (see model requirements above)
 - GPU operator installed
- Sufficient GPU memory and compute resources
+- Appropriate GPU drivers and container runtime
-### 4. Container Registry Access
-Ensure access to NVIDIA container registry for runtime images:
+**3. HuggingFace Access**
- `nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z`
- `nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:x.y.z`
- `nvcr.io/nvidia/ai-dynamo/sglang-runtime:x.y.z`
-### 5. HuggingFace Access and Kubernetes Secret Creation
+Create the namespace and a Kubernetes secret holding your HuggingFace token so the model download job can authenticate:
-Set up a kubernetes secret with the HuggingFace token for model download:
 ```bash
-# Update the token in the secret file
+export NAMESPACE=your-namespace
-vim hf_hub_secret/hf_hub_secret.yaml
+kubectl create namespace ${NAMESPACE}
-# Apply the secret
+# Create HuggingFace token secret
-kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
+kubectl create secret generic hf-token-secret \
+  --from-literal=HF_TOKEN="your-token-here" \
+  -n ${NAMESPACE}
 ```
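+If you prefer to manage the secret declaratively, the `kubectl create secret generic` command above corresponds to the manifest sketched below. It assumes the same secret name (`hf-token-secret`) and key (`HF_TOKEN`) used in the command. Do not commit a real token to version control.
+```yaml
+# Declarative equivalent of:
+#   kubectl create secret generic hf-token-secret --from-literal=HF_TOKEN="your-token-here"
+apiVersion: v1
+kind: Secret
+metadata:
+  name: hf-token-secret
+type: Opaque
+stringData:
+  # Written as plain text here; Kubernetes stores it base64-encoded under `data`
+  HF_TOKEN: "your-token-here"
+```
+Apply it with `kubectl apply -f <file> -n ${NAMESPACE}`, where `<file>` is wherever you saved the manifest.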
-6. Configure Storage Class
+**4. Storage Configuration**
+Update the `storageClassName` in `<model>/model-cache/model-cache.yaml` to match your cluster:
 ```bash
-# Check available storage classes
+# Find your storage class name
 kubectl get storageclass
-```
-Replace "your-storage-class-name" with your actual storage class in the file: `<model>/model-cache/model-cache.yaml`
-```yaml
+# Edit the model-cache.yaml file and update:
-# In <model>/model-cache/model-cache.yaml
+# spec:
-spec:
+#   storageClassName: "your-actual-storage-class"
-  storageClassName: "your-actual-storage-class"  # Replace this
 ```
-## Option 1: Automated Deployment
+### Deploy a Recipe
-Use the `run.sh` script for fully automated deployment:
-**Note:** The script automatically:
- Create model cache PVC and downloads the model
- Deploy the model service
- Runs performance benchmark if a `perf.yaml` file is present in the deployment directory
+**Step 1: Download Model**
-#### Script Usage
 ```bash
-./run.sh [OPTIONS] --model <model> --framework <framework> --deployment <deployment-type>
+# Update storageClassName in model-cache.yaml first!
-```
+kubectl apply -f <model>/model-cache/ -n ${NAMESPACE}
-**Required Options:**
+# Wait for download to complete (may take 10-60 minutes depending on model size)
- `--model <model>`: Model name matching the directory name in the recipes directory (e.g., llama-3-70b, gpt-oss-120b, deepseek-r1)
+kubectl wait --for=condition=Complete job/model-download -n ${NAMESPACE} --timeout=6000s
- `--framework <framework>`: Backend framework (`vllm`, `trtllm`, `sglang`)
- `--deployment <deployment-type>`: Deployment mode (e.g., agg, disagg, disagg-single-node, disagg-multi-node)
-**Optional Options:**
+# Follow download progress (run this in a second terminal while `kubectl wait` blocks)
- `--namespace <namespace>`: Kubernetes namespace (default: dynamo)
+kubectl logs -f job/model-download -n ${NAMESPACE}
- `--dry-run`: Show commands without executing them
+```
- `-h, --help`: Show help message
-**Environment Variables:**
+**Step 2: Deploy Service**
- `NAMESPACE`: Kubernetes namespace (default: dynamo)
-#### Example Usage
 ```bash
-# Set up environment
+kubectl apply -f <model>/<framework>/<mode>/deploy.yaml -n ${NAMESPACE}
-export NAMESPACE=your-namespace
-kubectl create namespace ${NAMESPACE}
-# Configure HuggingFace token
-kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
-# use run.sh script to deploy the model
-# Deploy Llama-3-70B with vLLM (aggregated mode)
-./run.sh --model llama-3-70b --framework vllm --deployment agg
-# Deploy GPT-OSS-120B with TensorRT-LLM
+# Check deployment status
-./run.sh --model gpt-oss-120b --framework trtllm --deployment agg
+kubectl get dynamographdeployment -n ${NAMESPACE}
-# Deploy DeepSeek-R1 with SGLang (disaggregated mode)
-./run.sh --model deepseek-r1 --framework sglang --deployment disagg
-# Deploy with custom namespace
+# Check pod status
-./run.sh --namespace my-namespace --model llama-3-70b --framework vllm --deployment agg
+kubectl get pods -n ${NAMESPACE}
-# Dry run to see what would be executed
+# Wait for pods to be ready
-./run.sh --dry-run --model llama-3-70b --framework vllm --deployment agg
+kubectl wait --for=condition=ready pod -l nvidia.com/dynamo-graph-deployment-name=<deployment-name> -n ${NAMESPACE} --timeout=600s
 ```
-## If deploying with Gateway API Inference extension GAIE
+**Step 3: Test Deployment**
-1. Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE.
+```bash
+# Port forward to access the service locally
+kubectl port-forward svc/<deployment-name>-frontend 8000:8000 -n ${NAMESPACE}
-2. Apply manifests by running a script.
+# In another terminal, test the endpoint
+curl http://localhost:8000/v1/models
-```bash
+# Send a test request
-# Match the block size to the cli value in your deployment file deploy.yaml: - "python3 -m dynamo.vllm ... --block-size 128"
+curl http://localhost:8000/v1/chat/completions \
-export DYNAMO_KV_BLOCK_SIZE=128
+  -H "Content-Type: application/json" \
-export EPP_IMAGE=nvcr.io/you/epp:tag
+  -d '{
-# Add --gaie argument to the script i.e.:
+    "model": "<model-name>",
-./run.sh --model llama-3-70b --framework vllm --gaie agg --deployment agg
+    "messages": [{"role": "user", "content": "Hello!"}],
+    "max_tokens": 50
+  }'
 ```
-The script will perform gateway checks and apply the manifests.
-## Option 2: Manual Deployment
+**Step 4: Run Benchmark (Optional)**
-For step-by-step manual deployment follow these steps :
 ```bash
-# 0. Set up environment (see Prerequisites section)
+# Only if perf.yaml exists in the recipe directory
-export NAMESPACE=your-namespace
+kubectl apply -f <model>/<framework>/<mode>/perf.yaml -n ${NAMESPACE}
-kubectl create namespace ${NAMESPACE}
-kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
-# 1. Download model (see Model Download section)
+# Monitor benchmark progress
-kubectl apply -n $NAMESPACE -f <model>/model-cache/
+kubectl logs -f job/<benchmark-job-name> -n ${NAMESPACE}
-# 2. Deploy model (see Deployment section)
+# View results after completion
-kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/deploy.yaml
+kubectl logs job/<benchmark-job-name> -n ${NAMESPACE} | tail -50
-# 3. Run benchmarks (optional, if perf.yaml exists)
-kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/perf.yaml
 ```
-### Step 1: Download Model
+**Inference Gateway (GAIE) Integration (Optional)**
-```bash
+An example Inference Gateway integration is provided for Llama-3-70B with vLLM (Aggregated).
-# Start the download job
-kubectl apply -n $NAMESPACE -f <model>/model-cache
-# Verify job creation
+Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE, then apply the manifests:
-kubectl get jobs -n $NAMESPACE | grep model-download
-```
-Monitor and wait for the model download to complete:
 ```bash
+export DEPLOY_PATH=llama-3-70b/vllm/agg
+# General form: DEPLOY_PATH=<model>/<framework>/<mode>
+kubectl apply -R -f "$DEPLOY_PATH/gaie/k8s-manifests" -n "$NAMESPACE"
+```
-# Wait for job completion (timeout after 100 minutes)
+## Example Deployments
-kubectl wait --for=condition=Complete job/model-download -n $NAMESPACE --timeout=6000s
-# Check job status
+### Llama-3-70B with vLLM (Aggregated)
-kubectl get job model-download -n $NAMESPACE
-# View download logs
-kubectl logs job/model-download -n $NAMESPACE
-```
-### Step 2: Deploy Model Service
 ```bash
-# Navigate to the specific deployment configuration
+export NAMESPACE=dynamo-demo
-cd <model>/<framework>/<deployment-mode>/
+kubectl create namespace ${NAMESPACE}
-# Deploy the model service
+# Create HF token secret
-kubectl apply -n $NAMESPACE -f deploy.yaml
+kubectl create secret generic hf-token-secret \
+  --from-literal=HF_TOKEN="your-token" \
+  -n ${NAMESPACE}
-# Verify deployment creation
+# Deploy
-kubectl get deployments -n $NAMESPACE
+kubectl apply -f llama-3-70b/model-cache/ -n ${NAMESPACE}
+kubectl wait --for=condition=Complete job/model-download -n ${NAMESPACE} --timeout=6000s
+kubectl apply -f llama-3-70b/vllm/agg/deploy.yaml -n ${NAMESPACE}
+# Test
+kubectl port-forward svc/llama3-70b-agg-frontend 8000:8000 -n ${NAMESPACE}
 ```
-#### Wait for Deployment Ready
+### DeepSeek-R1 on GB200 (Multi-node)
-```bash
+See [deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml](deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml) for the complete multi-node WideEP configuration.
-# Get deployment name from the deploy.yaml file
-DEPLOYMENT_NAME=$(grep "name:" deploy.yaml | head -1 | awk '{print $2}')
-# Wait for deployment to be ready (timeout after 10 minutes)
+## Customization
-kubectl wait --for=condition=available deployment/$DEPLOYMENT_NAME -n $NAMESPACE --timeout=1200s
-# Check deployment status
+Each `deploy.yaml` contains:
-kubectl get deployment $DEPLOYMENT_NAME -n $NAMESPACE
+- **ConfigMap** (optional): Engine-specific configuration embedded in the manifest
+- **DynamoGraphDeployment**: Kubernetes resource definitions
+- **Resource limits**: GPU count, memory, CPU requests/limits
+- **Image references**: Container images with version tags
-# Check pod status
+### Key Customization Points
-kubectl get pods -n $NAMESPACE -l app=$DEPLOYMENT_NAME
-```
-#### Verify Model Service
+**Model Configuration:**
+```yaml
+# In deploy.yaml under worker args:
+args:
+  - python3 -m dynamo.vllm --model <your-model-path> --served-model-name <name>
+```
-```bash
+**GPU Resources:**
-# Check if service is running
+```yaml
-kubectl get services -n $NAMESPACE
+resources:
+  limits:
+    gpu: "4"  # Adjust based on your requirements
+  requests:
+    gpu: "4"
+```
-# Test model endpoint (port-forward to test locally)
+**Scaling:**
-kubectl port-forward service/${DEPLOYMENT_NAME}-frontend 8000:8000 -n $NAMESPACE
+```yaml
+services:
+  VllmDecodeWorker:
+    replicas: 2  # Scale to multiple workers
+```
-# Test the model API (in another terminal)
+**Router Mode:**
-curl http://localhost:8000/v1/models
+```yaml
+# In Frontend args:
+args:
+  - python3 -m dynamo.frontend --router-mode kv --http-port 8000
+# Options: round-robin, kv (KV-aware routing)
+```
-# Stop port-forward when done
+**Container Images:**
-pkill -f "kubectl port-forward"
+```yaml
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z
+# Update version tag as needed
 ```
-### Step 3: Performance Benchmarking (Optional)
+## Troubleshooting
-Run performance benchmarks to evaluate model performance. Note that benchmarking is only available for models that include a `perf.yaml` file (optional):
+### Common Issues
-#### Launch Benchmark Job
+**Pods stuck in Pending:**
+- Check GPU availability: `kubectl describe node <node-name>`
+- Verify storage class exists: `kubectl get storageclass`
+- Check resource requests vs. available resources
-```bash
+**Model download fails:**
-# From the deployment directory
+- Verify that the HuggingFace token is correct and that the `hf-token-secret` secret exists in the namespace
-kubectl apply -n $NAMESPACE -f perf.yaml
+- Check network connectivity from cluster
+- Review job logs: `kubectl logs job/model-download -n ${NAMESPACE}`
-# Verify benchmark job creation
+**Workers fail to start:**
-kubectl get jobs -n $NAMESPACE
+- Check GPU compatibility (driver version, CUDA version)
-```
+- Verify image pull secrets if using private registries
+- Review pod logs: `kubectl logs <pod-name> -n ${NAMESPACE}`
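+Private registries are normally accessed through an image pull secret of type `kubernetes.io/dockerconfigjson`, the standard Kubernetes mechanism. The sketch below shows the shape of such a secret. The name `my-registry-pull-secret` is hypothetical. How a DynamoGraphDeployment references the secret is not covered here; check the API Reference listed under Related Documentation.
+```yaml
+# Hypothetical image pull secret for a private registry
+apiVersion: v1
+kind: Secret
+metadata:
+  name: my-registry-pull-secret   # hypothetical name
+type: kubernetes.io/dockerconfigjson
+data:
+  # Placeholder: base64-encoded contents of a Docker config.json with registry credentials
+  .dockerconfigjson: <base64-encoded-docker-config-json>
+```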
-#### Monitor Benchmark Progress
+**For more troubleshooting:**
+- [Kubernetes Deployment Guide](../docs/kubernetes/README.md#troubleshooting)
+- [Observability Documentation](../docs/kubernetes/observability/)
-```bash
+## Related Documentation
-# Get benchmark job name
-PERF_JOB_NAME=$(grep "name:" perf.yaml | head -1 | awk '{print $2}')
-# Monitor benchmark logs in real-time
+- **[Kubernetes Deployment Guide](../docs/kubernetes/README.md)** - Platform installation and concepts
-kubectl logs -f job/$PERF_JOB_NAME -n $NAMESPACE
+- **[API Reference](../docs/kubernetes/api_reference.md)** - DynamoGraphDeployment CRD specification
+- **[vLLM Backend Guide](../docs/backends/vllm/README.md)** - vLLM-specific features
+- **[SGLang Backend Guide](../docs/backends/sglang/README.md)** - SGLang-specific features
+- **[TensorRT-LLM Backend Guide](../docs/backends/trtllm/README.md)** - TensorRT-LLM features
+- **[Observability](../docs/kubernetes/observability/)** - Monitoring and logging
+- **[Benchmarking Guide](../docs/benchmarks/benchmarking.md)** - Performance testing
-# Wait for benchmark completion (timeout after 100 minutes)
+## Contributing
-kubectl wait --for=condition=Complete job/$PERF_JOB_NAME -n $NAMESPACE --timeout=6000s
-```
-#### View Benchmark Results
+We welcome contributions of new recipes! See [CONTRIBUTING.md](CONTRIBUTING.md) for:
+- Recipe submission guidelines
+- Required components checklist
+- Testing and validation requirements
+- Documentation standards
-```bash
+### Recipe Quality Standards
-# Check final benchmark results
-kubectl logs job/$PERF_JOB_NAME -n $NAMESPACE | tail -50
+A production-ready recipe must include:
-```
+- ✅ Complete `deploy.yaml` with DynamoGraphDeployment
\ No newline at end of file
+- ✅ Model cache PVC and download job
+- ✅ Benchmark recipe (`perf.yaml`) for performance testing
+- ✅ Verification on target hardware
+- ✅ Documentation of GPU requirements
--- a/recipes/gpt-oss-120b/trtllm/disagg/README.md
+++ b/recipes/gpt-oss-120b/trtllm/disagg/README.md
+# GPT-OSS-120B Disaggregated Mode
+> **⚠️ INCOMPLETE**: This directory contains only engine configuration files and is not ready for Kubernetes deployment.
+## Current Status
+This directory contains TensorRT-LLM engine configurations for disaggregated serving:
+- `decode.yaml` - Decode worker engine configuration
+- `prefill.yaml` - Prefill worker engine configuration
+## Missing Components
+To complete this recipe, the following files are needed:
+- `deploy.yaml` - Kubernetes DynamoGraphDeployment manifest
+- `perf.yaml` - Performance benchmarking job (optional)
+## Alternative
+For a production-ready GPT-OSS-120B deployment, use the **aggregated mode**:
+- [gpt-oss-120b/trtllm/agg/](../agg/) - Complete with `deploy.yaml` and `perf.yaml`
+## Contributing
+If you'd like to complete this recipe, see [recipes/CONTRIBUTING.md](../../../CONTRIBUTING.md) for guidelines on creating proper Kubernetes deployment manifests.
--- a/recipes/run.sh
+++ b/recipes/run.sh
-#!/usr/bin/env bash
-# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-set -euo pipefail
-IFS=$'\n\t'
-RECIPES_DIR="$( cd "$( dirname "$0" )" && pwd )"
-# Default values
-NAMESPACE="${NAMESPACE:-dynamo}"
-DEPLOY_TYPE=""
-GAIE="${GAIE:-false}"
-DEPLOYMENT=""
-MODEL=""
-FRAMEWORK=""
-DRY_RUN=""
-# Frameworks - following container/build.sh pattern
-declare -A FRAMEWORKS=(["VLLM"]=1 ["TRTLLM"]=2 ["SGLANG"]=3)
-DEFAULT_FRAMEWORK=VLLM
-# Function to show usage
-usage() {
-    echo "Usage: $0 [OPTIONS] --model <model> --framework <framework> --deployment <deployment-type>"
-    echo ""
-    echo "Required Options:"
-    echo "  --model <model>       Model name (e.g., llama-3-70b)"
-    echo "  --framework <fw>      Framework one of ${!FRAMEWORKS[*]} (default: ${DEFAULT_FRAMEWORK})"
-    echo "  --deployment <type>   Deployment type (e.g., agg, disagg etc, please refer to the README.md for available deployment types)"
-    echo ""
-    echo "Optional:"
-    echo "  --namespace <ns>   Kubernetes namespace (default: dynamo)"
-    echo "  --dry-run          Print commands without executing them"
-    echo "  --gaie[=true|false] Enable GAIE integration subfolder (applies GAIE manifests skips benchmark) (default: ${GAIE})"
-    echo "  -h, --help         Show this help message"
-    echo ""
-    echo "Environment Variables:"
-    echo "  NAMESPACE             Kubernetes namespace (default: dynamo)"
-    echo ""
-    echo "Examples:"
-    echo "  $0 --model llama-3-70b --framework vllm --deployment agg"
-    echo "  $0 --model llama-3-70b --framework trtllm --deployment disagg-single-node"
-    echo "  $0 --namespace my-ns --model llama-3-70b --framework vllm --deployment disagg-multi-node"
-    exit 1
-}
-missing_requirement() {
-    echo "ERROR: $1 requires an argument."
-    usage
-}
-error() {
-    printf '%s %s\n' "$1" "$2" >&2
-    exit 1
-}
-while [[ $# -gt 0 ]]; do
-    case $1 in
-        --dry-run)
-            DRY_RUN="echo"
-            shift
-            ;;
-        --model)
-            if [ "$2" ]; then
-                MODEL=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
-        --framework)
-            if [ "$2" ]; then
-                FRAMEWORK=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
-        --deployment)
-            if [ "$2" ]; then
-                DEPLOYMENT=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
-        --namespace)
-            if [ "$2" ]; then
-                NAMESPACE=$2
-                shift 2
-            else
-                missing_requirement "$1"
-            fi
-            ;;
-        --gaie)
-            GAIE=true
-            shift
-            ;;
-        --gaie=false)
-            GAIE=false
-            shift
-            ;;
-        --gaie=*)
-            GAIE="${1#*=}"
-            case "${GAIE,,}" in
-              true|false) GAIE="${GAIE,,}";;
-              *) echo "ERROR: --gaie must be true or false"; exit 1;;
-            esac
-            shift
-            ;;
-        -h|--help)
-            usage
-            ;;
-        -*)
-            error 'ERROR: Unknown option: ' "$1"
-            ;;
-        *)
-            error "ERROR: Unknown argument: " "$1"
-            ;;
-    esac
-done
-if [ -z "$FRAMEWORK" ]; then
-    FRAMEWORK=$DEFAULT_FRAMEWORK
-fi
-if [ -n "$FRAMEWORK" ]; then
-    FRAMEWORK=${FRAMEWORK^^}
-    if [[ -z "${FRAMEWORKS[$FRAMEWORK]}" ]]; then
-        error 'ERROR: Unknown framework: ' "$FRAMEWORK"
-    fi
-fi
-# Validate required arguments
-if [[ -z "$MODEL" ]] || [[ -z "$DEPLOYMENT" ]]; then
-    if [[ -z "$MODEL" ]]; then
-        echo "ERROR: --model argument is required"
-    fi
-    if [[ -z "$DEPLOYMENT" ]]; then
-        echo "ERROR: --deployment argument is required"
-    fi
-    echo ""
-    usage
-fi
-# Construct paths based on new structure: recipes/<model>/<framework>/<deployment-type>/
-MODEL_DIR="$RECIPES_DIR/$MODEL"
-FRAMEWORK_DIR="$MODEL_DIR/${FRAMEWORK,,}"
-DEPLOY_PATH="$FRAMEWORK_DIR/$DEPLOYMENT"
-INTEGRATION="$([[ "${GAIE,,}" == "true" ]] && echo gaie || echo "")"
-# Check if model directory exists
-if [[ ! -d "$MODEL_DIR" ]]; then
-    echo "Error: Model directory '$MODEL' does not exist in $RECIPES_DIR"
-    echo "Available models:"
-    ls -1 "$RECIPES_DIR" | grep -v "\.sh$\|\.md$\|model-cache$" | sed 's/^/  /'
-    exit 1
-fi
-# Check if framework directory exists
-if [[ ! -d "$FRAMEWORK_DIR" ]]; then
-    echo "Error: Framework directory '${FRAMEWORK,,}' does not exist in $MODEL_DIR"
-    echo "Available frameworks for $MODEL:"
-    ls -1 "$MODEL_DIR" | grep -v "\.sh$\|\.md$" | sed 's/^/  /'
-    exit 1
-fi
-# Check if deployment directory exists
-if [[ ! -d "$DEPLOY_PATH" ]]; then
-    echo "Error: Deployment type '$DEPLOYMENT' does not exist in $FRAMEWORK_DIR"
-    echo "Available deployment types for $MODEL/${FRAMEWORK,,}:"
-    ls -1 "$FRAMEWORK_DIR" | grep -v "\.sh$\|\.md$" | sed 's/^/  /'
-    exit 1
-fi
-# Check if deployment files exist
-DEPLOY_FILE="$DEPLOY_PATH/deploy.yaml"
-PERF_FILE="$DEPLOY_PATH/perf.yaml"
-if [[ ! -f "$DEPLOY_FILE" ]]; then
-    echo "Error: Deployment file '$DEPLOY_FILE' not found"
-    exit 1
-fi
-# Check if perf file exists (optional)
-PERF_AVAILABLE=false
-if [[ -f "$PERF_FILE" ]]; then
-    PERF_AVAILABLE=true
-    echo "Performance benchmark file found: $PERF_FILE"
-else
-    echo "Performance benchmark file not found: $PERF_FILE (skipping benchmarks)"
-fi
-# Show deployment information
-echo "======================================"
-echo "Dynamo Recipe Deployment"
-echo "======================================"
-echo "Model: $MODEL"
-echo "Framework: ${FRAMEWORK,,}"
-echo "Deployment Type: $DEPLOYMENT"
-echo "Namespace: $NAMESPACE"
-echo "GAIE integration: $GAIE"
-echo "======================================"
-# Handle model downloading
-MODEL_CACHE_DIR="$MODEL_DIR/model-cache"
-echo "Creating PVC for model cache and downloading model..."
-$DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
-$DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml
-# Wait for the model download to complete
-MODEL_DOWNLOAD_JOB_NAME=$(grep "name:" $MODEL_CACHE_DIR/model-download.yaml | head -1 | awk '{print $2}')
-echo "Waiting for job '$MODEL_DOWNLOAD_JOB_NAME' to complete..."
-$DRY_RUN kubectl wait --for=condition=Complete job/$MODEL_DOWNLOAD_JOB_NAME -n $NAMESPACE --timeout=6000s
-# Deploy the specified configuration
-echo "Deploying $MODEL ${FRAMEWORK,,} $DEPLOYMENT configuration..."
-$DRY_RUN kubectl apply -n $NAMESPACE -f $DEPLOY_FILE
-if [[ "$INTEGRATION" == "gaie" ]]; then
-    # run gaie checks.
-    SCRIPT_DIR="$(cd -- "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-    "${SCRIPT_DIR}/gaie_checks.sh"
-    $DRY_RUN kubectl apply -R -f "$DEPLOY_PATH/gaie/k8s-manifests" -n "$NAMESPACE"
-    # For now do not run the benchmark
-    exit
- fi
-# Launch the benchmark job (if available)
-if [[ "$PERF_AVAILABLE" == "true" ]]; then
-    echo "Launching benchmark job..."
-    $DRY_RUN kubectl apply -n $NAMESPACE -f $PERF_FILE
-    # Construct job name from the perf file
-    JOB_NAME=$(grep "name:" $PERF_FILE | head -1 | awk '{print $2}')
-    echo "Waiting for job '$JOB_NAME' to complete..."
-    $DRY_RUN kubectl wait --for=condition=Complete job/$JOB_NAME -n $NAMESPACE --timeout=6000s
-    # Print logs from the benchmark job
-    echo "======================================"
-    echo "Benchmark completed. Logs:"
-    echo "======================================"
-    $DRY_RUN kubectl logs job/$JOB_NAME -n $NAMESPACE
-else
-    echo "======================================"
-    echo "Deployment completed successfully!"
-    echo "No performance benchmark available for this configuration."
-    echo "======================================"
-fi
\ No newline at end of file
--- a/tests/serve/configs/trtllm/agg.yaml
+++ b/tests/serve/configs/trtllm/agg.yaml
--- a/tests/serve/configs/trtllm/decode.yaml
+++ b/tests/serve/configs/trtllm/decode.yaml
--- a/tests/serve/configs/trtllm/prefill.yaml
+++ b/tests/serve/configs/trtllm/prefill.yaml