Commit 0757ecf1 authored by Biswa Panda, committed by GitHub

fix: consistent model recipes and update simplified doc (#3858)

parent c2dc8557
...@@ -31,7 +31,7 @@ spec: ...@@ -31,7 +31,7 @@ spec:
command: ["python", "-m", "benchmarks.profiler.profile_sla"] command: ["python", "-m", "benchmarks.profiler.profile_sla"]
args: args:
- --config - --config
- /sgl-workspace/dynamo/recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml - /sgl-workspace/dynamo/recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml
- --output-dir - --output-dir
- /data/profiling_results - /data/profiling_results
- --namespace - --namespace
......
# Recipes Contributing Guide
When adding new model recipes, ensure they follow the standard structure:
```text
<model-name>/
├── model-cache/
│ ├── model-cache.yaml
│ └── model-download.yaml
├── <framework>/
│ └── <deployment-mode>/
│ ├── deploy.yaml
│ └── perf.yaml (optional)
└── README.md (optional)
```
## Validation
The `run.sh` script expects this exact directory structure and will validate that the directories and files exist before deployment:
- Model directory exists in `recipes/<model>/`
- Framework is one of the supported frameworks (vllm, sglang, trtllm)
- Framework directory exists in `recipes/<model>/<framework>/`
- Deployment directory exists in `recipes/<model>/<framework>/<deployment>/`
- Required files (`deploy.yaml`) exist in the deployment directory
- If present, performance benchmarks (`perf.yaml`) will be automatically executed
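
The checks listed above can be sketched as a small shell function. This is a minimal illustration under stated assumptions, not the actual `run.sh` code: the function name `validate_recipe` and its argument order are hypothetical and chosen only for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the validation checks listed above.
# Usage: validate_recipe <recipes-dir> <model> <framework> <deployment>
validate_recipe() {
  local recipes_dir=$1 model=$2 framework=$3 deployment=$4
  local deploy_path="$recipes_dir/$model/$framework/$deployment"

  # Model directory must exist.
  [[ -d "$recipes_dir/$model" ]] || { echo "missing model dir: $model" >&2; return 1; }

  # Framework must be one of the supported backends.
  case "$framework" in
    vllm|sglang|trtllm) ;;
    *) echo "unsupported framework: $framework" >&2; return 1 ;;
  esac

  # Framework and deployment directories must exist.
  [[ -d "$recipes_dir/$model/$framework" ]] || { echo "missing framework dir: $framework" >&2; return 1; }
  [[ -d "$deploy_path" ]] || { echo "missing deployment dir: $deployment" >&2; return 1; }

  # deploy.yaml is required; perf.yaml is optional.
  [[ -f "$deploy_path/deploy.yaml" ]] || { echo "missing deploy.yaml" >&2; return 1; }
  if [[ -f "$deploy_path/perf.yaml" ]]; then
    echo "perf.yaml found: benchmarks will run"
  else
    echo "perf.yaml not found: benchmarks skipped"
  fi
}
```

Running such a check locally before opening a pull request, for example `validate_recipe recipes llama-3-70b vllm agg`, catches a misplaced `deploy.yaml` early.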
\ No newline at end of file
# Dynamo model serving recipes # Dynamo Model Serving Recipes
| Model family | Backend | Mode | GPU | Deployment | Benchmark | This repository contains production-ready recipes for deploying large language models using the Dynamo platform. Each recipe includes deployment configurations, performance benchmarking, and model caching setup.
|---------------|---------|---------------------|-------|------------|-----------|
| llama-3-70b | vllm | agg | H100, H200 | ✓ | ✓ | ## Contents
| llama-3-70b | vllm | disagg-multi-node | H100, H200 | ✓ | ✓ | - [Available Models](#available-models)
| llama-3-70b | vllm | disagg-single-node | H100, H200 | ✓ | ✓ | - [Quick Start](#quick-start)
| DeepSeek-R1 | sglang | disaggregated | H200 | ✓ | 🚧 | - [Prerequisites](#prerequisites)
| oss-gpt | trtllm | aggregated | GB200 | ✓ | ✓ | - Deployment Methods
- [Option 1: Automated Deployment](#option-1-automated-deployment)
- [Option 2: Manual Deployment](#option-2-manual-deployment)
## Available Models
| Model Family | Framework | Deployment Mode | GPU Requirements | Status | Benchmark |
|-----------------|-----------|---------------------|------------------|--------|-----------|
| llama-3-70b | vllm | agg | 4x H100/H200 | ✅ | ✅ |
| llama-3-70b | vllm | disagg (1 node) | 8x H100/H200 | ✅ | ✅ |
| llama-3-70b | vllm | disagg (multi-node) | 16x H100/H200 | ✅ | ✅ |
| deepseek-r1 | sglang | disagg (1 node, wide-ep) | 8x H200 | ✅ | 🚧 |
| deepseek-r1 | sglang | disagg (multi-node, wide-ep) | 16x H200 | ✅ | 🚧 |
| gpt-oss-120b | trtllm | agg | 4x GB200 | ✅ | ✅ |
**Legend:**
- ✅ Functional
- 🚧 Under development
**Recipe Directory Structure:**
Recipes are organized into a directory structure that follows the pattern:
```text
<model-name>/
├── model-cache/
│ ├── model-cache.yaml # PVC for model cache
│ └── model-download.yaml # Job for model download
├── <framework>/
│ └── <deployment-mode>/
│ ├── deploy.yaml # DynamoGraphDeployment CRD and optional configmap for custom configuration
│ └── perf.yaml (optional) # Performance benchmark
└── README.md (optional) # Model documentation
```
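
Given this layout, every deployable recipe is identified by a `deploy.yaml` exactly three directories below the recipes root. The following sketch enumerates them; the function name `list_recipes` is hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print <model>/<framework>/<deployment-mode> for each
# deploy.yaml found under the given recipes directory (default: current dir).
list_recipes() {
  local recipes_dir=${1:-.}
  (
    cd "$recipes_dir" || exit 1
    # Depth 4 relative to the root: model/framework/mode/deploy.yaml
    find . -mindepth 4 -maxdepth 4 -type f -name deploy.yaml \
      | sed -e 's|^\./||' -e 's|/deploy\.yaml$||' \
      | sort
  )
}
```

Each printed path maps directly onto the `--model`, `--framework` and `--deployment` values that `run.sh` expects.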
## Quick Start
Follow the instructions in the [Prerequisites](#prerequisites) section to set up your environment.
Choose your preferred deployment method: using the `run.sh` script or manual deployment steps.
## Prerequisites ## Prerequisites
1. Create a namespace and populate NAMESPACE environment variable ### 1. Environment Setup
This environment variable is used in later steps to deploy and perf-test the model.
Create a Kubernetes namespace and set the `NAMESPACE` environment variable:
```bash ```bash
export NAMESPACE=your-namespace export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE} kubectl create namespace ${NAMESPACE}
``` ```
2. **Dynamo Cloud Platform installed** - Follow [Quickstart Guide](../docs/kubernetes/README.md) ### 2. Deploy Dynamo Platform
Install the Dynamo Cloud Platform following the [Quickstart Guide](../docs/kubernetes/README.md).
### 3. GPU Cluster
Ensure your Kubernetes cluster has:
- GPU nodes with appropriate GPU types (see model requirements above)
- GPU operator installed
- Sufficient GPU memory and compute resources
### 4. Container Registry Access
3. **Kubernetes cluster with GPU support** Ensure access to NVIDIA container registry for runtime images:
- `nvcr.io/nvidia/ai-dynamo/vllm-runtime:x.y.z`
- `nvcr.io/nvidia/ai-dynamo/trtllm-runtime:x.y.z`
- `nvcr.io/nvidia/ai-dynamo/sglang-runtime:x.y.z`
4. **Container registry access** for vLLM runtime images ### 5. HuggingFace Access and Kubernetes Secret Creation
5. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`) Set up a Kubernetes secret with the HuggingFace token for model download:
Update the `hf_hub_secret/hf_hub_secret.yaml` file with your HuggingFace token.
```bash ```bash
# Update the token in the secret file
vim hf_hub_secret/hf_hub_secret.yaml
# Apply the secret
kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE} kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
``` ```
6. (Optional) Create a shared model cache pvc to store the model weights. ### 6. Configure Storage Class
Choose a storage class to create the model cache pvc. You'll need to use this storage class name to update the `storageClass` field in the model-cache/model-cache.yaml file.
Configure persistent storage for model caching:
```bash ```bash
# Check available storage classes
kubectl get storageclass kubectl get storageclass
``` ```
## Running the recipes Replace "your-storage-class-name" with your actual storage class in the file: `<model>/model-cache/model-cache.yaml`
```yaml
# In <model>/model-cache/model-cache.yaml
spec:
storageClassName: "your-actual-storage-class" # Replace this
```
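
If you prefer not to edit the file by hand, the replacement can be scripted. The sketch below is one possible approach, assuming the manifest contains a single `storageClassName:` key on its own line; the function name `set_storage_class` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite the value of storageClassName in a manifest.
# Usage: set_storage_class <path-to-model-cache.yaml> <storage-class>
# Note: the class name must not contain '|', '&' or '\' (sed metacharacters here).
set_storage_class() {
  local file=$1 class=$2 tmp
  tmp=$(mktemp)
  # Replace only the value, keeping the key and its indentation intact.
  if sed -E "s|^([[:space:]]*storageClassName:).*|\1 \"$class\"|" "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place so file permissions are preserved
  fi
  rm -f "$tmp"
}
```

For example, `set_storage_class llama-3-70b/model-cache/model-cache.yaml "$(kubectl get storageclass -o name | head -1 | cut -d/ -f2)"` would pick the first storage class reported by the cluster; review the choice before applying it.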
## Option 1: Automated Deployment
Run the recipe to deploy a model: Use the `run.sh` script for fully automated deployment:
**Note:** The script automatically:
- Creates the model cache PVC and downloads the model
- Deploys the model service
- Runs performance benchmark if a `perf.yaml` file is present in the deployment directory
#### Script Usage
```bash ```bash
./run.sh --model <model> --framework <framework> <deployment-type> ./run.sh [OPTIONS] --model <model> --framework <framework> --deployment <deployment-type>
``` ```
Arguments: **Required Options:**
<deployment-type> Deployment type (e.g., agg, disagg-single-node, disagg-multi-node) - `--model <model>`: Model name matching the directory name in the recipes directory (e.g., llama-3-70b, gpt-oss-120b, deepseek-r1)
- `--framework <framework>`: Backend framework (`vllm`, `trtllm`, `sglang`)
- `--deployment <deployment-type>`: Deployment mode (e.g., agg, disagg, disagg-single-node, disagg-multi-node)
**Optional:**
- `--namespace <namespace>`: Kubernetes namespace (default: dynamo)
- `--dry-run`: Show commands without executing them
- `-h, --help`: Show help message
**Environment Variables:**
- `NAMESPACE`: Kubernetes namespace (default: dynamo)
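
The namespace can therefore come from three places. The sketch below illustrates the precedence implied by the `run.sh` changes in this commit, where `NAMESPACE` is initialized from the environment with a `dynamo` fallback and then overwritten when `--namespace` is parsed. The function `resolve_namespace` is a hypothetical stand-in for that logic, not code from the script.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of namespace precedence:
#   --namespace flag  >  NAMESPACE environment variable  >  "dynamo"
# Usage: resolve_namespace <value-of---namespace-or-empty>
resolve_namespace() {
  local flag_value=$1
  local ns="${NAMESPACE:-dynamo}"   # environment variable, else the default
  if [[ -n "$flag_value" ]]; then
    ns=$flag_value                  # an explicit flag overrides both
  fi
  echo "$ns"
}
```

In practice this means `export NAMESPACE=team-a` followed by `./run.sh --namespace my-ns ...` deploys into `my-ns`.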
#### Example Usage
```bash
# Set up environment
export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE}
# Configure HuggingFace token
kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
# use run.sh script to deploy the model
# Deploy Llama-3-70B with vLLM (aggregated mode)
./run.sh --model llama-3-70b --framework vllm --deployment agg
# Deploy GPT-OSS-120B with TensorRT-LLM
./run.sh --model gpt-oss-120b --framework trtllm --deployment agg
# Deploy DeepSeek-R1 with SGLang (disaggregated mode)
./run.sh --model deepseek-r1 --framework sglang --deployment disagg
# Deploy with custom namespace
./run.sh --namespace my-namespace --model llama-3-70b --framework vllm --deployment agg
# Dry run to see what would be executed
./run.sh --dry-run --model llama-3-70b --framework vllm --deployment agg
```
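
The `--dry-run` flag relies on a simple shell idiom: `run.sh` sets `DRY_RUN="echo"` and prefixes each `kubectl` call with `$DRY_RUN`, so the command is printed instead of executed. A minimal sketch of that idiom follows; the wrapper function `deploy_recipe` is hypothetical and exists only to make the example self-contained.

```bash
#!/usr/bin/env bash
# DRY_RUN is empty for a real run, or "echo" to print commands instead.
DRY_RUN=""

# Hypothetical wrapper around the apply step.
# Usage: deploy_recipe <namespace> <path-to-deploy.yaml>
deploy_recipe() {
  local ns=$1 deploy_file=$2
  # Unquoted on purpose: an empty $DRY_RUN expands to nothing,
  # so kubectl runs directly; with "echo" the command line is printed.
  $DRY_RUN kubectl apply -n "$ns" -f "$deploy_file"
}
```

Because the printed text is what the shell would have run after expansion, a dry run is a quick way to confirm the resolved namespace and file paths before touching the cluster.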
Required Options:
--model <model> Model name (e.g., llama-3-70b)
--framework <fw> Framework one of VLLM TRTLLM SGLANG (default: VLLM)
Optional: ## Option 2: Manual Deployment
--skip-model-cache Skip model downloading (assumes model cache already exists)
-h, --help Show this help message
Environment Variables: For step-by-step manual deployment, follow these steps:
NAMESPACE Kubernetes namespace (default: dynamo)
Examples:
./run.sh --model llama-3-70b --framework vllm agg
./run.sh --skip-model-cache --model llama-3-70b --framework vllm agg
./run.sh --model llama-3-70b --framework trtllm disagg-single-node
Example:
```bash ```bash
./run.sh --model llama-3-70b --framework vllm --deployment-type agg # 0. Set up environment (see Prerequisites section)
export NAMESPACE=your-namespace
kubectl create namespace ${NAMESPACE}
kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
# 1. Download model (see Model Download section)
kubectl apply -n $NAMESPACE -f <model>/model-cache/
# 2. Deploy model (see Deployment section)
kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/deploy.yaml
# 3. Run benchmarks (optional, if perf.yaml exists)
kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/perf.yaml
```
### Step 1: Download Model
```bash
# Start the download job
kubectl apply -n $NAMESPACE -f <model>/model-cache
# Verify job creation
kubectl get jobs -n $NAMESPACE | grep model-download
```
Monitor and wait for the model download to complete:
```bash
# Wait for job completion (timeout after 100 minutes)
kubectl wait --for=condition=Complete job/model-download -n $NAMESPACE --timeout=6000s
# Check job status
kubectl get job model-download -n $NAMESPACE
# View download logs
kubectl logs job/model-download -n $NAMESPACE
```
### Step 2: Deploy Model Service
```bash
# Navigate to the specific deployment configuration
cd <model>/<framework>/<deployment-mode>/
# Deploy the model service
kubectl apply -n $NAMESPACE -f deploy.yaml
# Verify deployment creation
kubectl get deployments -n $NAMESPACE
``` ```
#### Wait for Deployment Ready
## Dry run mode ```bash
# Get deployment name from the deploy.yaml file
DEPLOYMENT_NAME=$(grep "name:" deploy.yaml | head -1 | awk '{print $2}')
# Wait for deployment to be ready (timeout after 20 minutes)
kubectl wait --for=condition=available deployment/$DEPLOYMENT_NAME -n $NAMESPACE --timeout=1200s
# Check deployment status
kubectl get deployment $DEPLOYMENT_NAME -n $NAMESPACE
# Check pod status
kubectl get pods -n $NAMESPACE -l app=$DEPLOYMENT_NAME
```
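
Note that `grep "name:" deploy.yaml | head -1` returns the first `name:` in the file. When `deploy.yaml` bundles an optional ConfigMap ahead of the DynamoGraphDeployment, that first match is the ConfigMap name. The sketch below selects the name by resource kind instead. It assumes conventional manifests (`kind:` appears before `metadata:`, two-space indentation, unquoted names); the function `name_for_kind` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print metadata.name of the first document of a given kind
# in a multi-document YAML file.
# Usage: name_for_kind <file> <kind>
name_for_kind() {
  local file=$1 kind=$2
  awk -v want="$kind" '
    /^---/       { kind = ""; in_meta = 0; next }  # start of a new YAML document
    /^kind:/     { kind = $2; next }               # remember the current kind
    /^metadata:/ { in_meta = 1; next }             # entering the metadata block
    /^[^ #]/     { in_meta = 0 }                   # any other top-level key ends it
    in_meta && kind == want && /^  name:/ { print $2; exit }
  ' "$file"
}
```

For example, `DEPLOYMENT_NAME=$(name_for_kind deploy.yaml DynamoGraphDeployment)` avoids picking up `llm-config` from a leading ConfigMap. For anything beyond simple manifests, a real YAML parser is the safer choice.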
#### Verify Model Service
```bash
# Check if service is running
kubectl get services -n $NAMESPACE
# Test model endpoint (port-forward to test locally)
kubectl port-forward service/${DEPLOYMENT_NAME}-frontend 8000:8000 -n $NAMESPACE
# Test the model API (in another terminal)
curl http://localhost:8000/v1/models
To dry run the recipe, add the `--dry-run` flag. # Stop port-forward when done
pkill -f "kubectl port-forward"
```
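
The port-forward and the model server may both need a moment before the endpoint answers, so a single `curl` can fail spuriously. A small retry loop makes the check more robust. This is a generic sketch; the function name `retry` and its argument order are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0                    # command succeeded
    fi
    if ((i < attempts)); then
      sleep "$delay"              # wait before the next attempt
    fi
  done
  return 1                        # all attempts failed
}
```

With the port-forward running in another terminal, `retry 30 2 curl -sf http://localhost:8000/v1/models` polls the endpoint for up to about a minute and returns as soon as it responds.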
### Step 3: Performance Benchmarking (Optional)
Run performance benchmarks to evaluate model performance. Note that benchmarking is only available for models that include a `perf.yaml` file (optional):
#### Launch Benchmark Job
```bash ```bash
./run.sh --dry-run --model llama-3-70b --framework vllm agg # From the deployment directory
kubectl apply -n $NAMESPACE -f perf.yaml
# Verify benchmark job creation
kubectl get jobs -n $NAMESPACE
``` ```
## (Optional) Running the recipes with model cache #### Monitor Benchmark Progress
You may need to cache the model weights on a PVC to avoid repeated downloads of the model weights.
See the [Prerequisites](#prerequisites) section for more details.
```bash ```bash
./run.sh --model llama-3-70b --framework vllm --deployment-type agg --skip-model-cache # Get benchmark job name
PERF_JOB_NAME=$(grep "name:" perf.yaml | head -1 | awk '{print $2}')
# Monitor benchmark logs in real-time
kubectl logs -f job/$PERF_JOB_NAME -n $NAMESPACE
# Wait for benchmark completion (timeout after 100 minutes)
kubectl wait --for=condition=Complete job/$PERF_JOB_NAME -n $NAMESPACE --timeout=6000s
``` ```
#### View Benchmark Results
```bash
# Check final benchmark results
kubectl logs job/$PERF_JOB_NAME -n $NAMESPACE | tail -50
```
\ No newline at end of file
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: batch/v1
kind: Job
metadata:
name: model-download
spec:
backoffLimit: 3
completions: 1
parallelism: 1
template:
metadata:
labels:
app: model-download
spec:
restartPolicy: Never
containers:
- name: model-download
image: python:3.10-slim
command: ["sh", "-c"]
envFrom:
- secretRef:
name: hf-token-secret
env:
- name: MODEL_NAME
value: deepseek-ai/DeepSeek-R1
- name: HF_HOME
value: /model-store
- name: HF_HUB_ENABLE_HF_TRANSFER
value: "1"
- name: MODEL_REVISION
value: 56d4cbbb4d29f4355bab4b9a39ccb717a14ad5ad
args:
- |
set -eux
pip install --no-cache-dir huggingface_hub hf_transfer
hf download $MODEL_NAME --revision $MODEL_REVISION
volumeMounts:
- name: model-cache
mountPath: /model-store
volumes:
- name: model-cache
persistentVolumeClaim:
claimName: model-cache
\ No newline at end of file
# Container # DeepSeek R1 SGLang Recipe
This recipe is for running DeepSeek R1 with SGLang in disaggregated mode. It is based on the WideEP recipe from the SGLang team.
## Container
Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the container, or Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the container, or
...@@ -8,7 +12,7 @@ Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the containe ...@@ -8,7 +12,7 @@ Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the containe
Dynamo commits after `1b3eed4b6a0e735d4ecec6681f4c0b89f2112167` (Sep 18, 2025) are required. Dynamo commits after `1b3eed4b6a0e735d4ecec6681f4c0b89f2112167` (Sep 18, 2025) are required.
# Hardware ## Hardware
The two deployment recipes are for 8xH200 and 16xH200. They should also work for other GPU SKUs. Change the TEP and DEP size accordingly to match the GPU capacity. The two deployment recipes are for 8xH200 and 16xH200. They should also work for other GPU SKUs. Change the TEP and DEP size accordingly to match the GPU capacity.
......
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ConfigMap
metadata:
name: llm-config
data:
config.yaml: |
enable_attention_dp: true
cuda_graph_config:
max_batch_size: 800
enable_padding: true
kv_cache_config:
enable_block_reuse: false
stream_interval: 20
moe_config:
backend: CUTLASS
\ No newline at end of file
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0 # SPDX-License-Identifier: Apache-2.0
apiVersion: v1
kind: ConfigMap
metadata:
name: llm-config
data:
config.yaml: |
enable_attention_dp: true
cuda_graph_config:
max_batch_size: 800
enable_padding: true
kv_cache_config:
enable_block_reuse: false
stream_interval: 20
moe_config:
backend: CUTLASS
---
apiVersion: nvidia.com/v1alpha1 apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment kind: DynamoGraphDeployment
metadata: metadata:
...@@ -7,7 +23,7 @@ metadata: ...@@ -7,7 +23,7 @@ metadata:
spec: spec:
backendFramework: trtllm backendFramework: trtllm
pvcs: pvcs:
- name: model-cache-oss-gpt120b - name: model-cache
create: false create: false
services: services:
Frontend: Frontend:
...@@ -31,17 +47,13 @@ spec: ...@@ -31,17 +47,13 @@ spec:
- /bin/sh - /bin/sh
- -c - -c
image: my-registry/trtllm-runtime:my-tag image: my-registry/trtllm-runtime:my-tag
pvc:
create: false
mountPoint: /model-store
name: model-cache
replicas: 1 replicas: 1
TrtllmWorker: TrtllmWorker:
componentType: main componentType: main
dynamoNamespace: gpt-oss-agg dynamoNamespace: gpt-oss-agg
envFromSecret: hf-token-secret envFromSecret: hf-token-secret
volumeMounts: volumeMounts:
- name: model-cache-oss-gpt120b - name: model-cache
mountPoint: /root/.cache/huggingface mountPoint: /root/.cache/huggingface
sharedMemory: sharedMemory:
size: 80Gi size: 80Gi
...@@ -90,10 +102,6 @@ spec: ...@@ -90,10 +102,6 @@ spec:
- configMap: - configMap:
name: llm-config name: llm-config
name: llm-config name: llm-config
pvc:
create: false
mountPoint: /model-store
name: model-cache
replicas: 1 replicas: 1
resources: resources:
limits: limits:
......
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
apiVersion: batch/v1 apiVersion: batch/v1
kind: Job kind: Job
metadata: metadata:
name: oss-gpt120b-bench name: gpt-oss-120b-bench
spec: spec:
backoffLimit: 1 backoffLimit: 1
completions: 1 completions: 1
...@@ -11,7 +11,7 @@ spec: ...@@ -11,7 +11,7 @@ spec:
template: template:
metadata: metadata:
labels: labels:
app: oss-gpt120b-bench app: gpt-oss-120b-bench
spec: spec:
affinity: affinity:
podAntiAffinity: podAntiAffinity:
......
...@@ -17,8 +17,7 @@ ...@@ -17,8 +17,7 @@
RECIPES_DIR="$( cd "$( dirname "$0" )" && pwd )" RECIPES_DIR="$( cd "$( dirname "$0" )" && pwd )"
# Default values # Default values
NAMESPACE="${NAMESPACE:-dynamo}" NAMESPACE="${NAMESPACE:-dynamo}"
DOWNLOAD_MODEL=true DEPLOYMENT=""
DEPLOY_TYPE=""
MODEL="" MODEL=""
FRAMEWORK="" FRAMEWORK=""
DRY_RUN="" DRY_RUN=""
...@@ -29,28 +28,25 @@ DEFAULT_FRAMEWORK=VLLM ...@@ -29,28 +28,25 @@ DEFAULT_FRAMEWORK=VLLM
# Function to show usage # Function to show usage
usage() { usage() {
echo "Usage: $0 [OPTIONS] --model <model> --framework <framework> <deployment-type>" echo "Usage: $0 [OPTIONS] --model <model> --framework <framework> --deployment <deployment-type>"
echo ""
echo "Arguments:"
echo " <deployment-type> Deployment type (e.g., agg, disagg-single-node, disagg-multi-node)"
echo "" echo ""
echo "Required Options:" echo "Required Options:"
echo " --model <model> Model name (e.g., llama-3-70b)" echo " --model <model> Model name (e.g., llama-3-70b)"
echo " --framework <fw> Framework one of ${!FRAMEWORKS[*]} (default: ${DEFAULT_FRAMEWORK})" echo " --framework <fw> Framework one of ${!FRAMEWORKS[*]} (default: ${DEFAULT_FRAMEWORK})"
echo " --deployment <type> Deployment type (e.g., agg, disagg etc, please refer to the README.md for available deployment types)"
echo "" echo ""
echo "Optional:" echo "Optional:"
echo " --namespace <ns> Kubernetes namespace (default: dynamo)" echo " --namespace <ns> Kubernetes namespace (default: dynamo)"
echo " --skip-model-cache Skip model downloading (assumes model cache already exists)" echo " --dry-run Print commands without executing them"
echo " --dry-run Print commands without executing them" echo " -h, --help Show this help message"
echo " -h, --help Show this help message"
echo "" echo ""
echo "Environment Variables:" echo "Environment Variables:"
echo " NAMESPACE Kubernetes namespace (default: dynamo)" echo " NAMESPACE Kubernetes namespace (default: dynamo)"
echo "" echo ""
echo "Examples:" echo "Examples:"
echo " $0 --model llama-3-70b --framework vllm agg" echo " $0 --model llama-3-70b --framework vllm --deployment agg"
echo " $0 --skip-model-cache --model llama-3-70b --framework vllm agg" echo " $0 --model llama-3-70b --framework trtllm --deployment disagg-single-node"
echo " $0 --namespace my-ns --model llama-3-70b --framework trtllm disagg-single-node" echo " $0 --namespace my-ns --model llama-3-70b --framework vllm --deployment disagg-multi-node"
exit 1 exit 1
} }
...@@ -66,10 +62,6 @@ error() { ...@@ -66,10 +62,6 @@ error() {
while [[ $# -gt 0 ]]; do while [[ $# -gt 0 ]]; do
case $1 in case $1 in
--skip-model-cache)
DOWNLOAD_MODEL=false
shift
;;
--dry-run) --dry-run)
DRY_RUN="echo" DRY_RUN="echo"
shift shift
...@@ -90,6 +82,14 @@ while [[ $# -gt 0 ]]; do ...@@ -90,6 +82,14 @@ while [[ $# -gt 0 ]]; do
missing_requirement "$1" missing_requirement "$1"
fi fi
;; ;;
--deployment)
if [ "$2" ]; then
DEPLOYMENT=$2
shift 2
else
missing_requirement "$1"
fi
;;
--namespace) --namespace)
if [ "$2" ]; then if [ "$2" ]; then
NAMESPACE=$2 NAMESPACE=$2
...@@ -105,12 +105,7 @@ while [[ $# -gt 0 ]]; do ...@@ -105,12 +105,7 @@ while [[ $# -gt 0 ]]; do
error 'ERROR: Unknown option: ' "$1" error 'ERROR: Unknown option: ' "$1"
;; ;;
*) *)
if [[ -z "$DEPLOY_TYPE" ]]; then error "ERROR: Unknown argument: " "$1"
DEPLOY_TYPE="$1"
else
error "ERROR: Multiple deployment type arguments provided: " "$1"
fi
shift
;; ;;
esac esac
done done
...@@ -127,12 +122,12 @@ if [ -n "$FRAMEWORK" ]; then ...@@ -127,12 +122,12 @@ if [ -n "$FRAMEWORK" ]; then
fi fi
# Validate required arguments # Validate required arguments
if [[ -z "$MODEL" ]] || [[ -z "$DEPLOY_TYPE" ]]; then if [[ -z "$MODEL" ]] || [[ -z "$DEPLOYMENT" ]]; then
if [[ -z "$MODEL" ]]; then if [[ -z "$MODEL" ]]; then
echo "ERROR: --model argument is required" echo "ERROR: --model argument is required"
fi fi
if [[ -z "$DEPLOY_TYPE" ]]; then if [[ -z "$DEPLOYMENT" ]]; then
echo "ERROR: deployment-type argument is required" echo "ERROR: --deployment argument is required"
fi fi
echo "" echo ""
usage usage
...@@ -141,7 +136,7 @@ fi ...@@ -141,7 +136,7 @@ fi
# Construct paths based on new structure: recipes/<model>/<framework>/<deployment-type>/ # Construct paths based on new structure: recipes/<model>/<framework>/<deployment-type>/
MODEL_DIR="$RECIPES_DIR/$MODEL" MODEL_DIR="$RECIPES_DIR/$MODEL"
FRAMEWORK_DIR="$MODEL_DIR/${FRAMEWORK,,}" FRAMEWORK_DIR="$MODEL_DIR/${FRAMEWORK,,}"
DEPLOY_PATH="$FRAMEWORK_DIR/$DEPLOY_TYPE" DEPLOY_PATH="$FRAMEWORK_DIR/$DEPLOYMENT"
# Check if model directory exists # Check if model directory exists
if [[ ! -d "$MODEL_DIR" ]]; then if [[ ! -d "$MODEL_DIR" ]]; then
...@@ -161,7 +156,7 @@ fi ...@@ -161,7 +156,7 @@ fi
# Check if deployment directory exists # Check if deployment directory exists
if [[ ! -d "$DEPLOY_PATH" ]]; then if [[ ! -d "$DEPLOY_PATH" ]]; then
echo "Error: Deployment type '$DEPLOY_TYPE' does not exist in $FRAMEWORK_DIR" echo "Error: Deployment type '$DEPLOYMENT' does not exist in $FRAMEWORK_DIR"
echo "Available deployment types for $MODEL/${FRAMEWORK,,}:" echo "Available deployment types for $MODEL/${FRAMEWORK,,}:"
ls -1 "$FRAMEWORK_DIR" | grep -v "\.sh$\|\.md$" | sed 's/^/ /' ls -1 "$FRAMEWORK_DIR" | grep -v "\.sh$\|\.md$" | sed 's/^/ /'
exit 1 exit 1
...@@ -176,9 +171,13 @@ if [[ ! -f "$DEPLOY_FILE" ]]; then ...@@ -176,9 +171,13 @@ if [[ ! -f "$DEPLOY_FILE" ]]; then
exit 1 exit 1
fi fi
if [[ ! -f "$PERF_FILE" ]]; then # Check if perf file exists (optional)
echo "Error: Performance file '$PERF_FILE' not found" PERF_AVAILABLE=false
exit 1 if [[ -f "$PERF_FILE" ]]; then
PERF_AVAILABLE=true
echo "Performance benchmark file found: $PERF_FILE"
else
echo "Performance benchmark file not found: $PERF_FILE (skipping benchmarks)"
fi fi
# Show deployment information # Show deployment information
...@@ -187,42 +186,43 @@ echo "Dynamo Recipe Deployment" ...@@ -187,42 +186,43 @@ echo "Dynamo Recipe Deployment"
echo "======================================" echo "======================================"
echo "Model: $MODEL" echo "Model: $MODEL"
echo "Framework: ${FRAMEWORK,,}" echo "Framework: ${FRAMEWORK,,}"
echo "Deployment Type: $DEPLOY_TYPE" echo "Deployment Type: $DEPLOYMENT"
echo "Namespace: $NAMESPACE" echo "Namespace: $NAMESPACE"
echo "Model Download: $DOWNLOAD_MODEL"
echo "======================================" echo "======================================"
# Handle model downloading # Handle model downloading
MODEL_CACHE_DIR="$MODEL_DIR/model-cache" MODEL_CACHE_DIR="$MODEL_DIR/model-cache"
if [[ "$DOWNLOAD_MODEL" == "true" ]]; then echo "Creating PVC for model cache and downloading model..."
echo "Creating PVC for model cache and downloading model..." $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
$DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml $DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml
$DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-download.yaml
# Wait for the model download to complete
# Wait for the model download to complete MODEL_DOWNLOAD_JOB_NAME=$(grep "name:" $MODEL_CACHE_DIR/model-download.yaml | head -1 | awk '{print $2}')
echo "Waiting for the model download to complete..." echo "Waiting for job '$MODEL_DOWNLOAD_JOB_NAME' to complete..."
$DRY_RUN kubectl wait --for=condition=Complete job/model-download-${MODEL} -n $NAMESPACE --timeout=6000s $DRY_RUN kubectl wait --for=condition=Complete job/$MODEL_DOWNLOAD_JOB_NAME -n $NAMESPACE --timeout=6000s
else
echo "Skipping model download (using existing model cache)..."
# Still create the PVC in case it doesn't exist
$DRY_RUN kubectl apply -n $NAMESPACE -f $MODEL_CACHE_DIR/model-cache.yaml
fi
# Deploy the specified configuration # Deploy the specified configuration
echo "Deploying $MODEL ${FRAMEWORK,,} $DEPLOY_TYPE configuration..." echo "Deploying $MODEL ${FRAMEWORK,,} $DEPLOYMENT configuration..."
$DRY_RUN kubectl apply -n $NAMESPACE -f $DEPLOY_FILE $DRY_RUN kubectl apply -n $NAMESPACE -f $DEPLOY_FILE
# Launch the benchmark job # Launch the benchmark job (if available)
echo "Launching benchmark job..." if [[ "$PERF_AVAILABLE" == "true" ]]; then
$DRY_RUN kubectl apply -n $NAMESPACE -f $PERF_FILE echo "Launching benchmark job..."
$DRY_RUN kubectl apply -n $NAMESPACE -f $PERF_FILE
# Construct job name from the perf file
JOB_NAME=$(grep "name:" $PERF_FILE | head -1 | awk '{print $2}') # Construct job name from the perf file
echo "Waiting for job '$JOB_NAME' to complete..." JOB_NAME=$(grep "name:" $PERF_FILE | head -1 | awk '{print $2}')
$DRY_RUN kubectl wait --for=condition=Complete job/$JOB_NAME -n $NAMESPACE --timeout=6000s echo "Waiting for job '$JOB_NAME' to complete..."
$DRY_RUN kubectl wait --for=condition=Complete job/$JOB_NAME -n $NAMESPACE --timeout=6000s
# Print logs from the benchmark job
echo "======================================" # Print logs from the benchmark job
echo "Benchmark completed. Logs:" echo "======================================"
echo "======================================" echo "Benchmark completed. Logs:"
$DRY_RUN kubectl logs job/$JOB_NAME -n $NAMESPACE echo "======================================"
\ No newline at end of file $DRY_RUN kubectl logs job/$JOB_NAME -n $NAMESPACE
else
echo "======================================"
echo "Deployment completed successfully!"
echo "No performance benchmark available for this configuration."
echo "======================================"
fi
\ No newline at end of file
...@@ -169,9 +169,7 @@ class TestProfileSLADryRun: ...@@ -169,9 +169,7 @@ class TestProfileSLADryRun:
class Args: class Args:
def __init__(self): def __init__(self):
self.backend = "sglang" self.backend = "sglang"
self.config = ( self.config = "recipes/deepseek-r1/sglang/disagg-16gpu/deploy.yaml"
"recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml"
)
self.output_dir = "/tmp/test_profiling_results" self.output_dir = "/tmp/test_profiling_results"
self.namespace = "test-namespace" self.namespace = "test-namespace"
self.min_num_gpus_per_engine = 8 self.min_num_gpus_per_engine = 8
......