This repository contains production-ready recipes for deploying large language models using the Dynamo platform. Each recipe includes deployment configurations, performance benchmarking, and model caching setup.
6. (Optional) Create a shared model cache PVC to store the model weights.
Choose a storage class for the model cache PVC. You will need this storage class name to update the `storageClassName` field in the `model-cache/model-cache.yaml` file.
### 6. Configure Storage Class
Configure persistent storage for model caching:
```bash
# Check available storage classes
kubectl get storageclass
```
## Running the recipes
Replace `your-storage-class-name` with your actual storage class in `<model>/model-cache/model-cache.yaml`:
```yaml
# In <model>/model-cache/model-cache.yaml
spec:
  storageClassName: "your-actual-storage-class"  # Replace this
```
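The substitution can also be scripted. The following is a minimal sketch, assuming the file still contains the literal placeholder `your-storage-class-name`; the helper name `set_storage_class` and the example class name are hypothetical and not part of the recipes.

```bash
# set_storage_class FILE CLASS
# Replaces the placeholder storage class in FILE with CLASS, editing in place.
# Assumption: FILE contains the literal text "your-storage-class-name".
set_storage_class() {
  local file="$1" class="$2"
  # -i.bak works with both GNU and BSD sed; the backup is removed on success.
  sed -i.bak "s|your-storage-class-name|${class}|g" "$file" && rm -f "${file}.bak"
}

# Example call (hypothetical class name; substitute your model directory):
# set_storage_class "<model>/model-cache/model-cache.yaml" "standard"
```

Pick the class name from the output of `kubectl get storageclass` before running the helper.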
## Option 1: Automated Deployment
Use the `run.sh` script for fully automated deployment:
**Note:** The script automatically:
- Creates the model cache PVC and downloads the model
- Deploys the model service
- Runs the performance benchmark if a `perf.yaml` file is present in the deployment directory
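The sequence in the note can be pictured roughly as follows. This is not the contents of `run.sh`; it is a minimal sketch in which the `deploy.yaml` file name, the function names, and the `DRY_RUN` switch are assumptions made for illustration. The model download step is omitted.

```bash
# Illustrative only: this is NOT the actual run.sh.

# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# deploy_recipe MODEL_DIR DEPLOY_DIR
deploy_recipe() {
  local model_dir="$1" deploy_dir="$2"
  # 1. Create the model cache PVC.
  run_cmd kubectl apply -f "${model_dir}/model-cache/model-cache.yaml"
  # 2. Deploy the model service (deploy.yaml is an assumed file name).
  run_cmd kubectl apply -f "${deploy_dir}/deploy.yaml"
  # 3. Run the benchmark only when the deployment directory has a perf.yaml.
  if [ -f "${deploy_dir}/perf.yaml" ]; then
    run_cmd kubectl apply -f "${deploy_dir}/perf.yaml"
  fi
}
```

Calling `DRY_RUN=1 deploy_recipe <model> <deployment-dir>` prints the commands without touching the cluster, which makes it easy to see whether the benchmark step would be included.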
Run performance benchmarks to evaluate the model. Benchmarking is available only for models that include the optional `perf.yaml` file:
This recipe is for running DeepSeek R1 with SGLang in disaggregated mode. It is based on the WideEP recipe from the SGLang team.
## Container
Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the container, or
...
...
Dynamo commits after `1b3eed4b6a0e735d4ecec6681f4c0b89f2112167` (Sep 18, 2025) are required.
## Hardware
The two deployment recipes target 8xH200 and 16xH200. They should also work on other GPU SKUs; change the TDP and DEP sizes to match the GPU capacity.