| Model family | Backend | Mode | GPU | Deployment | Benchmark |
This repository contains production-ready recipes for deploying large language models using the Dynamo platform. Each recipe includes deployment configurations, performance benchmarking, and model caching setup.
6. (Optional) Create a shared model cache PVC to store the model weights.
### 6. Configure Storage Class
Configure persistent storage for model caching. Choose a storage class for the model cache PVC; you will use its name to set the `storageClassName` field in `<model>/model-cache/model-cache.yaml`.
```bash
# Check available storage classes
kubectl get storageclass
```
## Running the recipes
Replace `your-actual-storage-class` with the name of your storage class in `<model>/model-cache/model-cache.yaml`:
```yaml
# In <model>/model-cache/model-cache.yaml
spec:
  storageClassName: "your-actual-storage-class"  # Replace this
```
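The substitution can also be scripted. The following is a minimal sketch, assuming the manifest still contains the placeholder string `your-actual-storage-class`; the `set_storage_class` helper is hypothetical and not part of the recipes:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the recipes): replace the placeholder
# storage class in a model-cache manifest with a real storage class name.
set_storage_class() {
  local manifest="$1"       # path to the model-cache.yaml to edit
  local storage_class="$2"  # a name listed by `kubectl get storageclass`
  # -i.bak works with both GNU and BSD sed; the backup is removed on success.
  sed -i.bak "s|your-actual-storage-class|${storage_class}|g" "$manifest" \
    && rm -f "${manifest}.bak"
}
```

For example, `set_storage_class path/to/model-cache.yaml my-storage-class` rewrites the file in place; both arguments here are placeholders.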
## Option 1: Automated Deployment
Use the `run.sh` script to run a recipe and deploy a model fully automatically:
**Note:** The script automatically:
- Creates the model cache PVC and downloads the model
- Deploys the model service
- Runs the performance benchmark if a `perf.yaml` file is present in the deployment directory
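The conditional benchmark step can be pictured as a simple file check. This is an illustrative sketch only, not the contents of `run.sh`; the `has_benchmark` function name and its directory argument are assumptions:

```bash
#!/usr/bin/env bash
# Illustrative only: succeed when a deployment directory opts in to
# benchmarking by containing a perf.yaml file.
has_benchmark() {
  local deploy_dir="$1"
  [ -f "${deploy_dir}/perf.yaml" ]
}
```

A caller could then guard the benchmark with `if has_benchmark "$dir"; then ...; fi`, skipping it for deployments that ship no `perf.yaml`.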
Run performance benchmarks to evaluate model performance. Benchmarking is optional and only available for models that include a `perf.yaml` file:
This recipe is for running DeepSeek R1 with SGLang in disaggregated mode. It is based on the WideEP recipe from the SGLang team.
## Container
Use the Dockerfile in `container/Dockerfile.sglang-wideep` to build the container, or
...
Dynamo commits after `1b3eed4b6a0e735d4ecec6681f4c0b89f2112167` (Sep 18, 2025) are required.
## Hardware
The two deployment recipes target 8xH200 and 16xH200. They should also work on other GPU SKUs; change the TDP and DEP sizes to match the GPU capacity.