# Dynamo model serving recipes

| Model family | Backend | Mode                | GPU        | Deployment | Benchmark |
|--------------|---------|---------------------|------------|------------|-----------|
| llama-3-70b  | vllm    | agg                 | H100, H200 | ✓          | ✓         |
| llama-3-70b  | vllm    | disagg-multi-node   | H100, H200 | ✓          | ✓         |
| llama-3-70b  | vllm    | disagg-single-node  | H100, H200 | ✓          | ✓         |
| DeepSeek-R1  | sglang  | disaggregated       | H200       | ✓          | 🚧        |
| oss-gpt      | trtllm  | aggregated          | GB200      | ✓          | ✓         |

## Prerequisites

1. **Create a namespace and set the `NAMESPACE` environment variable.** Later steps use this variable to deploy and perf-test the model.

   ```bash
   export NAMESPACE=your-namespace
   kubectl create namespace ${NAMESPACE}
   ```

2. **Dynamo Cloud Platform installed** - Follow [Quickstart Guide](../docs/kubernetes/README.md)

3. **Kubernetes cluster with GPU support**

4. **Container registry access** for vLLM runtime images

5. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`). Update the `hf_hub_secret/hf_hub_secret.yaml` file with your HuggingFace token, then apply it:

   ```bash
   kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE}
   ```

6. (Optional) Create a shared model cache PVC to store the model weights. Choose a storage class for the PVC and set its name in the `storageClass` field of the `model-cache/model-cache.yaml` file. List the available storage classes with:
   ```bash
   kubectl get storageclass
   ```

## Running the recipes

Run the recipe to deploy a model:

```bash
./run.sh --model <model> --framework <framework> <deployment-type>
```

**Arguments:**

- `<deployment-type>`: Deployment type (e.g., `agg`, `disagg-single-node`, `disagg-multi-node`)

**Required options:**

- `--model <model>`: Model name (e.g., `llama-3-70b`)
- `--framework <framework>`: Framework, one of `VLLM`, `TRTLLM`, `SGLANG` (default: `VLLM`)

**Optional:**

- `--skip-model-cache`: Skip model downloading (assumes the model cache already exists)
- `-h`, `--help`: Show the help message

**Environment variables:**

- `NAMESPACE`: Kubernetes namespace (default: `dynamo`)

Examples:

```bash
./run.sh --model llama-3-70b --framework vllm agg
./run.sh --skip-model-cache --model llama-3-70b --framework vllm agg
./run.sh --model llama-3-70b --framework trtllm disagg-single-node
```

## Dry run mode

To dry run the recipe, add the `--dry-run` flag.

```bash
./run.sh --dry-run --model llama-3-70b --framework vllm agg
```

## (Optional) Running the recipes with model cache

Caching the model weights on a PVC avoids downloading them repeatedly. See the [Prerequisites](#prerequisites) section for how to create the cache. Once the cache exists, pass `--skip-model-cache` to skip the download:

```bash
./run.sh --skip-model-cache --model llama-3-70b --framework vllm agg
```
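
When scripting several deployments, it can help to validate the inputs before calling `run.sh`. The following is a minimal sketch, not part of the repository: `build_recipe_cmd` is a hypothetical helper that checks the framework against the three documented values and prints the command line in the order used by the examples above (flags first, deployment type last). It assumes `run.sh` accepts lowercase framework names, as the examples suggest, and it only prints the command; it does not run it.

```bash
#!/usr/bin/env bash

# Hypothetical helper (an illustration, not shipped with the recipes).
# Usage: build_recipe_cmd <model> <framework> <deployment-type> [extra flags...]
# Prints the run.sh command line to stdout without executing it.
build_recipe_cmd() {
  if [ "$#" -lt 3 ]; then
    echo "usage: build_recipe_cmd <model> <framework> <deployment-type> [extra flags...]" >&2
    return 2
  fi

  local model="$1"
  local framework
  local deployment="$3"

  # Normalize the framework name to lowercase, matching the documented examples.
  framework="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  shift 3

  # Only the three frameworks listed in the help text are accepted.
  case "$framework" in
    vllm|trtllm|sglang) ;;
    *)
      echo "unsupported framework: $framework" >&2
      return 1
      ;;
  esac

  # Extra flags (for example --skip-model-cache or --dry-run) come first,
  # and the positional deployment type comes last.
  printf './run.sh'
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$@"
  fi
  printf ' --model %s --framework %s %s\n' "$model" "$framework" "$deployment"
}
```

For example, `build_recipe_cmd llama-3-70b VLLM agg --skip-model-cache` prints `./run.sh --skip-model-cache --model llama-3-70b --framework vllm agg`, which can be reviewed before it is run.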