# Dynamo Benchmarking Guide

This benchmarking framework lets you compare performance across any combination of:
- **DynamoGraphDeployments** (automatically deployed from your manifests)
- **External HTTP endpoints** (existing services, vLLM, TensorRT-LLM, etc.)

You can mix and match these in a single benchmark run using custom labels. Configure your DynamoGraphDeployment manifests for your specific models, hardware, and parallelization needs.

## What This Tool Does

The framework is a wrapper around `genai-perf` that:
- Deploys user-specified `DynamoGraphDeployments` automatically
- Benchmarks any HTTP endpoint (no deployment needed)
- Runs concurrency sweeps across configurable load levels
- Generates comparison plots labeled with your custom labels
- Works with any HuggingFace-compatible model on NVIDIA GPUs (H200, H100, A100, etc.)
- Runs locally and connects to your Kubernetes deployments and endpoints

**Default sequence lengths**: input 2000 tokens, output 256 tokens (configurable with `--isl` and `--osl`).

**Important**: The `--model` parameter configures GenAI-Perf for benchmarking and provides logging context. It does not select the model that is served: the actual model loaded is determined by your deployment manifests. Only one model can be benchmarked at a time across all inputs, to ensure a fair comparison. The default `--model` value in the benchmarking script is `Qwen/Qwen3-0.6B`; whatever value you use must match the model in the manifest(s) and the model served at the endpoint(s).

## Prerequisites

1. **Kubernetes cluster with NVIDIA GPUs and Dynamo Cloud platform** - You need a Kubernetes cluster with eligible NVIDIA GPUs and the Dynamo Cloud platform installed. First follow the [installation guide](../../guides/dynamo_deploy/installation_guide.md) to install Dynamo Cloud, then use [deploy/utils/README](../../deploy/utils/README.md) to set up benchmarking resources.

2. **kubectl access** - You need `kubectl` installed and configured to access your Kubernetes cluster.

3. **Benchmark dependencies** - Because benchmarks run locally, you need to install the required Python dependencies:
   ```bash
   pip install -r deploy/utils/requirements.txt
   ```

   *Note: if you are on Ubuntu 22.04 or earlier, you will also need to build perf_analyzer [from source](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/install.md#build-from-source).*

## Quick Start Examples

The tool can deploy, benchmark, and compare Dynamo deployments (DynamoGraphDeployments) on a Kubernetes cluster. It can also benchmark and compare separately deployed servers, given their URLs. In the examples below, Dynamo deployments are specified by a YAML manifest path and separately deployed servers by URL.

```bash
export NAMESPACE=benchmarking

# Compare multiple DynamoGraphDeployments of a single backend
./benchmarks/benchmark.sh --namespace "$NAMESPACE" \
   --input agg=components/backends/vllm/deploy/agg.yaml \
   --input disagg=components/backends/vllm/deploy/disagg.yaml

# Compare different backend types (vLLM vs TensorRT-LLM)
./benchmarks/benchmark.sh --namespace "$NAMESPACE" \
   --input vllm-disagg=components/backends/vllm/deploy/disagg.yaml \
   --input trtllm-disagg=components/backends/trtllm/deploy/disagg.yaml

# Compare a Dynamo deployment vs an existing deployment (external endpoint)
./benchmarks/benchmark.sh --namespace "$NAMESPACE" \
   --input dynamo=components/backends/vllm/deploy/disagg.yaml \
   --input vllm-baseline=http://localhost:8000

# Compare three different configurations
./benchmarks/benchmark.sh --namespace "$NAMESPACE" \
   --input dynamo-agg=components/backends/vllm/deploy/agg.yaml \
   --input dynamo-disagg=components/backends/vllm/deploy/disagg.yaml \
   --input external-vllm=http://localhost:8000

# Benchmark a single external endpoint
./benchmarks/benchmark.sh --namespace "$NAMESPACE" \
   --input production-api=http://your-api:8000

# Custom model and sequence lengths
./benchmarks/benchmark.sh --namespace "$NAMESPACE" \
   --input my-setup=my-custom-manifest.yaml \
   --model "meta-llama/Meta-Llama-3-8B" --isl 512 --osl 256
```

**Key**: Configure your manifests for your specific models, hardware, and parallelization strategy before benchmarking.

### Important: Image Accessibility

Ensure that the container images in your DynamoGraphDeployment manifests are accessible from the cluster:
- **Public images**: Use [Dynamo NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts) public releases
- **Custom registries**: Configure the proper pull credentials in your Kubernetes namespace

## Configuration and Usage

### Command Line Options

```bash
./benchmarks/benchmark.sh --namespace NAMESPACE --input