# Dynamo Benchmarking Guide

This benchmarking framework lets you compare performance across any combination of:

- **DynamoGraphDeployments**
- **External HTTP endpoints** (existing services deployed following standard documentation from vLLM, llm-d, AIBrix, etc.)

## What This Tool Does

The framework is a Python-based wrapper around `genai-perf` that:

- Benchmarks any HTTP endpoint
- Runs concurrency sweeps across configurable load levels
- Generates comparison plots with your custom labels
- Works with any HuggingFace-compatible model on NVIDIA GPUs (H200, H100, A100, etc.)
- Runs locally and connects to your Kubernetes deployments/endpoints
- Provides direct Python script execution for maximum flexibility

**Default sequence lengths**: Input: 2000 tokens, Output: 256 tokens (configurable with `--isl` and `--osl`)

**Important**: The `--model` parameter configures GenAI-Perf for benchmarking and provides logging context. The benchmarking script defaults to `Qwen/Qwen3-0.6B`, so pass `--model` explicitly whenever the endpoint(s) serve a different model; the value must match the deployed model.

## Prerequisites

1. **Dynamo container environment** - You must be running inside a Dynamo container with the benchmarking tools pre-installed.

2. **Ubuntu 24.04** - GenAI-Perf requires Ubuntu 24.04 or later to work properly. If you are on Ubuntu 22.04 or earlier, you will need to build perf_analyzer [from source](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/install.md#build-from-source).

3. **kubectl access** - You need `kubectl` installed and configured to access your Kubernetes cluster.

4. **Benchmark dependencies** - Because benchmarks run locally, you need to install the required Python dependencies:
   ```bash
   pip install -r deploy/utils/requirements.txt
   ```

## User Workflow

Follow these steps to benchmark Dynamo deployments:

### Step 1: Establish Kubernetes Cluster and Install Dynamo

Set up your Kubernetes cluster with NVIDIA GPUs and install the Dynamo Cloud platform.
First follow the [installation guide](../../guides/dynamo_deploy/installation_guide.md) to install Dynamo Cloud, then use [deploy/utils/README](../../deploy/utils/README.md) to set up benchmarking resources.

### Step 2: Deploy DynamoGraphDeployments

Deploy your DynamoGraphDeployments separately using the [deployment documentation](../../components/backends/). Each deployment should have a frontend service exposed.

### Step 3: Port-Forward and Benchmark Deployment A

```bash
# Port-forward the frontend service for deployment A
kubectl port-forward -n <namespace> svc/<frontend-service-name> 8000:8000 &
# Note: remember to stop the port-forward process after benchmarking.

# Benchmark deployment A using Python scripts
python3 -m benchmarks.utils.benchmark --namespace <namespace> \
  --input deployment-a=http://localhost:8000 \
  --model "your-model-name" \
  --output-dir ./benchmarks/results
```

### Step 4: [If Comparative] Teardown Deployment A and Establish Deployment B

If comparing multiple deployments, tear down deployment A and deploy deployment B with a different configuration.
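The port-forward commands in these steps run in the background, so a benchmark started immediately afterwards can race the tunnel coming up. The following is a minimal sketch of a readiness check, assuming a plain TCP connect is an acceptable signal; `wait_for_port` is a hypothetical helper and not part of `benchmarks.utils`. It only confirms that the local port accepts connections, not that the model server behind it is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not provided by the benchmarking framework.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the local port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port("localhost", 8001)` before benchmarking deployment B would block for up to 30 seconds and return `False` if nothing is listening on the forwarded port.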
### Step 5: [If Comparative] Port-Forward and Benchmark Deployment B

```bash
# Port-forward the frontend service for deployment B
kubectl port-forward -n <namespace> svc/<frontend-service-name> 8001:8000 &

# Benchmark deployment B using Python scripts
python3 -m benchmarks.utils.benchmark --namespace <namespace> \
  --input deployment-b=http://localhost:8001 \
  --model "your-model-name" \
  --output-dir ./benchmarks/results
```

### Step 6: Generate Summary and Visualization

```bash
# Generate plots and summary using Python plotting script
python3 -m benchmarks.utils.plot --data-dir ./benchmarks/results
```

## Example Commands

### Single Deployment Benchmark

```bash
# Port-forward and benchmark a single deployment
kubectl port-forward -n my-namespace svc/my-frontend-service 8000:8000 &
python3 -m benchmarks.utils.benchmark --namespace my-namespace \
  --input my-deployment=http://localhost:8000 \
  --model "meta-llama/Meta-Llama-3-8B"
```

### Comparative Benchmark

```bash
# Benchmark deployment A
kubectl port-forward -n my-namespace svc/agg-frontend 8000:8000 &
python3 -m benchmarks.utils.benchmark --namespace my-namespace \
  --input aggregated=http://localhost:8000 \
  --model "meta-llama/Meta-Llama-3-8B"

# Benchmark deployment B (different port)
kubectl port-forward -n my-namespace svc/disagg-frontend 8001:8000 &
python3 -m benchmarks.utils.benchmark --namespace my-namespace \
  --input disaggregated=http://localhost:8001 \
  --model "meta-llama/Meta-Llama-3-8B"

# Generate comparison plots
python3 -m benchmarks.utils.plot --data-dir ./benchmarks/results
```

## Use Cases

The benchmarking framework supports various comparative analysis scenarios:

- **Compare multiple DynamoGraphDeployments of a single backend** (e.g., aggregated vs disaggregated configurations)
- **Compare different backends** (e.g., vLLM vs TensorRT-LLM vs SGLang)
- **Compare Dynamo vs other platforms** (e.g., Dynamo vs llm-d vs AIBrix)
- **Compare different models** (e.g., Llama-3-8B vs Llama-3-70B vs Qwen-3-0.6B)
- **Compare different hardware configurations**
(e.g., H100 vs A100 vs H200)
- **Compare different parallelization strategies** (e.g., different GPU counts or memory configurations)

## Configuration and Usage

### Command Line Options

```bash
python3 -m benchmarks.utils.benchmark --namespace NAMESPACE --input