# Dynamo Benchmarking Guide

This benchmarking framework lets you compare performance across any combination of:
- **DynamoGraphDeployments**
- **External HTTP endpoints** (existing services deployed following standard documentation from vLLM, llm-d, AIBrix, etc.)

## Choosing Your Benchmarking Approach

Dynamo provides two benchmarking approaches to suit different use cases: **client-side** and **server-side**. Client-side refers to running benchmarks on your local machine and connecting to Kubernetes deployments via port-forwarding, while server-side refers to running benchmarks directly within the Kubernetes cluster using internal service URLs. Which method to use depends on your use case.

**TL;DR:** Need high performance/load testing? Server-side. Just quick testing/comparison? Client-side.

### Use Client-Side Benchmarking When:
- You want to quickly test deployments
- You want immediate access to results on your local machine
- You're comparing external services or deployments (not necessarily just Dynamo deployments)
- You need to run benchmarks from your laptop/workstation

→ **[Go to Client-Side Benchmarking (Local)](#client-side-benchmarking-local)**

### Use Server-Side Benchmarking When:
- You have a development environment with kubectl access
- You're doing performance validation with high load/speed requirements
- You're experiencing timeouts or performance issues with client-side benchmarking
- You want optimal network performance (no port-forwarding overhead)
- You're running automated CI/CD pipelines
- You need isolated execution environments
- You're doing resource-intensive benchmarking
- You want persistent result storage in the cluster

→ **[Go to Server-Side Benchmarking (In-Cluster)](#server-side-benchmarking-in-cluster)**

### Quick Comparison

| Feature | Client-Side | Server-Side |
|---------|-------------|-------------|
| **Location** | Your local machine | Kubernetes cluster |
| **Network** | Port-forwarding required | Direct service DNS |
| **Setup** | Quick and simple | Requires cluster resources |
| **Performance** | Limited by local resources, may time out under high load | Optimal cluster performance, handles high load |
| **Isolation** | Shared environment | Isolated job execution |
| **Results** | Local filesystem | Persistent volumes |
| **Best for** | Light load | High load |

## What This Tool Does

The framework is a Python-based wrapper around `genai-perf` that:
- Benchmarks any HTTP endpoints
- Runs concurrency sweeps across configurable load levels
- Generates comparison plots with your custom labels
- Works with any HuggingFace-compatible model on NVIDIA GPUs (H200, H100, A100, etc.)
- Provides direct Python script execution for maximum flexibility

**Default sequence lengths**: Input: 2000 tokens, Output: 256 tokens (configurable with `--isl` and `--osl`)

**Important**: The `--model` parameter configures GenAI-Perf for benchmarking and provides logging context. The default `--model` value in the benchmarking script is `Qwen/Qwen3-0.6B`, but it must match the model deployed at the endpoint(s).

---

# Client-Side Benchmarking (Local)

Client-side benchmarking runs on your local machine and connects to Kubernetes deployments via port-forwarding.

## Prerequisites

1. **Dynamo container environment** - You must be running inside a Dynamo container with the benchmarking tools pre-installed.

2. **HTTP endpoints** - Ensure you have HTTP endpoints available for benchmarking. These can be:
   - DynamoGraphDeployments exposed via HTTP endpoints
   - External services (vLLM, llm-d, AIBrix, etc.)
   - Any HTTP endpoint serving HuggingFace-compatible models

3. **Benchmark dependencies** - Since benchmarks run locally, you need to install the required Python dependencies.
   Install them using:

   ```bash
   pip install -r deploy/utils/requirements.txt
   ```

## User Workflow

Follow these steps to benchmark Dynamo deployments using client-side benchmarking:

### Step 1: Establish Kubernetes Cluster and Install Dynamo

Set up your Kubernetes cluster with NVIDIA GPUs and install the Dynamo Cloud platform. First follow the [installation guide](/docs/kubernetes/installation_guide.md) to install Dynamo Cloud, then use [deploy/utils/README](../../deploy/utils/README.md) to set up benchmarking resources.

### Step 2: Deploy DynamoGraphDeployments

Deploy your DynamoGraphDeployments separately using the [deployment documentation](../../components/backends/). Each deployment should have a frontend service exposed.

### Step 3: Port-Forward and Benchmark Deployment A

```bash
# Port-forward the frontend service for deployment A
# (<namespace> and <frontend-service-name> are placeholders for your own values)
kubectl port-forward -n <namespace> svc/<frontend-service-name> 8000:8000 > /dev/null 2>&1 &
# Note: remember to stop the port-forward process after benchmarking.

# Benchmark deployment A using Python scripts
python3 -m benchmarks.utils.benchmark \
   --input deployment-a=http://localhost:8000 \
   --model "your-model-name" \
   --output-dir ./benchmarks/results
```

### Step 4: [If Comparative] Tear Down Deployment A and Establish Deployment B

If comparing multiple deployments, tear down deployment A and deploy deployment B with a different configuration.
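
The port-forward started in Step 3 keeps running in the background until it is stopped. The following is a minimal sketch of one way to stop it before switching deployments, assuming the PID was captured with `$!` immediately after launching the port-forward. `stop_background` and `PF_PID` are hypothetical names chosen for illustration; they are not part of Dynamo or kubectl.

```bash
#!/usr/bin/env bash
# Sketch: stop a background process (such as a kubectl port-forward) by PID.
# stop_background is a hypothetical helper name, not part of Dynamo or kubectl.
stop_background() {
  local pid="$1"
  # kill -0 sends no signal; it only checks whether the process still exists.
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
    # Reap the process if it is a child of this shell; ignore errors otherwise.
    wait "$pid" 2>/dev/null || true
    echo "stopped $pid"
  else
    echo "process $pid is not running"
  fi
}

# Intended use (same placeholders as in Step 3):
#   kubectl port-forward -n <namespace> svc/<frontend-service-name> 8000:8000 > /dev/null 2>&1 &
#   PF_PID=$!    # $! holds the PID of the most recently started background job
#   ...run the benchmark...
#   stop_background "$PF_PID"
```

Capturing the PID at launch avoids having to search for the process later, and it stops only the port-forward started by this shell rather than every `kubectl port-forward` on the machine.
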
### Step 5: [If Comparative] Port-Forward and Benchmark Deployment B

```bash
# Port-forward the frontend service for deployment B
# (<namespace> and <frontend-service-name> are placeholders for your own values)
kubectl port-forward -n <namespace> svc/<frontend-service-name> 8001:8000 > /dev/null 2>&1 &

# Benchmark deployment B using Python scripts
python3 -m benchmarks.utils.benchmark \
   --input deployment-b=http://localhost:8001 \
   --model "your-model-name" \
   --output-dir ./benchmarks/results
```

### Step 6: Generate Summary and Visualization

```bash
# Generate plots and summary using Python plotting script
python3 -m benchmarks.utils.plot --data-dir ./benchmarks/results
```

## Use Cases

The benchmarking framework supports various comparative analysis scenarios:

- **Compare multiple DynamoGraphDeployments of a single backend** (e.g., aggregated vs disaggregated configurations)
- **Compare different backends** (e.g., vLLM vs TensorRT-LLM vs SGLang)
- **Compare Dynamo vs other platforms** (e.g., Dynamo vs llm-d vs AIBrix)
- **Compare different models** (e.g., Llama-3-8B vs Llama-3-70B vs Qwen-3-0.6B)
- **Compare different hardware configurations** (e.g., H100 vs A100 vs H200)
- **Compare different parallelization strategies** (e.g., different GPU counts or memory configurations)

## Configuration and Usage

### Command Line Options

```bash
python3 -m benchmarks.utils.benchmark --input