Commit 26889b09 authored by hhzhang16, committed by GitHub

feat: decouple existing benchmarking within docs (#3072)


Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
parent 67ff181d
...@@ -15,45 +15,29 @@ ...@@ -15,45 +15,29 @@
# Benchmarks # Benchmarks
This directory contains benchmarking scripts and tools for performance evaluation of Dynamo deployments. The benchmarking framework is a wrapper around genai-perf that makes it easy to benchmark DynamoGraphDeployments and compare them with external endpoints. This directory contains benchmarking scripts and tools for performance evaluation of Dynamo deployments. The benchmarking framework is a wrapper around genai-perf that makes it easy to benchmark DynamoGraphDeployments or other deployments with exposed endpoints.
## Quick Start ## Quick Start
### Benchmark an Existing Endpoint ### Benchmark a Dynamo Deployment
```bash First, deploy your DynamoGraphDeployment using the [deployment documentation](../components/backends/), then:
./benchmark.sh --namespace my-namespace --input my-endpoint=http://your-endpoint:8000
```
### Benchmark Dynamo Deployments
```bash ```bash
# Benchmark disaggregated vLLM with custom label # Port-forward your deployment to http://localhost:8000
./benchmark.sh --namespace my-namespace --input vllm-disagg=components/backends/vllm/deploy/disagg.yaml kubectl port-forward -n <namespace> svc/<frontend-service-name> 8000:8000 &
# Benchmark TensorRT-LLM disaggregated deployment
./benchmark.sh --namespace my-namespace --input trtllm-disagg=components/backends/trtllm/deploy/disagg.yaml
# Compare multiple Dynamo deployments # Run benchmark
./benchmark.sh --namespace my-namespace \ python3 -m benchmarks.utils.benchmark --namespace <namespace> \
--input agg=components/backends/vllm/deploy/agg.yaml \ --input my-benchmark=http://localhost:8000 \
--input disagg=components/backends/vllm/deploy/disagg.yaml --model "<your-model>"
# Compare Dynamo vs external endpoint # Generate plots
./benchmark.sh --namespace my-namespace \ python3 -m benchmarks.utils.plot --data-dir ./benchmarks/results
--input dynamo=components/backends/vllm/deploy/disagg.yaml \
--input external=http://localhost:8000
``` ```
**Note**:
- The sample manifests may reference private registry images. Update the `image:` fields to use accessible images from [Dynamo NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts) or your own registry before running.
- Only DynamoGraphDeployment manifests are supported for automatic deployment. To benchmark non-Dynamo backends (vLLM, TensorRT-LLM, SGLang, etc.), deploy them manually using their Kubernetes guides and use the endpoint option.
## Features ## Features
The benchmarking framework supports: Benchmark any HTTP endpoints! The benchmarking framework supports:
**Two Benchmarking Modes:**
- **Endpoint Benchmarking**: Test existing HTTP endpoints without deployment overhead
- **Deployment Benchmarking**: Deploy, test, and cleanup DynamoGraphDeployments automatically
**Flexible Configuration:** **Flexible Configuration:**
- User-defined labels for each input using `--input label=value` format - User-defined labels for each input using `--input label=value` format
...@@ -61,14 +45,14 @@ The benchmarking framework supports: ...@@ -61,14 +45,14 @@ The benchmarking framework supports:
- Customizable concurrency levels (configurable via CONCURRENCIES env var), sequence lengths, and models - Customizable concurrency levels (configurable via CONCURRENCIES env var), sequence lengths, and models
- Automated performance plot generation with custom labels - Automated performance plot generation with custom labels
**Sequential GPU Usage:** **Sequential Execution:**
- Models are deployed and benchmarked **sequentially**, not in parallel - Benchmarks are run sequentially, not in parallel
- Each deployment gets exclusive access to all available GPUs during its benchmark run - To avoid interference, ensure only one deployment is utilizing the target GPUs during a run
- Ensures accurate performance measurements and fair comparison across configurations - This helps produce more comparable measurements across configurations
**Supported Backends:** **Supported Backends:**
- DynamoGraphDeployments - DynamoGraphDeployments with port-forwarded endpoints
- External HTTP endpoints (for comparison with non-Dynamo backends) - External HTTP endpoints (for comparison with non-Dynamo backends or platforms)
## Installation ## Installation
......
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
set -euo pipefail
# Script directory
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DYNAMO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
# Configuration - all set via command line arguments
NAMESPACE=""
MODEL="Qwen/Qwen3-0.6B"
ISL=2000
STD=10
OSL=256
OUTPUT_DIR="./benchmarks/results"
# Input configurations stored as associative arrays
declare -A INPUT_LABELS
declare -A INPUT_VALUES
# Flags
VERBOSE=false
show_help() {
cat << EOF
Dynamo Benchmark Runner
This script is a wrapper around genai-perf that benchmarks Dynamo LLM deployments and
plots the results in an easy-to-use way. It supports comparing multiple DynamoGraphDeployments
or endpoints with custom labels defined by you.
The client runs locally and connects to your deployments/endpoints for benchmarking.
USAGE:
$0 --namespace NAMESPACE --input <label>=<manifest_or_endpoint> [--input <label>=<manifest_or_endpoint>]... [OPTIONS]
REQUIRED:
-n, --namespace NAMESPACE Kubernetes namespace
--input <label>=<manifest_path_or_endpoint> Benchmark input with custom label
- <label>: becomes the name/label in plots
- <manifest_path_or_endpoint>: either a DynamoGraphDeployment manifest or HTTP endpoint URL
Can be specified multiple times for comparisons
OPTIONS:
-h, --help Show this help message
-m, --model MODEL Model name for GenAI-Perf configuration and logging (default: Qwen/Qwen3-0.6B)
NOTE: This must match the model configured in your deployment manifests and the model deployed in any endpoints.
-i, --isl LENGTH Input sequence length (default: $ISL)
-s, --std STDDEV Input sequence standard deviation (default: $STD)
-o, --osl LENGTH Output sequence length (default: $OSL)
-d, --output-dir DIR Output directory (default: $OUTPUT_DIR)
--verbose Enable verbose output
EXAMPLES:
# Compare Dynamo deployments of a single backend
$0 --namespace \$NAMESPACE \\
--input agg=components/backends/vllm/deploy/agg.yaml \\
--input disagg=components/backends/vllm/deploy/disagg.yaml
# Compare different backend types (vLLM vs TensorRT-LLM)
$0 --namespace \$NAMESPACE \\
--input vllm-agg=components/backends/vllm/deploy/agg.yaml \\
--input trtllm-agg=components/backends/trtllm/deploy/agg.yaml
# Compare Dynamo deployment vs external endpoint
$0 --namespace \$NAMESPACE \\
--input dynamo=components/backends/vllm/deploy/disagg.yaml \\
--input external=http://localhost:8000
# Compare multiple different configurations (vLLM, TensorRT-LLM, SGLang)
$0 --namespace \$NAMESPACE \\
--input vllm-agg=components/backends/vllm/deploy/agg.yaml \\
--input trtllm-disagg=components/backends/trtllm/deploy/disagg.yaml \\
--input existing-sglang=http://localhost:8000
# Benchmark a single Dynamo deployment
$0 --namespace \$NAMESPACE \\
--input my-setup=components/backends/vllm/deploy/disagg.yaml
# Benchmark single external endpoint
$0 --namespace \$NAMESPACE \\
--input production=http://localhost:8000
DEPLOYMENT TYPES:
- DynamoGraphDeployment: Supports various Dynamo deployment configurations including:
* Aggregated deployments (prefill and decode together)
* Disaggregated deployments (prefill and decode separate)
* Router deployments
* Planner deployments
* And other Dynamo configurations
- External Endpoints: For comparing against non-Dynamo backends
NOTE:
- Only DynamoGraphDeployment manifests are supported for automatic deployment.
- To benchmark non-Dynamo backends (vLLM, TensorRT-LLM, SGLang, etc.), deploy them
manually following their Kubernetes deployment guides, expose a port (e.g., via port-forward),
and use the endpoint option.
- For Dynamo deployment setup, follow the main installation guide at docs/guides/dynamo_deploy/installation_guide.md
to install the platform, then use setup_benchmarking_resources.sh for benchmarking resources.
- The --model flag configures GenAI-Perf and should match what's configured in your deployment manifests and endpoints.
- Only one model can be benchmarked at a time across all inputs.
EOF
}
parse_input() {
local input_arg="$1"
# Basic format validation: must contain exactly one '=' character
if [[ ! "$input_arg" =~ ^[^=]+=[^=]+$ ]]; then
echo "ERROR: Invalid input format. Expected: <label>=<manifest_path_or_endpoint>" >&2
echo "Got: $input_arg" >&2
echo "Format must be: key=value with exactly one '=' character" >&2
exit 1
fi
# Split on the first '=' character
local label="${input_arg%%=*}"
local value="${input_arg#*=}"
# Basic validation - detailed validation will be done in Python
if [[ -z "$label" ]]; then
echo "ERROR: Label cannot be empty in input: $input_arg" >&2
exit 1
fi
if [[ -z "$value" ]]; then
echo "ERROR: Value cannot be empty in input: $input_arg" >&2
exit 1
fi
# Check for duplicate labels
if [[ -n "${INPUT_LABELS[$label]:-}" ]]; then
echo "ERROR: Duplicate label '$label' found. Each label must be unique." >&2
exit 1
fi
# Store the input
INPUT_LABELS["$label"]=1
INPUT_VALUES["$label"]="$value"
echo "Added input: $label -> $value"
}
parse_args() {
while [[ $# -gt 0 ]]; do
case $1 in
-h|--help)
show_help
exit 0
;;
-n|--namespace)
NAMESPACE="$2"
shift 2
;;
-m|--model)
MODEL="$2"
shift 2
;;
-i|--isl)
ISL="$2"
shift 2
;;
-s|--std)
STD="$2"
shift 2
;;
-o|--osl)
OSL="$2"
shift 2
;;
-d|--output-dir)
OUTPUT_DIR="$2"
shift 2
;;
--input)
parse_input "$2"
shift 2
;;
--verbose)
VERBOSE=true
shift
;;
*)
echo "Unknown option: $1" >&2
echo "Use --help for usage information." >&2
exit 1
;;
esac
done
}
validate_config() {
local errors=()
if [[ -z "$NAMESPACE" ]]; then
errors+=("--namespace is required")
fi
# Check that at least one input is specified
if [[ ${#INPUT_LABELS[@]} -eq 0 ]]; then
errors+=("At least one --input must be specified")
fi
if [[ ${#errors[@]} -gt 0 ]]; then
echo "ERROR: Missing required arguments:" >&2
for error in "${errors[@]}"; do
echo " $error" >&2
done
echo "Use --help for usage information." >&2
exit 1
fi
# Validate that specified files exist and endpoints are valid URLs
for label in "${!INPUT_VALUES[@]}"; do
local value="${INPUT_VALUES[$label]}"
# Check if it's a URL (starts with http:// or https://)
if [[ "$value" =~ ^https?:// ]]; then
echo "Input '$label': endpoint $value"
else
# It should be a file path - validate it exists
if [[ ! -f "$value" ]]; then
echo "ERROR: Manifest file not found for input '$label': $value" >&2
exit 1
fi
echo "Input '$label': manifest $value"
fi
done
if [[ ! "$ISL" =~ ^[0-9]+$ ]] || [[ "$ISL" -le 0 ]]; then
echo "ERROR: ISL must be a positive integer, got: $ISL" >&2
exit 1
fi
if [[ ! "$OSL" =~ ^[0-9]+$ ]] || [[ "$OSL" -le 0 ]]; then
echo "ERROR: OSL must be a positive integer, got: $OSL" >&2
exit 1
fi
if [[ ! "$STD" =~ ^[0-9]+$ ]] || [[ "$STD" -lt 0 ]]; then
echo "ERROR: STD must be a non-negative integer, got: $STD" >&2
exit 1
fi
}
print_config() {
echo "=== Benchmark Configuration ==="
echo "Namespace: $NAMESPACE"
echo "Model: $MODEL"
echo "Input Sequence Length: $ISL tokens"
echo "Output Sequence Length: $OSL tokens"
echo "Sequence Std Dev: $STD tokens"
echo "Output Directory: $OUTPUT_DIR"
echo ""
echo "Benchmark Inputs:"
for label in "${!INPUT_VALUES[@]}"; do
local value="${INPUT_VALUES[$label]}"
if [[ "$value" =~ ^https?:// ]]; then
echo " $label: endpoint $value"
else
echo " $label: manifest $value"
fi
done
echo "==============================="
echo
}
run_benchmark() {
echo "🚀 Starting benchmark workflow..."
# Change to dynamo root directory
cd "$DYNAMO_ROOT"
local cmd=(
python3 -u -m benchmarks.utils.benchmark
--namespace "$NAMESPACE"
--model "$MODEL"
--isl "$ISL"
--std "$STD"
--osl "$OSL"
--output-dir "$OUTPUT_DIR"
)
# Add all input arguments
for label in "${!INPUT_VALUES[@]}"; do
local value="${INPUT_VALUES[$label]}"
cmd+=(--input "$label=$value")
done
if [[ "$VERBOSE" == "true" ]]; then
echo "Executing: ${cmd[*]}"
fi
if ! "${cmd[@]}"; then
echo "❌ Benchmark failed!" >&2
exit 1
fi
echo "✅ Benchmark completed successfully!"
}
generate_plots() {
echo "📊 Generating performance plots..."
cd "$DYNAMO_ROOT"
local plot_cmd=(
python3 -m benchmarks.utils.plot
--data-dir "$OUTPUT_DIR"
)
if [[ "$VERBOSE" == "true" ]]; then
echo "Executing: ${plot_cmd[*]}"
fi
if ! "${plot_cmd[@]}"; then
echo "⚠️ Plot generation failed, but benchmark data is still available" >&2
return 1
fi
echo "✅ Plots generated successfully!"
echo "📁 Results available at: $OUTPUT_DIR"
echo "📈 Plots available at: $OUTPUT_DIR/plots"
}
main() {
trap cleanup EXIT
parse_args "$@"
validate_config
print_config
if [[ "$VERBOSE" == "true" ]]; then
export DYNAMO_VERBOSE=true
fi
local start_time
start_time=$(date +%s)
run_benchmark
generate_plots
local end_time
end_time=$(date +%s)
local duration
duration=$((end_time - start_time))
echo
echo "🎉 All done!"
echo "⏱️ Total time: ${duration}s"
echo "📁 Results: $OUTPUT_DIR"
echo "📊 Plots: $OUTPUT_DIR/plots"
}
cleanup() {
if [[ $? -ne 0 ]]; then
echo "❌ Script failed. Check logs above for details." >&2
fi
}
# Only run main if script is executed directly (not sourced)
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
trap 'cleanup $?' EXIT
main "$@"
fi
...@@ -16,101 +16,138 @@ ...@@ -16,101 +16,138 @@
# Dynamo Benchmarking Guide # Dynamo Benchmarking Guide
This benchmarking framework lets you compare performance across any combination of: This benchmarking framework lets you compare performance across any combination of:
- **DynamoGraphDeployments** (automatically deployed from your manifests) - **DynamoGraphDeployments**
- **External HTTP endpoints** (existing services, vLLM, TensorRT-LLM, etc.) - **External HTTP endpoints** (existing services deployed following standard documentation from vLLM, llm-d, AIBrix, etc.)
You can mix and match these in a single benchmark run using custom labels. Configure your DynamoGraphDeployment manifests for your specific models, hardware, and parallelization needs.
## What This Tool Does ## What This Tool Does
The framework is a wrapper around `genai-perf` that: The framework is a Python-based wrapper around `genai-perf` that:
- Deploys user-specified `DynamoGraphDeployments` automatically - Benchmarks any HTTP endpoints
- Benchmarks any HTTP endpoints (no deployment needed)
- Runs concurrency sweeps across configurable load levels - Runs concurrency sweeps across configurable load levels
- Generates comparison plots with your custom labels - Generates comparison plots with your custom labels
- Works with any HuggingFace-compatible model on NVIDIA GPUs (H200, H100, A100, etc.) - Works with any HuggingFace-compatible model on NVIDIA GPUs (H200, H100, A100, etc.)
- Runs locally and connects to your Kubernetes deployments/endpoints - Runs locally and connects to your Kubernetes deployments/endpoints
- Provides direct Python script execution for maximum flexibility
**Default sequence lengths**: Input: 2000 tokens, Output: 256 tokens (configurable with `--isl` and `--osl`) **Default sequence lengths**: Input: 2000 tokens, Output: 256 tokens (configurable with `--isl` and `--osl`)
**Important**: The `--model` parameter configures GenAI-Perf for benchmarking and provides logging context. The actual model loaded is determined by your deployment manifests. Only one model can be benchmarked at a time across all inputs to ensure fair comparison. The default `--model` value in the benchmarking script is `Qwen/Qwen3-0.6B`, but it must match the model in the manifest(s) and the model deployed at the endpoint(s). **Important**: The `--model` parameter configures GenAI-Perf for benchmarking and provides logging context. The default `--model` value in the benchmarking script is `Qwen/Qwen3-0.6B`, but it must match the model deployed at the endpoint(s).
## Prerequisites ## Prerequisites
1. **Kubernetes cluster with NVIDIA GPUs and Dynamo Cloud platform** - You need a Kubernetes cluster with eligible NVIDIA GPUs and the Dynamo Cloud platform installed. First follow the [installation guide](../../guides/dynamo_deploy/installation_guide.md) to install Dynamo Cloud, then use [deploy/utils/README](../../deploy/utils/README.md) to set up benchmarking resources. 1. **Dynamo container environment** - You must be running inside a Dynamo container with the benchmarking tools pre-installed.
2. **Ubuntu 24.04** - GenAI-Perf requires Ubuntu 24.04 or higher to work properly. If you are on Ubuntu 22.04 or lower, you will need to build perf_analyzer [from source](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/install.md#build-from-source).
2. **kubectl access** - You need `kubectl` installed and configured to access your Kubernetes cluster. 3. **kubectl access** - You need `kubectl` installed and configured to access your Kubernetes cluster.
3. **Benchmark dependencies** - Since benchmarks run locally, you need to install the required Python dependencies. Install them using: 4. **Benchmark dependencies** - Since benchmarks run locally, you need to install the required Python dependencies. Install them using:
```bash ```bash
pip install -r deploy/utils/requirements.txt pip install -r deploy/utils/requirements.txt
``` ```
*Note: if you are on Ubuntu 22.04 or lower, you will also need to build perf_analyzer [from source](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/install.md#build-from-source).*
## Quick Start Examples ## User Workflow
Follow these steps to benchmark Dynamo deployments:
The tool can be used to deploy, benchmark and compare Dynamo deployments (DynamoGraphDeployments) on a Kubernetes cluster as well as benchmark and compare servers deployed separately given a URL. In the examples below, Dynamo deployments are specified with a yaml and servers deployed separately by URL. ### Step 1: Establish Kubernetes Cluster and Install Dynamo
Set up your Kubernetes cluster with NVIDIA GPUs and install the Dynamo Cloud platform. First follow the [installation guide](../../guides/dynamo_deploy/installation_guide.md) to install Dynamo Cloud, then use [deploy/utils/README](../../deploy/utils/README.md) to set up benchmarking resources.
### Step 2: Deploy DynamoGraphDeployments
Deploy your DynamoGraphDeployments separately using the [deployment documentation](../../components/backends/). Each deployment should have a frontend service exposed.
### Step 3: Port-Forward and Benchmark Deployment A
```bash ```bash
export NAMESPACE=benchmarking # Port-forward the frontend service for deployment A
kubectl port-forward -n <namespace> svc/<frontend-service-name> 8000:8000 &
# Compare multiple DynamoGraphDeployments of a single backend # Note: remember to stop the port-forward process after benchmarking.
./benchmarks/benchmark.sh --namespace $NAMESPACE \
--input agg=components/backends/vllm/deploy/agg.yaml \ # Benchmark deployment A using Python scripts
--input disagg=components/backends/vllm/deploy/disagg.yaml python3 -m benchmarks.utils.benchmark --namespace <namespace> \
--input deployment-a=http://localhost:8000 \
# Compare different backend types (vLLM vs TensorRT-LLM) --model "your-model-name" \
./benchmarks/benchmark.sh --namespace $NAMESPACE \ --output-dir ./benchmarks/results
--input vllm-disagg=components/backends/vllm/deploy/disagg.yaml \
--input trtllm-disagg=components/backends/trtllm/deploy/disagg.yaml
# Compare Dynamo deployment vs existing deployment (external endpoint)
./benchmarks/benchmark.sh --namespace $NAMESPACE \
--input dynamo=components/backends/vllm/deploy/disagg.yaml \
--input vllm-baseline=http://localhost:8000
# Compare three different configurations
./benchmarks/benchmark.sh --namespace $NAMESPACE \
--input dynamo-agg=components/backends/vllm/deploy/agg.yaml \
--input dynamo-disagg=components/backends/vllm/deploy/disagg.yaml \
--input external-vllm=http://localhost:8000
# Benchmark single external endpoint
./benchmarks/benchmark.sh --namespace $NAMESPACE \
--input production-api=http://your-api:8000
# Custom model and sequence lengths
./benchmarks/benchmark.sh --namespace $NAMESPACE \
--input my-setup=my-custom-manifest.yaml \
--model "meta-llama/Meta-Llama-3-8B" --isl 512 --osl 256
``` ```
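Because `kubectl port-forward ... &` returns immediately, a benchmark started right after it may run before the local port is accepting connections. The sketch below polls the forwarded port until a TCP connection succeeds. The helper name `wait_for_port` and its defaults are hypothetical and not part of the benchmarking modules. A successful connection only shows that the port-forward is listening, not that the model behind it is ready to serve.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it checks only that something is listening locally.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A caller might run `wait_for_port("localhost", 8000)` after starting the port-forward and launch the benchmark only if it returns `True`.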
**Key**: Configure your manifests for your specific models, hardware, and parallelization strategy before benchmarking. ### Step 4: [If Comparative] Teardown Deployment A and Establish Deployment B
If comparing multiple deployments, tear down deployment A and deploy deployment B with a different configuration.
### Step 5: [If Comparative] Port-Forward and Benchmark Deployment B
```bash
# Port-forward the frontend service for deployment B
kubectl port-forward -n <namespace> svc/<frontend-service-name> 8001:8000 &
# Benchmark deployment B using Python scripts
python3 -m benchmarks.utils.benchmark --namespace <namespace> \
--input deployment-b=http://localhost:8001 \
--model "your-model-name" \
--output-dir ./benchmarks/results
```
### Step 6: Generate Summary and Visualization
```bash
# Generate plots and summary using Python plotting script
python3 -m benchmarks.utils.plot --data-dir ./benchmarks/results
```
## Example Commands
### Single Deployment Benchmark
```bash
# Port-forward and benchmark a single deployment
kubectl port-forward -n my-namespace svc/my-frontend-service 8000:8000 &
python3 -m benchmarks.utils.benchmark --namespace my-namespace \
--input my-deployment=http://localhost:8000 \
--model "meta-llama/Meta-Llama-3-8B"
```
### Comparative Benchmark
```bash
# Benchmark deployment A
kubectl port-forward -n my-namespace svc/agg-frontend 8000:8000 &
python3 -m benchmarks.utils.benchmark --namespace my-namespace \
--input aggregated=http://localhost:8000 \
--model "meta-llama/Meta-Llama-3-8B"
# Benchmark deployment B (different port)
kubectl port-forward -n my-namespace svc/disagg-frontend 8001:8000 &
python3 -m benchmarks.utils.benchmark --namespace my-namespace \
--input disaggregated=http://localhost:8001 \
--model "meta-llama/Meta-Llama-3-8B"
# Generate comparison plots
python3 -m benchmarks.utils.plot --data-dir ./benchmarks/results
```
### Important: Image Accessibility ## Use Cases
Ensure container images in your DynamoGraphDeployment manifests are accessible: The benchmarking framework supports various comparative analysis scenarios:
- **Public images**: Use [Dynamo NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts) public releases
- **Custom registries**: Configure proper credentials in your Kubernetes namespace - **Compare multiple DynamoGraphDeployments of a single backend** (e.g., aggregated vs disaggregated configurations)
- **Compare different backends** (e.g., vLLM vs TensorRT-LLM vs SGLang)
- **Compare Dynamo vs other platforms** (e.g., Dynamo vs llm-d vs AIBrix)
- **Compare different models** (e.g., Llama-3-8B vs Llama-3-70B vs Qwen-3-0.6B)
- **Compare different hardware configurations** (e.g., H100 vs A100 vs H200)
- **Compare different parallelization strategies** (e.g., different GPU counts or memory configurations)
## Configuration and Usage ## Configuration and Usage
### Command Line Options ### Command Line Options
```bash ```bash
./benchmarks/benchmark.sh --namespace NAMESPACE --input <label>=<manifest_path_or_endpoint> [--input <label>=<manifest_path_or_endpoint>]... [OPTIONS] python3 -m benchmarks.utils.benchmark --namespace NAMESPACE --input <label>=<endpoint_url> [--input <label>=<endpoint_url>]... [OPTIONS]
REQUIRED: REQUIRED:
-n, --namespace NAMESPACE Kubernetes namespace -n, --namespace NAMESPACE Kubernetes namespace
--input <label>=<manifest_path_or_endpoint> Benchmark input with custom label --input <label>=<endpoint_url> Benchmark input with custom label
- <label>: becomes the name/label in plots - <label>: becomes the name/label in plots
- <manifest_path_or_endpoint>: either a DynamoGraphDeployment manifest or HTTP endpoint URL - <endpoint_url>: HTTP endpoint URL (e.g., http://localhost:8000)
Can be specified multiple times for comparisons Can be specified multiple times for comparisons
OPTIONS: OPTIONS:
-h, --help Show help message and examples -h, --help Show help message and examples
-m, --model MODEL Model name for GenAI-Perf configuration and logging (default: Qwen/Qwen3-0.6B) -m, --model MODEL Model name for GenAI-Perf configuration and logging (default: Qwen/Qwen3-0.6B)
NOTE: This must match the model configured in your deployment manifests and endpoints NOTE: This must match the model deployed at the endpoint
-i, --isl LENGTH Input sequence length (default: 2000) -i, --isl LENGTH Input sequence length (default: 2000)
-s, --std STDDEV Input sequence standard deviation (default: 10) -s, --std STDDEV Input sequence standard deviation (default: 10)
-o, --osl LENGTH Output sequence length (default: 256) -o, --osl LENGTH Output sequence length (default: 256)
...@@ -122,63 +159,34 @@ OPTIONS: ...@@ -122,63 +159,34 @@ OPTIONS:
- **Custom Labels**: Each input must have a unique label that becomes the name in plots and results - **Custom Labels**: Each input must have a unique label that becomes the name in plots and results
- **Label Restrictions**: Labels can only contain letters, numbers, hyphens, and underscores. The label `plots` is reserved. - **Label Restrictions**: Labels can only contain letters, numbers, hyphens, and underscores. The label `plots` is reserved.
- **Input Types**: Supports DynamoGraphDeployment manifests for automatic deployment, or HTTP endpoints for existing services - **Port-Forwarding**: You must have an exposed endpoint before benchmarking
- **Model Parameter**: The `--model` parameter configures GenAI-Perf for testing and logging, not deployment (deployment model is determined by the manifest files) - **Model Parameter**: The `--model` parameter configures GenAI-Perf for testing and logging, and must match the model deployed at the endpoint
- **Standalone Deployments**: For non-Dynamo backends (vLLM, TensorRT-LLM, SGLang, etc.), you must deploy them manually following their respective Kubernetes deployment guides. The benchmarking framework only supports automatic deployment of DynamoGraphDeployments. - **Sequential Benchmarking**: For comparative benchmarks, deploy and benchmark each configuration separately
- **Single Model Requirement**: Only one model can be benchmarked at a time across all inputs to ensure fair comparison.
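The label rules above can be checked before a run starts. The following is a minimal sketch of such a check, assuming inputs arrive as `label=url` strings. The function names and error messages are hypothetical, and the real validation in `benchmarks.utils.benchmark` may differ.

```python
import re
from urllib.parse import urlparse

# Letters, numbers, hyphens, and underscores only, per the notes above.
LABEL_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")
RESERVED_LABELS = {"plots"}


def parse_input(arg: str) -> tuple[str, str]:
    """Split one 'label=url' argument and validate both halves."""
    # Split on the first '=' so URLs containing '=' in a query string survive.
    label, sep, url = arg.partition("=")
    if not sep or not label or not url:
        raise ValueError(f"expected <label>=<endpoint_url>, got: {arg!r}")
    if not LABEL_PATTERN.fullmatch(label):
        raise ValueError(f"invalid characters in label: {label!r}")
    if label in RESERVED_LABELS:
        raise ValueError(f"label {label!r} is reserved")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an HTTP endpoint URL: {url!r}")
    return label, url


def parse_inputs(args: list[str]) -> dict[str, str]:
    """Validate all inputs and reject duplicate labels."""
    inputs: dict[str, str] = {}
    for arg in args:
        label, url = parse_input(arg)
        if label in inputs:
            raise ValueError(f"duplicate label: {label!r}")
        inputs[label] = url
    return inputs
```

Failing early on a bad label avoids spending a full concurrency sweep on results that cannot be plotted.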
### What Happens During Benchmarking ### What Happens During Benchmarking
The script automatically: The Python benchmarking module:
1. **Deploys** each DynamoGraphDeployment configuration to Kubernetes if manifests are passed in 1. **Connects** to your port-forwarded endpoint
2. **Benchmarks** using GenAI-Perf at various concurrency levels (default: 1, 2, 5, 10, 50, 100, 250) 2. **Benchmarks** using GenAI-Perf at various concurrency levels (default: 1, 2, 5, 10, 50, 100, 250)
3. **Measures** key metrics: latency, throughput, time-to-first-token 3. **Measures** key metrics: latency, throughput, time-to-first-token
4. **Generates** comparison plots using your custom labels in `./benchmarks/results/plots/` 4. **Saves** results to an output directory organized by input labels
5. **Cleans up** deployments when complete
### GPU Resource Usage
**Important**: Models are deployed and benchmarked **sequentially**, not in parallel. This means:
- **One deployment at a time**: Each DynamoGraphDeployment is deployed, benchmarked, and cleaned up before the next one starts
- **Full GPU access**: Each deployment gets exclusive access to all available GPUs during its benchmark run
- **Resource isolation**: No resource conflicts between different deployment configurations
- **Fair comparison**: Each configuration is tested under identical resource conditions
This sequential approach ensures:
- **Accurate performance measurements** without interference between deployments
- **Consistent resource allocation** for fair comparison across different configurations
- **Simplified resource management** without complex GPU scheduling
- **Reliable cleanup** between benchmark runs
If you need to benchmark multiple configurations simultaneously, consider using separate Kubernetes namespaces or running benchmarks on different clusters.
### Results Clearing Behavior
**Important**: The benchmark script automatically clears the output directory before each run to ensure clean, reproducible results. This means:
- Previous benchmark results in the same output directory will be completely removed
- Each benchmark run starts with a clean slate
- Results from different runs are not mixed or accumulated
If you want to preserve results from previous runs, use different output directories with the `--output-dir` flag. The Python plotting module:
1. **Generates** comparison plots using your custom labels in `<OUTPUT_DIR>/plots/`
2. **Creates** summary statistics and visualizations
### Using Your Own Models and Configuration ### Using Your Own Models and Configuration
The benchmarking framework supports any HuggingFace-compatible LLM model. To benchmark your own custom deployment: The benchmarking framework supports any HuggingFace-compatible LLM model. Specify your model in the benchmark script's `--model` parameter. It must match the model name of the deployment. You can override the default sequence lengths (2000/256 tokens) with `--isl` and `--osl` flags if needed for your specific workload.
1. **Edit your deployment YAML files** to specify your model in the `--model` argument of the container command ### Python Script Usage
2. **Use the corresponding model name** in the benchmark script's `--model` parameter
**Note**: You can override the default sequence lengths (2000/256 tokens) with `--isl` and `--osl` flags if needed for your specific workload. The benchmarking framework is built around Python modules that provide direct control over the benchmark workflow:
### Direct Python Execution
For direct control over the benchmark workflow:
```bash ```bash
# Endpoint benchmarking # Endpoint benchmarking
python3 -u -m benchmarks.utils.benchmark \ python3 -u -m benchmarks.utils.benchmark \
--input trtllm=http://your-endpoint:8000 \ --input experiment-a=http://your-endpoint:8000 \
--namespace $NAMESPACE \ --namespace $NAMESPACE \
--isl 2000 \ --isl 2000 \
--std 10 \ --std 10 \
...@@ -187,18 +195,19 @@ python3 -u -m benchmarks.utils.benchmark \ ...@@ -187,18 +195,19 @@ python3 -u -m benchmarks.utils.benchmark \
# Deployment benchmarking (any combination) # Deployment benchmarking (any combination)
python3 -u -m benchmarks.utils.benchmark \ python3 -u -m benchmarks.utils.benchmark \
--input agg=$AGG_CONFIG \ --input experiment-a=http://localhost:8000 \
--input disagg=$DISAGG_CONFIG \ --input experiment-b=http://localhost:8005 \
--namespace $NAMESPACE \ --namespace my-namespace \
--isl 2000 \ --isl 2000 \
--std 10 \ --std 10 \
--osl 256 \ --osl 256 \
--output-dir $OUTPUT_DIR --output-dir ./benchmarks/results
# Generate plots separately # Generate plots separately
python3 -m benchmarks.utils.plot --data-dir $OUTPUT_DIR python3 -m benchmarks.utils.plot --data-dir $OUTPUT_DIR
``` ```
**Note**: The Python benchmarking module connects to your existing endpoints, runs the benchmarks, and can generate plots. Deployment is user-managed and out of scope for this tool.
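When benchmarks are launched from another script, the same command line can be assembled programmatically. The sketch below uses only the flags documented above (`--namespace`, `--model`, `--isl`, `--std`, `--osl`, `--output-dir`, `--input`). The helper `build_benchmark_cmd` is hypothetical and simply builds an argument list.

```python
import shlex


def build_benchmark_cmd(namespace: str, inputs: dict[str, str],
                        model: str = "Qwen/Qwen3-0.6B",
                        isl: int = 2000, std: int = 10, osl: int = 256,
                        output_dir: str = "./benchmarks/results") -> list[str]:
    """Build the argv list for the benchmark module (hypothetical helper)."""
    cmd = [
        "python3", "-u", "-m", "benchmarks.utils.benchmark",
        "--namespace", namespace,
        "--model", model,
        "--isl", str(isl),
        "--std", str(std),
        "--osl", str(osl),
        "--output-dir", output_dir,
    ]
    # One --input flag per labelled endpoint, in insertion order.
    for label, url in inputs.items():
        cmd += ["--input", f"{label}={url}"]
    return cmd


cmd = build_benchmark_cmd(
    "my-namespace",
    {"experiment-a": "http://localhost:8000",
     "experiment-b": "http://localhost:8005"},
)
# Print a shell-safe rendering for logging; pass `cmd` itself to
# subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems with model names or URLs.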
### Comparison Limitations ### Comparison Limitations
The plotting system supports up to 12 different inputs in a single comparison. If you need to compare more than 12 different deployments/endpoints, consider running separate benchmark sessions or grouping related comparisons together. The plotting system supports up to 12 different inputs in a single comparison. If you need to compare more than 12 different deployments/endpoints, consider running separate benchmark sessions or grouping related comparisons together.
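One way to stay under the 12-input limit is to split a larger set of endpoints into groups and benchmark each group into its own output directory. The following sketch shows only the grouping step. The name `group_inputs` and the constant are hypothetical.

```python
from itertools import islice

MAX_INPUTS_PER_PLOT = 12  # limit stated above


def group_inputs(inputs: dict[str, str],
                 size: int = MAX_INPUTS_PER_PLOT) -> list[dict[str, str]]:
    """Split label -> URL inputs into consecutive groups of at most `size`."""
    items = iter(inputs.items())
    groups: list[dict[str, str]] = []
    while True:
        # islice consumes up to `size` pairs from the shared iterator.
        chunk = dict(islice(items, size))
        if not chunk:
            return groups
        groups.append(chunk)
```

Each returned group could then be passed to a separate benchmark session with a distinct `--output-dir`, so that each set of plots stays within the limit.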
...@@ -209,17 +218,25 @@ You can customize the concurrency levels using the CONCURRENCIES environment var ...@@ -209,17 +218,25 @@ You can customize the concurrency levels using the CONCURRENCIES environment var
```bash ```bash
# Custom concurrency levels # Custom concurrency levels
CONCURRENCIES="1,5,20,50" ./benchmarks/benchmark.sh --namespace $NAMESPACE --input my-test=components/backends/vllm/deploy/disagg.yaml CONCURRENCIES="1,5,20,50" python3 -m benchmarks.utils.benchmark --namespace $NAMESPACE --input my-test=http://localhost:8000
# Or set permanently # Or set permanently
export CONCURRENCIES="1,2,5,10,25,50,100" export CONCURRENCIES="1,2,5,10,25,50,100"
./benchmarks/benchmark.sh --namespace $NAMESPACE --input test=disagg.yaml python3 -m benchmarks.utils.benchmark --namespace $NAMESPACE --input test=http://localhost:8000
``` ```
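For readers who want to see how a comma-separated value such as `CONCURRENCIES="1,5,20,50"` could be turned into a list of integers, here is a minimal sketch. It is not the tool's actual parsing code: `read_concurrencies` is a hypothetical helper, and the fallback list is supplied by the caller because the real default levels are not shown in this section.

```python
import os
from typing import Mapping, Optional


def read_concurrencies(default: list[int],
                       env: Optional[Mapping[str, str]] = None) -> list[int]:
    """Parse CONCURRENCIES (e.g. "1,5,20,50") into a list of positive ints.

    Falls back to `default` when the variable is unset or empty.
    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    source = os.environ if env is None else env
    raw = source.get("CONCURRENCIES", "")
    if not raw.strip():
        return list(default)
    # int() tolerates surrounding whitespace, so "1, 5" parses as well.
    values = [int(part) for part in raw.split(",") if part.strip()]
    if any(v <= 0 for v in values):
        raise ValueError(f"concurrency levels must be positive: {raw!r}")
    return values


print(read_concurrencies([1], env={"CONCURRENCIES": "1,5,20,50"}))  # [1, 5, 20, 50]
print(read_concurrencies([1, 2], env={}))                           # [1, 2]
```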
## Understanding Your Results ## Understanding Your Results
After benchmarking completes, check `./benchmarks/results/` (or your custom output directory): After benchmarking completes, check `./benchmarks/results/` (or your custom output directory):
### Plot Labels and Organization
The plotting script uses the `--input` labels (the keys before the `=` sign) as the experiment names in all generated plots. For example:
- `--input aggregated=http://localhost:8000` → plots will show "aggregated" as the label
- `--input vllm-disagg=http://localhost:8001` → plots will show "vllm-disagg" as the label
This allows you to easily identify and compare different configurations in the visualization plots.
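The label/endpoint split described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plotting script's real implementation: `parse_input` is a hypothetical helper, and splitting on the first `=` only is an assumption made so that URLs containing `=` in a query string stay intact.

```python
def parse_input(spec: str) -> tuple[str, str]:
    """Split a 'label=endpoint' string into (label, endpoint).

    Only the first '=' separates the two parts, so an endpoint such as
    'http://host:8000/v1?x=1' is kept whole.
    """
    label, sep, endpoint = spec.partition("=")
    if not sep or not label or not endpoint:
        raise ValueError(f"expected label=endpoint, got {spec!r}")
    return label, endpoint


inputs = [
    "aggregated=http://localhost:8000",
    "vllm-disagg=http://localhost:8001",
]
labels = [parse_input(spec)[0] for spec in inputs]
print(labels)  # ['aggregated', 'vllm-disagg']
```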
### Summary and Plots ### Summary and Plots
```text ```text
...@@ -263,9 +280,9 @@ benchmarks/results/ ...@@ -263,9 +280,9 @@ benchmarks/results/
```text ```text
benchmarks/results/ benchmarks/results/
├── plots/ ├── plots/
├── dynamo-agg/ # --input dynamo-agg=agg.yaml ├── experiment-a/ # --input experiment-a=http://localhost:8000
├── dynamo-disagg/ # --input dynamo-disagg=disagg.yaml ├── experiment-b/ # --input experiment-b=http://localhost:8001
└── external-vllm/ # --input external-vllm=http://localhost:8000 └── experiment-c/ # --input experiment-c=http://localhost:8002
``` ```
Each concurrency directory contains: Each concurrency directory contains:
...@@ -275,10 +292,12 @@ Each concurrency directory contains: ...@@ -275,10 +292,12 @@ Each concurrency directory contains:
## Customize Benchmarking Behavior ## Customize Benchmarking Behavior
The built-in workflow handles DynamoGraphDeployment deployment, benchmarking with genai-perf, and plot generation automatically. If you want to modify the behavior: The built-in Python workflow connects to endpoints, benchmarks with genai-perf, and generates plots. If you want to modify the behavior:
1. **Extend the workflow**: Modify `benchmarks/utils/workflow.py` to add custom deployment types or metrics collection 1. **Extend the workflow**: Modify `benchmarks/utils/workflow.py` to add custom deployment types or metrics collection
2. **Generate different plots**: Modify `benchmarks/utils/plot.py` to generate a different set of plots for whatever you wish to visualize. 2. **Generate different plots**: Modify `benchmarks/utils/plot.py` to generate a different set of plots for whatever you wish to visualize.
The `benchmark.sh` script provides a complete end-to-end benchmarking experience. For more granular control, use the Python modules directly. 3. **Direct module usage**: Use individual Python modules (`benchmarks.utils.benchmark`, `benchmarks.utils.plot`) for granular control over each step of the benchmarking process.
The Python benchmarking module provides a complete end-to-end benchmarking experience with full control over the workflow.