feat: decouple existing benchmarking within docs (#3072)

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Unverified commit 26889b09, authored Sep 17, 2025 by hhzhang16; committed by GitHub Sep 17, 2025
Showing with 144 additions and 511 deletions

benchmarks/README.md benchmarks/README.md +18 -34

benchmarks/benchmark.sh benchmarks/benchmark.sh +0 -370

docs/benchmarks/benchmarking.md docs/benchmarks/benchmarking.md +126 -107

--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -15,45 +15,29 @@
 # Benchmarks
-This directory contains benchmarking scripts and tools for performance evaluation of Dynamo deployments. The benchmarking framework is a wrapper around genai-perf that makes it easy to benchmark DynamoGraphDeployments and compare them with external endpoints.
+This directory contains benchmarking scripts and tools for performance evaluation of Dynamo deployments. The benchmarking framework is a wrapper around genai-perf that makes it easy to benchmark DynamoGraphDeployments or other deployments with exposed endpoints.
 ## Quick Start
-### Benchmark an Existing Endpoint
+### Benchmark a Dynamo Deployment
-```bash
+First, deploy your DynamoGraphDeployment using the [deployment documentation](../components/backends/), then:
-./benchmark.sh --namespace my-namespace --input my-endpoint=http://your-endpoint:8000
-```
-### Benchmark Dynamo Deployments
 ```bash
-# Benchmark disaggregated vLLM with custom label
+# Port-forward your deployment to http://localhost:8000
-./benchmark.sh --namespace my-namespace --input vllm-disagg=components/backends/vllm/deploy/disagg.yaml
+kubectl port-forward -n <namespace> svc/<frontend-service-name> 8000:8000 &
-# Benchmark TensorRT-LLM disaggregated deployment
-./benchmark.sh --namespace my-namespace --input trtllm-disagg=components/backends/trtllm/deploy/disagg.yaml
-# Compare multiple Dynamo deployments
+# Run benchmark
-./benchmark.sh --namespace my-namespace \
+python3 -m benchmarks.utils.benchmark --namespace <namespace> \
-  --input agg=components/backends/vllm/deploy/agg.yaml \
+    --input my-benchmark=http://localhost:8000 \
-  --input disagg=components/backends/vllm/deploy/disagg.yaml
+    --model "<your-model>"
-# Compare Dynamo vs external endpoint
+# Generate plots
-./benchmark.sh --namespace my-namespace \
+python3 -m benchmarks.utils.plot --data-dir ./benchmarks/results
-  --input dynamo=components/backends/vllm/deploy/disagg.yaml \
-  --input external=http://localhost:8000
 ```
-**Note**:
-- The sample manifests may reference private registry images. Update the `image:` fields to use accessible images from [Dynamo NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts) or your own registry before running.
-- Only DynamoGraphDeployment manifests are supported for automatic deployment. To benchmark non-Dynamo backends (vLLM, TensorRT-LLM, SGLang, etc.), deploy them manually using their Kubernetes guides and use the endpoint option.
 ## Features
-The benchmarking framework supports:
+Benchmark any HTTP endpoint. The benchmarking framework supports:
-**Two Benchmarking Modes:**
-- **Endpoint Benchmarking**: Test existing HTTP endpoints without deployment overhead
-- **Deployment Benchmarking**: Deploy, test, and cleanup DynamoGraphDeployments automatically
 **Flexible Configuration:**
 - User-defined labels for each input using `--input label=value` format
@@ -61,14 +45,14 @@ The benchmarking framework supports:
 - Customizable concurrency levels (configurable via CONCURRENCIES env var), sequence lengths, and models
 - Automated performance plot generation with custom labels
-**Sequential GPU Usage:**
+**Sequential Execution:**
-- Models are deployed and benchmarked **sequentially**, not in parallel
+- Benchmarks are run sequentially, not in parallel
-- Each deployment gets exclusive access to all available GPUs during its benchmark run
+- To avoid interference, ensure only one deployment is using the target GPUs during a run
-- Ensures accurate performance measurements and fair comparison across configurations
+- This helps produce more comparable measurements across configurations
 **Supported Backends:**
-- DynamoGraphDeployments
+- DynamoGraphDeployments with port-forwarded endpoints
-- External HTTP endpoints (for comparison with non-Dynamo backends)
+- External HTTP endpoints (for comparison with non-Dynamo backends or platforms)
 ## Installation

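The new Quick Start leaves `kubectl port-forward` running in the background. As a minimal sketch, assuming bash 4+, `kubectl`, and a working directory where `python3 -m benchmarks.utils.benchmark` resolves, the hypothetical helper `benchmark_service` below scopes one port-forward to one benchmark run and stops it afterwards. The equally hypothetical `wait_for_port` uses bash's `/dev/tcp` pseudo-device to wait until the forwarded port accepts connections. The frontend port `8000` is taken from the examples in this change.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_port and benchmark_service are hypothetical helpers,
# not part of the Dynamo tooling.

# Poll until host:port accepts a TCP connection, or give up after timeout_s seconds.
wait_for_port() {
    local host="$1" port="$2" timeout_s="$3" i
    for ((i = 0; i < timeout_s; i++)); do
        # bash's /dev/tcp pseudo-device attempts a TCP connect in a subshell
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Port-forward svc/<service> to <local_port>, benchmark it under <label>,
# and always stop the port-forward afterwards.
benchmark_service() {
    local namespace="$1" service="$2" local_port="$3" label="$4" model="$5"
    local pf_pid rc
    kubectl port-forward -n "$namespace" "svc/${service}" "${local_port}:8000" >/dev/null 2>&1 &
    pf_pid=$!
    if ! wait_for_port 127.0.0.1 "$local_port" 30; then
        echo "port-forward to svc/${service} did not become ready" >&2
        kill "$pf_pid" 2>/dev/null
        return 1
    fi
    python3 -m benchmarks.utils.benchmark --namespace "$namespace" \
        --input "${label}=http://localhost:${local_port}" \
        --model "$model"
    rc=$?
    # Stop the background port-forward and reap it quietly
    kill "$pf_pid" 2>/dev/null
    wait "$pf_pid" 2>/dev/null
    return "$rc"
}
```

For example, `benchmark_service my-namespace agg-frontend 8000 aggregated "meta-llama/Meta-Llama-3-8B"` mirrors the comparative example in the docs. Stopping the port-forward inside the helper avoids leaving a stray background process between runs.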
--- a/benchmarks/benchmark.sh
+++ b/benchmarks/benchmark.sh
-#!/bin/bash
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-set -euo pipefail
-# Script directory
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-DYNAMO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
-# Configuration - all set via command line arguments
-NAMESPACE=""
-MODEL="Qwen/Qwen3-0.6B"
-ISL=2000
-STD=10
-OSL=256
-OUTPUT_DIR="./benchmarks/results"
-# Input configurations stored as associative arrays
-declare -A INPUT_LABELS
-declare -A INPUT_VALUES
-# Flags
-VERBOSE=false
-show_help() {
-    cat << EOF
-Dynamo Benchmark Runner
-This script is a wrapper around genai-perf that benchmarks Dynamo LLM deployments and
-plots the results in an easy-to-use way. It supports comparing multiple DynamoGraphDeployments
-or endpoints with custom labels defined by you.
-The client runs locally and connects to your deployments/endpoints for benchmarking.
-USAGE:
-    $0 --namespace NAMESPACE --input <label>=<manifest_or_endpoint> [--input <label>=<manifest_or_endpoint>]... [OPTIONS]
-REQUIRED:
-    -n, --namespace NAMESPACE           Kubernetes namespace
-    --input <label>=<manifest_path_or_endpoint>  Benchmark input with custom label
-                                          - <label>: becomes the name/label in plots
-                                          - <manifest_path_or_endpoint>: either a DynamoGraphDeployment manifest or HTTP endpoint URL
-                                          Can be specified multiple times for comparisons
-OPTIONS:
-    -h, --help                    Show this help message
-    -m, --model MODEL             Model name for GenAI-Perf configuration and logging (default: Qwen/Qwen3-0.6B)
-                                  NOTE: This must match the model configured in your deployment manifests and the model deployed in any endpoints.
-    -i, --isl LENGTH              Input sequence length (default: $ISL)
-    -s, --std STDDEV              Input sequence standard deviation (default: $STD)
-    -o, --osl LENGTH              Output sequence length (default: $OSL)
-    -d, --output-dir DIR          Output directory (default: $OUTPUT_DIR)
-    --verbose                     Enable verbose output
-EXAMPLES:
-    # Compare Dynamo deployments of a single backend
-    $0 --namespace \$NAMESPACE \\
-       --input agg=components/backends/vllm/deploy/agg.yaml \\
-       --input disagg=components/backends/vllm/deploy/disagg.yaml
-    # Compare different backend types (vLLM vs TensorRT-LLM)
-    $0 --namespace \$NAMESPACE \\
-       --input vllm-agg=components/backends/vllm/deploy/agg.yaml \\
-       --input trtllm-agg=components/backends/trtllm/deploy/agg.yaml
-    # Compare Dynamo deployment vs external endpoint
-    $0 --namespace \$NAMESPACE \\
-       --input dynamo=components/backends/vllm/deploy/disagg.yaml \\
-       --input external=http://localhost:8000
-    # Compare multiple different configurations (vLLM, TensorRT-LLM, SGLang)
-    $0 --namespace \$NAMESPACE \\
-       --input vllm-agg=components/backends/vllm/deploy/agg.yaml \\
-       --input trtllm-disagg=components/backends/trtllm/deploy/disagg.yaml \\
-       --input existing-sglang=http://localhost:8000
-    # Benchmark a single Dynamo deployment
-    $0 --namespace \$NAMESPACE \\
-       --input my-setup=components/backends/vllm/deploy/disagg.yaml
-    # Benchmark single external endpoint
-    $0 --namespace \$NAMESPACE \\
-       --input production=http://localhost:8000
-DEPLOYMENT TYPES:
-    - DynamoGraphDeployment: Supports various Dynamo deployment configurations including:
-      * Aggregated deployments (prefill and decode together)
-      * Disaggregated deployments (prefill and decode separate)
-      * Router deployments
-      * Planner deployments
-      * And other Dynamo configurations
-    - External Endpoints: For comparing against non-Dynamo backends
-NOTE:
-    - Only DynamoGraphDeployment manifests are supported for automatic deployment.
-    - To benchmark non-Dynamo backends (vLLM, TensorRT-LLM, SGLang, etc.), deploy them
-      manually following their Kubernetes deployment guides, expose a port (i.e. via port-forward),
-      and use the endpoint option.
-    - For Dynamo deployment setup, follow the main installation guide at docs/guides/dynamo_deploy/installation_guide.md
-      to install the platform, then use setup_benchmarking_resources.sh for benchmarking resources.
-    - The --model flag configures GenAI-Perf and should match what's configured in your deployment manifests and endpoints.
-    - Only one model can be benchmarked at a time across all inputs.
-EOF
-}
-parse_input() {
-    local input_arg="$1"
-    # Basic format validation: must contain exactly one '=' character
-    if [[ ! "$input_arg" =~ ^[^=]+=[^=]+$ ]]; then
-        echo "ERROR: Invalid input format. Expected: <label>=<manifest_path_or_endpoint>" >&2
-        echo "Got: $input_arg" >&2
-        echo "Format must be: key=value with exactly one '=' character" >&2
-        exit 1
-    fi
-    # Split on the first '=' character
-    local label="${input_arg%%=*}"
-    local value="${input_arg#*=}"
-    # Basic validation - detailed validation will be done in Python
-    if [[ -z "$label" ]]; then
-        echo "ERROR: Label cannot be empty in input: $input_arg" >&2
-        exit 1
-    fi
-    if [[ -z "$value" ]]; then
-        echo "ERROR: Value cannot be empty in input: $input_arg" >&2
-        exit 1
-    fi
-    # Check for duplicate labels
-    if [[ -n "${INPUT_LABELS[$label]:-}" ]]; then
-        echo "ERROR: Duplicate label '$label' found. Each label must be unique." >&2
-        exit 1
-    fi
-    # Store the input
-    INPUT_LABELS["$label"]=1
-    INPUT_VALUES["$label"]="$value"
-    echo "Added input: $label -> $value"
-}
-parse_args() {
-    while [[ $# -gt 0 ]]; do
-        case $1 in
-            -h|--help)
-                show_help
-                exit 0
-                ;;
-            -n|--namespace)
-                NAMESPACE="$2"
-                shift 2
-                ;;
-            -m|--model)
-                MODEL="$2"
-                shift 2
-                ;;
-            -i|--isl)
-                ISL="$2"
-                shift 2
-                ;;
-            -s|--std)
-                STD="$2"
-                shift 2
-                ;;
-            -o|--osl)
-                OSL="$2"
-                shift 2
-                ;;
-            -d|--output-dir)
-                OUTPUT_DIR="$2"
-                shift 2
-                ;;
-            --input)
-                parse_input "$2"
-                shift 2
-                ;;
-            --verbose)
-                VERBOSE=true
-                shift
-                ;;
-            *)
-                echo "Unknown option: $1" >&2
-                echo "Use --help for usage information." >&2
-                exit 1
-                ;;
-        esac
-    done
-}
-validate_config() {
-    local errors=()
-    if [[ -z "$NAMESPACE" ]]; then
-        errors+=("--namespace is required")
-    fi
-    # Check that at least one input is specified
-    if [[ ${#INPUT_LABELS[@]} -eq 0 ]]; then
-        errors+=("At least one --input must be specified")
-    fi
-    if [[ ${#errors[@]} -gt 0 ]]; then
-        echo "ERROR: Missing required arguments:" >&2
-        for error in "${errors[@]}"; do
-            echo "  $error" >&2
-        done
-        echo "Use --help for usage information." >&2
-        exit 1
-    fi
-    # Validate that specified files exist and endpoints are valid URLs
-    for label in "${!INPUT_VALUES[@]}"; do
-        local value="${INPUT_VALUES[$label]}"
-        # Check if it's a URL (starts with http:// or https://)
-        if [[ "$value" =~ ^https?:// ]]; then
-            echo "Input '$label': endpoint $value"
-        else
-            # It should be a file path - validate it exists
-            if [[ ! -f "$value" ]]; then
-                echo "ERROR: Manifest file not found for input '$label': $value" >&2
-                exit 1
-            fi
-            echo "Input '$label': manifest $value"
-        fi
-    done
-    if [[ ! "$ISL" =~ ^[0-9]+$ ]] || [[ "$ISL" -le 0 ]]; then
-        echo "ERROR: ISL must be a positive integer, got: $ISL" >&2
-        exit 1
-    fi
-    if [[ ! "$OSL" =~ ^[0-9]+$ ]] || [[ "$OSL" -le 0 ]]; then
-        echo "ERROR: OSL must be a positive integer, got: $OSL" >&2
-        exit 1
-    fi
-    if [[ ! "$STD" =~ ^[0-9]+$ ]] || [[ "$STD" -lt 0 ]]; then
-        echo "ERROR: STD must be a non-negative integer, got: $STD" >&2
-        exit 1
-    fi
-}
-print_config() {
-    echo "=== Benchmark Configuration ==="
-    echo "Namespace:              $NAMESPACE"
-    echo "Model:                  $MODEL"
-    echo "Input Sequence Length:  $ISL tokens"
-    echo "Output Sequence Length: $OSL tokens"
-    echo "Sequence Std Dev:       $STD tokens"
-    echo "Output Directory:       $OUTPUT_DIR"
-    echo ""
-    echo "Benchmark Inputs:"
-    for label in "${!INPUT_VALUES[@]}"; do
-        local value="${INPUT_VALUES[$label]}"
-        if [[ "$value" =~ ^https?:// ]]; then
-            echo "  $label: endpoint $value"
-        else
-            echo "  $label: manifest $value"
-        fi
-    done
-    echo "==============================="
-    echo
-}
-run_benchmark() {
-    echo "🚀 Starting benchmark workflow..."
-    # Change to dynamo root directory
-    cd "$DYNAMO_ROOT"
-    local cmd=(
-        python3 -u -m benchmarks.utils.benchmark
-        --namespace "$NAMESPACE"
-        --model "$MODEL"
-        --isl "$ISL"
-        --std "$STD"
-        --osl "$OSL"
-        --output-dir "$OUTPUT_DIR"
-    )
-    # Add all input arguments
-    for label in "${!INPUT_VALUES[@]}"; do
-        local value="${INPUT_VALUES[$label]}"
-        cmd+=(--input "$label=$value")
-    done
-    if [[ "$VERBOSE" == "true" ]]; then
-        echo "Executing: ${cmd[*]}"
-    fi
-    if ! "${cmd[@]}"; then
-        echo "❌ Benchmark failed!" >&2
-        exit 1
-    fi
-    echo "✅ Benchmark completed successfully!"
-}
-generate_plots() {
-    echo "📊 Generating performance plots..."
-    cd "$DYNAMO_ROOT"
-    local plot_cmd=(
-        python3 -m benchmarks.utils.plot
-        --data-dir "$OUTPUT_DIR"
-    )
-    if [[ "$VERBOSE" == "true" ]]; then
-        echo "Executing: ${plot_cmd[*]}"
-    fi
-    if ! "${plot_cmd[@]}"; then
-        echo "⚠️  Plot generation failed, but benchmark data is still available" >&2
-        return 1
-    fi
-    echo "✅ Plots generated successfully!"
-    echo "📁 Results available at: $OUTPUT_DIR"
-    echo "📈 Plots available at: $OUTPUT_DIR/plots"
-}
-main() {
-    trap cleanup EXIT
-    parse_args "$@"
-    validate_config
-    print_config
-    if [[ "$VERBOSE" == "true" ]]; then
-        export DYNAMO_VERBOSE=true
-    fi
-    local start_time
-    start_time=$(date +%s)
-    run_benchmark
-    generate_plots
-    local end_time
-    end_time=$(date +%s)
-    local duration
-    duration=$((end_time - start_time))
-    echo
-    echo "🎉 All done!"
-    echo "⏱️  Total time: ${duration}s"
-    echo "📁 Results: $OUTPUT_DIR"
-    echo "📊 Plots: $OUTPUT_DIR/plots"
-}
-cleanup() {
-    if [[ $? -ne 0 ]]; then
-        echo "❌ Script failed. Check logs above for details." >&2
-    fi
-}
-# Only run main if script is executed directly (not sourced)
-if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
-    trap 'cleanup $?' EXIT
-    main "$@"
-fi
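Deleting `benchmark.sh` also drops its shell-side checks on `--input` (exactly one `=`, non-empty label and value, unique labels). If you keep a thin wrapper around the Python module, a minimal sketch of an equivalent pre-flight check might look like the following. It assumes bash 4+ (for associative arrays), combines the deleted script's rules with the documented label restrictions (letters, numbers, hyphens, underscores; `plots` is reserved), and, reflecting this change, rejects any value that is not an `http://` or `https://` URL. `validate_inputs` is a hypothetical helper; the Python module still performs its own validation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for <label>=<endpoint_url> arguments.
# Rules mirror the deleted benchmark.sh plus the documented label restrictions.
validate_inputs() {
    local -A seen=()
    local arg label value
    if (( $# == 0 )); then
        echo "at least one input is required" >&2
        return 1
    fi
    for arg in "$@"; do
        # Exactly one '=' separating a non-empty label and a non-empty value
        if [[ ! "$arg" =~ ^[^=]+=[^=]+$ ]]; then
            echo "invalid input '$arg': expected <label>=<endpoint_url>" >&2
            return 1
        fi
        label="${arg%%=*}"
        value="${arg#*=}"
        # Labels: letters, numbers, hyphens, and underscores only
        if [[ ! "$label" =~ ^[A-Za-z0-9_-]+$ ]]; then
            echo "invalid label '$label': use letters, numbers, '-' or '_'" >&2
            return 1
        fi
        if [[ "$label" == "plots" ]]; then
            echo "label 'plots' is reserved" >&2
            return 1
        fi
        if [[ -n "${seen[$label]:-}" ]]; then
            echo "duplicate label '$label'" >&2
            return 1
        fi
        # Manifests are no longer accepted; values must be HTTP(S) endpoints
        if [[ ! "$value" =~ ^https?:// ]]; then
            echo "input '$label': '$value' is not an http(s) URL" >&2
            return 1
        fi
        seen[$label]=1
    done
    return 0
}
```

For example, `validate_inputs aggregated=http://localhost:8000 disaggregated=http://localhost:8001` succeeds, while a manifest path such as `agg=components/backends/vllm/deploy/agg.yaml` is now rejected.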
--- a/docs/benchmarks/benchmarking.md
+++ b/docs/benchmarks/benchmarking.md
@@ -16,101 +16,138 @@
 # Dynamo Benchmarking Guide
 This benchmarking framework lets you compare performance across any combination of:
-- **DynamoGraphDeployments** (automatically deployed from your manifests)
+- **DynamoGraphDeployments**
-- **External HTTP endpoints** (existing services, vLLM, TensorRT-LLM, etc.)
+- **External HTTP endpoints** (existing services deployed by following the standard documentation for vLLM, llm-d, AIBrix, etc.)
-You can mix and match these in a single benchmark run using custom labels. Configure your DynamoGraphDeployment manifests for your specific models, hardware, and parallelization needs.
 ## What This Tool Does
-The framework is a wrapper around `genai-perf` that:
+The framework is a Python-based wrapper around `genai-perf` that:
-- Deploys user-specified `DynamoGraphDeployments` automatically
+- Benchmarks any HTTP endpoints
-- Benchmarks any HTTP endpoints (no deployment needed)
 - Runs concurrency sweeps across configurable load levels
 - Generates comparison plots with your custom labels
 - Works with any HuggingFace-compatible model on NVIDIA GPUs (H200, H100, A100, etc.)
 - Runs locally and connects to your Kubernetes deployments/endpoints
+- Runs as standalone Python modules (`benchmarks.utils.benchmark` and `benchmarks.utils.plot`) that you invoke directly with `python3 -m`
 **Default sequence lengths**: Input: 2000 tokens, Output: 256 tokens (configurable with `--isl` and `--osl`)
-**Important**: The `--model` parameter configures GenAI-Perf for benchmarking and provides logging context. The actual model loaded is determined by your deployment manifests. Only one model can be benchmarked at a time across all inputs to ensure fair comparison. The default `--model` value in the benchmarking script is `Qwen/Qwen3-0.6B`, but it must match the model in the manifest(s) and the model deployed at the endpoint(s).
+**Important**: The `--model` parameter configures GenAI-Perf for benchmarking and provides logging context. The default `--model` value in the benchmarking script is `Qwen/Qwen3-0.6B`, but it must match the model deployed at the endpoint(s).
 ## Prerequisites
-1. **Kubernetes cluster with NVIDIA GPUs and Dynamo Cloud platform** - You need a Kubernetes cluster with eligible NVIDIA GPUs and the Dynamo Cloud platform installed. First follow the [installation guide](../../guides/dynamo_deploy/installation_guide.md) to install Dynamo Cloud, then use [deploy/utils/README](../../deploy/utils/README.md) to set up benchmarking resources.
+1. **Dynamo container environment** - You must be running inside a Dynamo container with the benchmarking tools pre-installed.
+2. **Ubuntu 24.04** - GenAI-Perf requires Ubuntu 24.04 or higher to work properly. If you are on Ubuntu 22.04 or lower, you will need to build perf_analyzer [from source](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/install.md#build-from-source).
-2. **kubectl access** - You need `kubectl` installed and configured to access your Kubernetes cluster.
+3. **kubectl access** - You need `kubectl` installed and configured to access your Kubernetes cluster.
-3. **Benchmark dependencies** - Since benchmarks run locally, you need to install the required Python dependencies. Install them using:
+4. **Benchmark dependencies** - Since benchmarks run locally, you need to install the required Python dependencies. Install them using:
   ```bash
   pip install -r deploy/utils/requirements.txt
   ```
-   *Note: if you are on Ubuntu 22.04 or lower, you will also need to build perf_analyzer [from source](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/install.md#build-from-source).*
-## Quick Start Examples
+## User Workflow
+Follow these steps to benchmark Dynamo deployments:
-The tool can be used to deploy, benchmark and compare Dynamo deployments (DynamoGraphDeployments) on a Kubernetes cluster as well as benchmark and compare servers deployed separately given a URL. In the examples below, Dynamo deployments are specified with a yaml and servers deployed separately by URL.
+### Step 1: Establish Kubernetes Cluster and Install Dynamo
+Set up your Kubernetes cluster with NVIDIA GPUs and install the Dynamo Cloud platform. First follow the [installation guide](../../guides/dynamo_deploy/installation_guide.md) to install Dynamo Cloud, then use [deploy/utils/README](../../deploy/utils/README.md) to set up benchmarking resources.
+### Step 2: Deploy DynamoGraphDeployments
+Deploy your DynamoGraphDeployments separately using the [deployment documentation](../../components/backends/). Each deployment should have a frontend service exposed.
+### Step 3: Port-Forward and Benchmark Deployment A
 ```bash
-export NAMESPACE=benchmarking
+# Port-forward the frontend service for deployment A
+kubectl port-forward -n <namespace> svc/<frontend-service-name> 8000:8000 &
-# Compare multiple DynamoGraphDeployments of a single backend
+# Note: remember to stop the port-forward process after benchmarking.
-./benchmarks/benchmark.sh --namespace $NAMESPACE \
-   --input agg=components/backends/vllm/deploy/agg.yaml \
+# Benchmark deployment A using Python scripts
-   --input disagg=components/backends/vllm/deploy/disagg.yaml
+python3 -m benchmarks.utils.benchmark --namespace <namespace> \
+   --input deployment-a=http://localhost:8000 \
-# Compare different backend types (vLLM vs TensorRT-LLM)
+   --model "your-model-name" \
-./benchmarks/benchmark.sh --namespace $NAMESPACE \
+   --output-dir ./benchmarks/results
-   --input vllm-disagg=components/backends/vllm/deploy/disagg.yaml \
-   --input trtllm-disagg=components/backends/trtllm/deploy/disagg.yaml
-# Compare Dynamo deployment vs existing deployment (external endpoint)
-./benchmarks/benchmark.sh --namespace $NAMESPACE \
-   --input dynamo=components/backends/vllm/deploy/disagg.yaml \
-   --input vllm-baseline=http://localhost:8000
-# Compare three different configurations
-./benchmarks/benchmark.sh --namespace $NAMESPACE \
-   --input dynamo-agg=components/backends/vllm/deploy/agg.yaml \
-   --input dynamo-disagg=components/backends/vllm/deploy/disagg.yaml \
-   --input external-vllm=http://localhost:8000
-# Benchmark single external endpoint
-./benchmarks/benchmark.sh --namespace $NAMESPACE \
-   --input production-api=http://your-api:8000
-# Custom model and sequence lengths
-./benchmarks/benchmark.sh --namespace $NAMESPACE \
-   --input my-setup=my-custom-manifest.yaml \
-   --model "meta-llama/Meta-Llama-3-8B" --isl 512 --osl 256
 ```
-**Key**: Configure your manifests for your specific models, hardware, and parallelization strategy before benchmarking.
+### Step 4: [If Comparative] Tear Down Deployment A and Deploy Deployment B
+If comparing multiple deployments, tear down deployment A (and stop its port-forward), then deploy deployment B with a different configuration.
+### Step 5: [If Comparative] Port-Forward and Benchmark Deployment B
+```bash
+# Port-forward the frontend service for deployment B
+kubectl port-forward -n <namespace> svc/<frontend-service-name> 8001:8000 &
+# Benchmark deployment B using Python scripts
+python3 -m benchmarks.utils.benchmark --namespace <namespace> \
+   --input deployment-b=http://localhost:8001 \
+   --model "your-model-name" \
+   --output-dir ./benchmarks/results
+```
+### Step 6: Generate Summary and Visualization
+```bash
+# Generate plots and summary using Python plotting script
+python3 -m benchmarks.utils.plot --data-dir ./benchmarks/results
+```
+## Example Commands
+### Single Deployment Benchmark
+```bash
+# Port-forward and benchmark a single deployment
+kubectl port-forward -n my-namespace svc/my-frontend-service 8000:8000 &
+python3 -m benchmarks.utils.benchmark --namespace my-namespace \
+   --input my-deployment=http://localhost:8000 \
+   --model "meta-llama/Meta-Llama-3-8B"
+```
+### Comparative Benchmark
+```bash
+# Benchmark deployment A
+kubectl port-forward -n my-namespace svc/agg-frontend 8000:8000 &
+python3 -m benchmarks.utils.benchmark --namespace my-namespace \
+   --input aggregated=http://localhost:8000 \
+   --model "meta-llama/Meta-Llama-3-8B"
+# Benchmark deployment B (different port)
+kubectl port-forward -n my-namespace svc/disagg-frontend 8001:8000 &
+python3 -m benchmarks.utils.benchmark --namespace my-namespace \
+   --input disaggregated=http://localhost:8001 \
+   --model "meta-llama/Meta-Llama-3-8B"
+# Generate comparison plots
+python3 -m benchmarks.utils.plot --data-dir ./benchmarks/results
+```
-### Important: Image Accessibility
+## Use Cases
-Ensure container images in your DynamoGraphDeployment manifests are accessible:
+The benchmarking framework supports various comparative analysis scenarios:
-- **Public images**: Use [Dynamo NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts) public releases
-- **Custom registries**: Configure proper credentials in your Kubernetes namespace
+- **Compare multiple DynamoGraphDeployments of a single backend** (e.g., aggregated vs disaggregated configurations)
+- **Compare different backends** (e.g., vLLM vs TensorRT-LLM vs SGLang)
+- **Compare Dynamo vs other platforms** (e.g., Dynamo vs llm-d vs AIBrix)
+- **Compare different models** (e.g., Llama-3-8B vs Llama-3-70B vs Qwen-3-0.6B)
+- **Compare different hardware configurations** (e.g., H100 vs A100 vs H200)
+- **Compare different parallelization strategies** (e.g., different GPU counts or memory configurations)
 ## Configuration and Usage
 ### Command Line Options
 ```bash
-./benchmarks/benchmark.sh --namespace NAMESPACE --input <label>=<manifest_path_or_endpoint> [--input <label>=<manifest_path_or_endpoint>]... [OPTIONS]
+python3 -m benchmarks.utils.benchmark --namespace NAMESPACE --input <label>=<endpoint_url> [--input <label>=<endpoint_url>]... [OPTIONS]
 REQUIRED:
  -n, --namespace NAMESPACE           Kubernetes namespace
-  --input <label>=<manifest_path_or_endpoint>  Benchmark input with custom label
+  --input <label>=<endpoint_url>     Benchmark input with custom label
                                        - <label>: becomes the name/label in plots
-                                        - <manifest_path_or_endpoint>: either a DynamoGraphDeployment manifest or HTTP endpoint URL
+                                        - <endpoint_url>: HTTP endpoint URL (e.g., http://localhost:8000)
                                        Can be specified multiple times for comparisons
 OPTIONS:
  -h, --help                    Show help message and examples
  -m, --model MODEL             Model name for GenAI-Perf configuration and logging (default: Qwen/Qwen3-0.6B)
-                                NOTE: This must match the model configured in your deployment manifests and endpoints
+                                NOTE: This must match the model deployed at the endpoint
  -i, --isl LENGTH              Input sequence length (default: 2000)
  -s, --std STDDEV              Input sequence standard deviation (default: 10)
  -o, --osl LENGTH              Output sequence length (default: 256)
@@ -122,63 +159,34 @@ OPTIONS:
 - **Custom Labels**: Each input must have a unique label that becomes the name in plots and results
 - **Label Restrictions**: Labels can only contain letters, numbers, hyphens, and underscores. The label `plots` is reserved.
-- **Input Types**: Supports DynamoGraphDeployment manifests for automatic deployment, or HTTP endpoints for existing services
+- **Port-Forwarding**: Each input must be a reachable endpoint (for example, exposed with `kubectl port-forward`) before benchmarking starts
-- **Model Parameter**: The `--model` parameter configures GenAI-Perf for testing and logging, not deployment (deployment model is determined by the manifest files)
+- **Model Parameter**: The `--model` parameter configures GenAI-Perf for testing and logging, and must match the model deployed at the endpoint
-- **Standalone Deployments**: For non-Dynamo backends (vLLM, TensorRT-LLM, SGLang, etc.), you must deploy them manually following their respective Kubernetes deployment guides. The benchmarking framework only supports automatic deployment of DynamoGraphDeployments.
+- **Sequential Benchmarking**: For comparative benchmarks, deploy and benchmark each configuration separately
-- **Single Model Requirement**: Only one model can be benchmarked at a time across all inputs to ensure fair comparison.
 ### What Happens During Benchmarking
-The script automatically:
+The Python benchmarking module:
-1. **Deploys** each DynamoGraphDeployment configuration to Kubernetes if manifests are passed in
+1. **Connects** to your port-forwarded endpoint
 2. **Benchmarks** using GenAI-Perf at various concurrency levels (default: 1, 2, 5, 10, 50, 100, 250)
 3. **Measures** key metrics: latency, throughput, time-to-first-token
-4. **Generates** comparison plots using your custom labels in `./benchmarks/results/plots/`
+4. **Saves** results to an output directory organized by input labels
-5. **Cleans up** deployments when complete
-### GPU Resource Usage
-**Important**: Models are deployed and benchmarked **sequentially**, not in parallel. This means:
-- **One deployment at a time**: Each DynamoGraphDeployment is deployed, benchmarked, and cleaned up before the next one starts
-- **Full GPU access**: Each deployment gets exclusive access to all available GPUs during its benchmark run
-- **Resource isolation**: No resource conflicts between different deployment configurations
-- **Fair comparison**: Each configuration is tested under identical resource conditions
-This sequential approach ensures:
-- **Accurate performance measurements** without interference between deployments
-- **Consistent resource allocation** for fair comparison across different configurations
-- **Simplified resource management** without complex GPU scheduling
-- **Reliable cleanup** between benchmark runs
-If you need to benchmark multiple configurations simultaneously, consider using separate Kubernetes namespaces or running benchmarks on different clusters.
-### Results Clearing Behavior
-**Important**: The benchmark script automatically clears the output directory before each run to ensure clean, reproducible results. This means:
-- Previous benchmark results in the same output directory will be completely removed
-- Each benchmark run starts with a clean slate
-- Results from different runs are not mixed or accumulated
-If you want to preserve results from previous runs, use different output directories with the `--output-dir` flag.
+The Python plotting module:
+1. **Generates** comparison plots using your custom labels in `<OUTPUT_DIR>/plots/`
+2. **Creates** summary statistics and visualizations
 ### Using Your Own Models and Configuration
-The benchmarking framework supports any HuggingFace-compatible LLM model. To benchmark your own custom deployment:
+The benchmarking framework supports any HuggingFace-compatible LLM. Pass the model name with the `--model` parameter; it must match the name of the model served by the deployment. If your workload needs different sequence lengths, override the defaults (2000/256 tokens) with the `--isl` and `--osl` flags.
-1. **Edit your deployment YAML files** to specify your model in the `--model` argument of the container command
+### Python Script Usage
-2. **Use the corresponding model name** in the benchmark script's `--model` parameter
-**Note**: You can override the default sequence lengths (2000/256 tokens) with `--isl` and `--osl` flags if needed for your specific workload.
+The benchmarking framework is built around Python modules that provide direct control over the benchmark workflow:
-### Direct Python Execution
-For direct control over the benchmark workflow:
 ```bash
 # Endpoint benchmarking
 python3 -u -m benchmarks.utils.benchmark \
-   --input trtllm=http://your-endpoint:8000 \
+   --input experiment-a=http://your-endpoint:8000 \
   --namespace $NAMESPACE \
   --isl 2000 \
   --std 10 \
@@ -187,18 +195,19 @@ python3 -u -m benchmarks.utils.benchmark \
 # Deployment benchmarking (any combination)
 python3 -u -m benchmarks.utils.benchmark \
-   --input agg=$AGG_CONFIG \
+   --input experiment-a=http://localhost:8000 \
-   --input disagg=$DISAGG_CONFIG \
+   --input experiment-b=http://localhost:8005 \
-   --namespace $NAMESPACE \
+   --namespace my-namespace \
   --isl 2000 \
   --std 10 \
   --osl 256 \
-   --output-dir $OUTPUT_DIR
+   --output-dir ./benchmarks/results
 # Generate plots separately
 python3 -m benchmarks.utils.plot --data-dir $OUTPUT_DIR
 ```
+**Note**: The Python benchmarking module only connects to endpoints that are already running: it benchmarks them and can generate plots. Deploying and exposing those endpoints (for example, by port-forwarding a DynamoGraphDeployment) is up to you and out of scope for this tool.
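Because the module expects the endpoints to be up already, a quick pre-flight check can save a failed run when a port-forward has dropped. The sketch below is a hypothetical helper (`endpoint_reachable` is not part of `benchmarks.utils`) that only tests whether something accepts TCP connections on each URL's host and port; it does not verify that the server speaks the expected API or serves the expected model.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical pre-flight check: it only proves that something is listening,
    not that it is the deployment you intend to benchmark.
    """
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError(f"no host in {url!r}")
    # Fall back to the scheme's default port when the URL omits one
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


if __name__ == "__main__":
    # Same endpoints as the deployment benchmarking example above
    for url in ("http://localhost:8000", "http://localhost:8005"):
        status = "reachable" if endpoint_reachable(url) else "NOT reachable"
        print(f"{url}: {status}")
```

Run it right before `python3 -m benchmarks.utils.benchmark` so that a missing port-forward shows up immediately rather than partway through a sweep.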
 ### Comparison Limitations
 The plotting system supports up to 12 different inputs in a single comparison. If you need to compare more than 12 different deployments/endpoints, consider running separate benchmark sessions or grouping related comparisons together.
@@ -209,17 +218,25 @@ You can customize the concurrency levels using the CONCURRENCIES environment var
 ```bash
 # Custom concurrency levels
-CONCURRENCIES="1,5,20,50" ./benchmarks/benchmark.sh --namespace $NAMESPACE --input my-test=components/backends/vllm/deploy/disagg.yaml
+CONCURRENCIES="1,5,20,50" python3 -m benchmarks.utils.benchmark --namespace $NAMESPACE --input my-test=http://localhost:8000
 # Or set permanently
 export CONCURRENCIES="1,2,5,10,25,50,100"
-./benchmarks/benchmark.sh --namespace $NAMESPACE --input test=disagg.yaml
+python3 -m benchmarks.utils.benchmark --namespace $NAMESPACE --input test=http://localhost:8000
 ```
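The examples above pass `CONCURRENCIES` as a comma-separated list of positive integers. A minimal sketch of validating such a value before a long run follows; `parse_concurrencies` is a hypothetical helper, and sorting and de-duplicating the levels is an assumption of this sketch, not documented behavior of the benchmark module.

```python
import os


def parse_concurrencies(raw: str) -> list[int]:
    """Parse a comma-separated CONCURRENCIES value such as "1,5,20,50".

    Hypothetical helper mirroring the format used in the examples above;
    it is not the parsing code inside benchmarks.utils.
    """
    levels = []
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas such as "1,,5" or a trailing ","
        value = int(token)  # raises ValueError on non-numeric input
        if value <= 0:
            raise ValueError(f"concurrency must be positive, got {value}")
        levels.append(value)
    if not levels:
        raise ValueError("CONCURRENCIES contains no levels")
    # Assumption: benchmark each level once, in increasing order
    return sorted(set(levels))


if __name__ == "__main__":
    raw = os.environ.get("CONCURRENCIES")
    if raw is None:
        print("CONCURRENCIES is not set")
    else:
        print(parse_concurrencies(raw))
```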
 ## Understanding Your Results
 After benchmarking completes, check `./benchmarks/results/` (or your custom output directory):
+### Plot Labels and Organization
+The plotting script uses the `--input` labels (the keys before the `=` sign) as the experiment names in all generated plots. For example:
+- `--input aggregated=http://localhost:8000` → plots will show "aggregated" as the label
+- `--input vllm-disagg=http://localhost:8001` → plots will show "vllm-disagg" as the label
+Descriptive labels make each configuration easy to identify and compare in the plots.
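The label convention can be illustrated with a small parser. This is a sketch under stated assumptions: `parse_inputs` is hypothetical rather than the module's actual argument handling; it splits each value on the first `=` so URLs containing `=` stay intact, enforces the 12-input plotting limit described under Comparison Limitations, and rejects labels that could not serve as a per-experiment result directory name.

```python
from urllib.parse import urlparse

MAX_INPUTS = 12  # plotting limit from "Comparison Limitations"


def parse_inputs(specs: list[str]) -> dict[str, str]:
    """Map each "label=endpoint" spec to {label: endpoint}.

    Hypothetical helper illustrating the label convention; not the argument
    parser used by benchmarks.utils.benchmark.
    """
    if len(specs) > MAX_INPUTS:
        raise ValueError(f"at most {MAX_INPUTS} inputs can be plotted together, got {len(specs)}")
    inputs: dict[str, str] = {}
    for spec in specs:
        # Split on the first "=" only, so query strings in the URL survive
        label, sep, endpoint = spec.partition("=")
        if not sep or not label or not endpoint:
            raise ValueError(f"expected label=endpoint, got {spec!r}")
        # Assumption: labels become result subdirectories next to plots/,
        # so reject path separators and the reserved "plots" name
        if "/" in label or label == "plots":
            raise ValueError(f"label {label!r} cannot be used as a directory name")
        if label in inputs:
            raise ValueError(f"duplicate label {label!r}")
        parsed = urlparse(endpoint)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"endpoint for {label!r} is not an http(s) URL: {endpoint!r}")
        inputs[label] = endpoint
    return inputs


if __name__ == "__main__":
    print(parse_inputs([
        "aggregated=http://localhost:8000",
        "vllm-disagg=http://localhost:8001",
    ]))
```

Rejecting duplicate labels up front matters because two inputs sharing a label would be indistinguishable in the plots.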
 ### Summary and Plots
 ```text
@@ -263,9 +280,9 @@ benchmarks/results/
 ```text
 benchmarks/results/
 ├── plots/
-├── dynamo-agg/                  # --input dynamo-agg=agg.yaml
+├── experiment-a/                  # --input experiment-a=http://localhost:8000
-├── dynamo-disagg/               # --input dynamo-disagg=disagg.yaml
+├── experiment-b/                  # --input experiment-b=http://localhost:8001
-└── external-vllm/               # --input external-vllm=http://localhost:8000
+└── experiment-c/                  # --input experiment-c=http://localhost:8002
 ```
 Each concurrency directory contains:
@@ -275,10 +292,12 @@ Each concurrency directory contains:
 ## Customize Benchmarking Behavior
-The built-in workflow handles DynamoGraphDeployment deployment, benchmarking with genai-perf, and plot generation automatically. If you want to modify the behavior:
+The built-in Python workflow connects to your endpoints, benchmarks them with genai-perf, and generates plots. To modify this behavior:
 1. **Extend the workflow**: Modify `benchmarks/utils/workflow.py` to add custom deployment types or metrics collection
 2. **Generate different plots**: Modify `benchmarks/utils/plot.py` to generate a different set of plots for whatever you wish to visualize.
-The `benchmark.sh` script provides a complete end-to-end benchmarking experience. For more granular control, use the Python modules directly.
+3. **Direct module usage**: Use individual Python modules (`benchmarks.utils.benchmark`, `benchmarks.utils.plot`) for granular control over each step of the benchmarking process.
+Run on its own, the Python benchmarking module covers the full workflow against your existing endpoints, from running genai-perf to generating plots.