<!-- # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# Benchmarks

This directory contains benchmarking scripts and tools for performance evaluation of Dynamo deployments. The benchmarking framework is a wrapper around genai-perf that makes it easy to benchmark DynamoGraphDeployments and compare them with external endpoints.

## Quick Start

### Benchmark an Existing Endpoint
```bash
./benchmark.sh --namespace my-namespace --input my-endpoint=http://your-endpoint:8000
```

### Benchmark Dynamo Deployments
```bash
# Benchmark disaggregated vLLM with custom label
./benchmark.sh --namespace my-namespace --input vllm-disagg=components/backends/vllm/deploy/disagg.yaml

# Benchmark TensorRT-LLM disaggregated deployment
./benchmark.sh --namespace my-namespace --input trtllm-disagg=components/backends/trtllm/deploy/disagg.yaml

# Compare multiple Dynamo deployments
./benchmark.sh --namespace my-namespace \
  --input agg=components/backends/vllm/deploy/agg.yaml \
  --input disagg=components/backends/vllm/deploy/disagg.yaml

# Compare Dynamo vs external endpoint
./benchmark.sh --namespace my-namespace \
  --input dynamo=components/backends/vllm/deploy/disagg.yaml \
  --input external=http://localhost:8000
```

**Note**:
- The sample manifests may reference private registry images. Update the `image:` fields to use accessible images from [Dynamo NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts) or your own registry before running.
- Only DynamoGraphDeployment manifests are supported for automatic deployment. To benchmark non-Dynamo backends (vLLM, TensorRT-LLM, SGLang, etc.), deploy them manually using their Kubernetes guides and use the endpoint option.

## Features

The benchmarking framework supports:

**Two Benchmarking Modes:**
- **Endpoint Benchmarking**: Test existing HTTP endpoints without deployment overhead
- **Deployment Benchmarking**: Deploy, test, and clean up DynamoGraphDeployments automatically

**Flexible Configuration:**
- User-defined labels for each input using `--input label=value` format
- Support for multiple inputs to enable comparisons
- Customizable concurrency levels (set via the `CONCURRENCIES` environment variable), sequence lengths, and models
- Automated performance plot generation with custom labels
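
To make the `--input label=value` format concrete, here is a minimal sketch of how such an argument could be split into a label and a target, and how the target could be classified into one of the two modes. `classify_input` is a hypothetical helper written for illustration; it is not taken from `benchmark.sh`, and the rule "values starting with `http://` or `https://` are endpoints, anything else is a manifest path" is an assumption based on the Quick Start examples.

```bash
#!/usr/bin/env bash
# Illustrative only: classify_input is a hypothetical helper, not part of benchmark.sh.
classify_input() {
  local spec="$1"
  local label="${spec%%=*}"   # text before the first '='
  local value="${spec#*=}"    # text after the first '='

  # Reject specs with no '=' or with an empty label or value.
  if [[ "$spec" != *=* || -z "$label" || -z "$value" ]]; then
    echo "invalid input: $spec" >&2
    return 1
  fi

  # Assumption: URLs are existing endpoints, everything else is a manifest path.
  case "$value" in
    http://*|https://*) echo "$label endpoint $value" ;;
    *)                  echo "$label deployment $value" ;;
  esac
}

classify_input "external=http://localhost:8000"
classify_input "disagg=components/backends/vllm/deploy/disagg.yaml"
```

Splitting on the first `=` only keeps any later `=` characters (for example in a URL query string) as part of the value.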

**Sequential GPU Usage:**
- Models are deployed and benchmarked **sequentially**, not in parallel
- Each deployment gets exclusive access to all available GPUs during its benchmark run
- Ensures accurate performance measurements and fair comparison across configurations
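
The sequential flow described above can be sketched as a simple loop. The `deploy`, `benchmark`, and `cleanup` functions below are hypothetical placeholders that only print what they would do; the actual steps performed by `benchmark.sh` may differ.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for the real steps; each just logs its action.
deploy()    { echo "deploy $1"; }
benchmark() { echo "benchmark $1"; }
cleanup()   { echo "cleanup $1"; }

run_sequentially() {
  local label
  for label in "$@"; do
    deploy "$label"      # bring up one deployment
    benchmark "$label"   # measure it while nothing else is running
    cleanup "$label"     # tear it down before the next one starts
  done
}

run_sequentially agg disagg
```

Because each iteration finishes its cleanup before the next deployment starts, no two configurations compete for GPUs during a measurement.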

**Supported Backends:**
- DynamoGraphDeployments
- External HTTP endpoints (for comparison with non-Dynamo backends)

## Installation

The benchmarking tools are already included in the Dynamo container images. To install them locally or standalone:

```bash
pip install -e .
```

## Data Generation Tools

This directory also includes lightweight tools for:
- Analyzing prefix-structured data (`datagen analyze`)
- Synthesizing customizable structured data for testing (`datagen synthesize`)

Detailed information is provided in the `prefix_data_generator` directory.

## Comprehensive Guide

For detailed documentation, configuration options, and advanced usage, see the [complete benchmarking guide](../docs/benchmarks/benchmarking.md).