[Doc] Add documentation for vLLM continuous benchmarking and profiling (#25819)

Signed-off-by: Naman Lalit <nl2688@nyu.edu>

Commit 9bedac96 authored Sep 29, 2025 by Naman Lalit; committed by GitHub Sep 29, 2025

Showing with 40 additions and 0 deletions

docs/contributing/benchmarks.md +24 -0

docs/contributing/profiling.md +16 -0

--- a/docs/contributing/benchmarks.md
+++ b/docs/contributing/benchmarks.md
@@ -823,6 +823,30 @@ The latest performance results are hosted on the public [vLLM Performance Dashbo
 More information on the performance benchmarks and their parameters can be found in [Benchmark README](https://github.com/intel-ai-tce/vllm/blob/more_cpu_models/.buildkite/nightly-benchmarks/README.md) and [performance benchmark description](gh-file:.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md).
+### Continuous Benchmarking
+Continuous benchmarking provides automated performance monitoring for vLLM across different models and GPU devices. It tracks vLLM's performance over time and helps identify regressions and improvements.
+#### How It Works
+Continuous benchmarking is triggered via a [GitHub CI workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-benchmark.yml) in the PyTorch integration testing repository, which runs automatically every 4 hours. The workflow executes three types of performance tests:
+- **Serving tests**: Measure request handling and API performance
+- **Throughput tests**: Evaluate token generation rates
+- **Latency tests**: Assess response time characteristics
+#### Benchmark Configuration
+Benchmarks currently run on a predefined set of models configured in the [vllm-benchmarks directory](https://github.com/pytorch/pytorch-integration-testing/tree/main/vllm-benchmarks/benchmarks). To add new models for benchmarking:
+1. Navigate to the appropriate GPU directory in the benchmarks configuration
+2. Add your model specifications to the corresponding configuration files
+3. The new models will be included in the next scheduled benchmark run
+#### Viewing Results
+All continuous benchmarking results are automatically published to the public [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm).
 [](){ #nightly-benchmarks }
 ## Nightly Benchmarks
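The "Benchmark Configuration" steps in the hunk above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual tooling: the entry shape (`test_name`, `parameters`, `model`, `tensor_parallel_size`) and the model names are assumptions made for the example, and the real schema should be taken from the existing files in the vllm-benchmarks directory.

```python
import json

# Hypothetical entry shape, used only for illustration. Check the existing
# configuration files in the vllm-benchmarks directory for the real schema.
def add_model_entry(tests, test_name, model, tensor_parallel_size=1):
    """Return a new list with one extra benchmark entry; reject duplicate names."""
    if any(t.get("test_name") == test_name for t in tests):
        raise ValueError(f"test {test_name!r} already exists")
    entry = {
        "test_name": test_name,
        "parameters": {
            "model": model,
            "tensor_parallel_size": tensor_parallel_size,
        },
    }
    # Build a new list so the caller's original data is left untouched.
    return [*tests, entry]


# Placeholder content standing in for an existing configuration file.
existing = json.loads(
    '[{"test_name": "latency_example_tp1",'
    ' "parameters": {"model": "example-org/example-model",'
    ' "tensor_parallel_size": 1}}]'
)
updated = add_model_entry(
    existing, "latency_other_tp2", "example-org/other-model", 2
)
print(json.dumps(updated, indent=2))
```

Rejecting duplicate test names up front is a design choice of this sketch; it avoids two entries competing for the same name in a results table.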

--- a/docs/contributing/profiling.md
+++ b/docs/contributing/profiling.md
@@ -160,6 +160,22 @@ GUI example:
 <img width="1799" alt="Screenshot 2025-03-05 at 11 48 42 AM" src="https://github.com/user-attachments/assets/c7cff1ae-6d6f-477d-a342-bd13c4fc424c" />
+## Continuous Profiling
+A [GitHub CI workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-profiling.yml) in the PyTorch integration testing repository provides continuous profiling for different models on vLLM. This automated profiling helps track performance characteristics over time and across different model configurations.
+### How It Works
+The workflow currently runs weekly profiling sessions for selected models, generating detailed performance traces that can be analyzed with different tools to identify performance regressions or optimization opportunities. It can also be triggered manually through GitHub Actions.
+### Adding New Models
+To extend continuous profiling to additional models, modify the [profiling-tests.json](https://github.com/pytorch/pytorch-integration-testing/blob/main/vllm-profiling/cuda/profiling-tests.json) configuration file in the PyTorch integration testing repository. Add your model specifications to this file to include them in the automated profiling runs.
+### Viewing Profiling Results
+The profiling traces generated by the continuous profiling workflow are publicly available on the [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm). Look for the **Profiling traces** table to access and download the traces for different models and runs.
 ## Profiling vLLM Python Code
 The Python standard library includes
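The "Continuous Profiling" section added above notes that the workflow can be triggered manually. Besides the Actions web UI, one possible route is GitHub's REST endpoint for workflow dispatch events. The sketch below only builds the request and does not send it. It assumes the workflow accepts a dispatch with no required inputs, that `main` is the branch to run on, and that the caller has a token with permission to run workflows in that repository; none of this is confirmed by the diff.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build (but do not send) a workflow_dispatch request for GitHub's REST API."""
    url = (
        f"{API_ROOT}/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # "ref" names the branch or tag to run on. Workflow inputs, if any,
    # are not known from the diff and are left out here.
    body = json.dumps({"ref": ref}).encode("utf-8")
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_dispatch_request(
    "pytorch",
    "pytorch-integration-testing",
    "vllm-profiling.yml",
    "main",             # assumed branch
    "<your-token>",     # placeholder, never commit a real token
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the sketch easy to inspect and test without network access or credentials.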