You can use `--dry-run` to preview the figures to be plotted.
### Pareto visualization (tokens/s/user vs tokens/s/GPU)
`vllm bench sweep plot_pareto` helps pick configurations that balance per-user and per-GPU throughput.
Higher concurrency or batch size can raise GPU efficiency (per-GPU), but can add per user latency; lower concurrency improves per-user rate but underutilizes GPUs; The Pareto frontier shows the best achievable pairs across your runs.
The performance benchmarks are used for development to confirm whether new changes improve performance under various workloads. They are triggered on every commit with both the `perf-benchmarks` and `ready` labels, and when a PR is merged into vLLM.