Unverified commit ff2c2bd8 authored by Sophie du Couédic, committed by GitHub

[Docs]Add documentation for bench serve visualization arguments (#40539)


Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
parent cde8d247
@@ -108,6 +108,38 @@ P99 ITL (ms): 8.39
 ==================================================
 ```
 
+#### Results Visualization
+
+The `--plot-timeline` and `--plot-dataset-stats` flags generate, respectively, a request completion timeline and statistics on the dataset's prompt and output token counts. Both are useful for debugging or deeper analysis.
+
+```bash
+vllm bench serve \
+  --backend vllm \
+  --model meta-llama/Llama-3.1-8B-Instruct \
+  --endpoint /v1/completions \
+  --dataset-name sharegpt \
+  --dataset-path <your data path>/ShareGPT_V3_unfiltered_cleaned_split.json \
+  --num-prompts 100 \
+  --plot-timeline \
+  --timeline-itl-thresholds 2,5 \
+  --plot-dataset-stats \
+  --save-result
+```
+
+##### Interactive Timeline
+
+The generated timeline is an interactive HTML visualization that can be rendered in most browsers. To customize the ITL color thresholds, pass two comma-separated values in milliseconds to the `--timeline-itl-thresholds` flag (default: `25,50`).
+
+Example output:
+
+<iframe src="../../assets/contributing/vllm_bench_serve_timeline.html" width="100%" height="600" frameborder="0"></iframe>
+
+##### Dataset Statistics
+
+The generated figure shows the distributions of input prompt lengths and output token lengths.
+
+Example output: ![Dataset Statistics](../assets/contributing/vllm_bench_serve_dataset_stats.png)
+
 #### Custom Dataset
 
 If the dataset you want to benchmark is not yet supported in vLLM, you can still benchmark on it using `CustomDataset`. Your data needs to be in `.jsonl` format, with a "prompt" field in each entry, e.g., data.jsonl
@@ -123,7 +123,8 @@ extend-exclude = ["tests/models/fixtures/*", "tests/prompts/*", "tests/tokenizer
     "benchmarks/sonnet.txt", "tests/lora/data/*", "build/*",
     "examples/pooling/token_embed/*", "tests/models/language/pooling/*",
     "vllm/third_party/*", "vllm/entrypoints/serve/instrumentator/static/*", "tests/entrypoints/openai/speech_to_text/test_transcription_validation.py",
-    "docs/governance/process.md", "tests/v1/engine/test_fast_incdec_prefix_err.py", ".git/*"]
+    "docs/governance/process.md", "docs/assets/contributing/vllm_bench_serve_timeline.html",
+    "tests/v1/engine/test_fast_incdec_prefix_err.py", ".git/*"]
 ignore-hidden = false
 
 [tool.typos.default]
@@ -1611,14 +1611,12 @@ def add_cli_args(parser: argparse.ArgumentParser):
     )
     parser.add_argument(
         "--timeline-itl-thresholds",
-        type=float,
-        nargs=2,
-        default=[25.0, 50.0],
-        metavar=("THRESHOLD1", "THRESHOLD2"),
+        type=str,
+        default="25,50",
         help="ITL thresholds in milliseconds for timeline plot coloring. "
-        "Specify two values to categorize inter-token latencies into three groups: "
-        "below first threshold (green), between thresholds (orange), "
-        "and above second threshold (red). Default: 25 50 (milliseconds).",
+        "Specify two comma-separated values to categorize inter-token "
+        "latencies into three groups: below first threshold (green), "
+        "between thresholds (orange), and above second threshold (red).",
     )
     parser.add_argument(
         "--plot-dataset-stats",
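To make the command-line change in this hunk concrete, the following sketch rebuilds both versions of the option in isolation (help text omitted, parser names chosen for illustration) and parses sample input. The old form takes two space-separated floats; the new form takes one comma-separated string that is only split later.

```python
import argparse

# Option as defined before this change: two space-separated floats.
old_parser = argparse.ArgumentParser()
old_parser.add_argument(
    "--timeline-itl-thresholds",
    type=float,
    nargs=2,
    default=[25.0, 50.0],
    metavar=("THRESHOLD1", "THRESHOLD2"),
)

# Option as defined after this change: a single comma-separated string.
new_parser = argparse.ArgumentParser()
new_parser.add_argument("--timeline-itl-thresholds", type=str, default="25,50")

old_args = old_parser.parse_args(["--timeline-itl-thresholds", "2", "5"])
new_args = new_parser.parse_args(["--timeline-itl-thresholds", "2,5"])

print(old_args.timeline_itl_thresholds)  # [2.0, 5.0]
print(repr(new_args.timeline_itl_thresholds))  # '2,5' (still a string)
print(repr(new_parser.parse_args([]).timeline_itl_thresholds))  # '25,50'
```

Under the new definition the value stays a string until `main_async` splits it on commas, which is why both the validation and the millisecond-to-second conversion call `.split(",")`.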
@@ -1637,6 +1635,19 @@ async def main_async(args: argparse.Namespace) -> dict[str, Any]:
     random.seed(args.seed)
     np.random.seed(args.seed)
 
+    # Validate timeline ITL thresholds
+    if args.plot_timeline:
+        try:
+            itl_thresholds = [
+                float(t.strip()) for t in args.timeline_itl_thresholds.split(",")
+            ]
+            if len(itl_thresholds) != 2:
+                raise ValueError(
+                    f"Expected 2 ITL threshold values, got {len(itl_thresholds)}"
+                )
+        except ValueError as e:
+            raise ValueError(f"Invalid --timeline-itl-thresholds format: {e}") from e
+
     # Validate ramp-up arguments
     if args.ramp_up_strategy is not None:
         if args.request_rate != float("inf"):
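The validation added to `main_async` parses the comma-separated string only when `--plot-timeline` is set, and the millisecond-to-second conversion further down parses it again. One possible alternative, sketched here as an assumption and not what this commit does, is to give argparse a parsing function as `type=`, so malformed values are rejected at parse time with a standard usage error. `parse_itl_thresholds` is a hypothetical name.

```python
import argparse


def parse_itl_thresholds(value: str) -> tuple[float, float]:
    """Parse "LOW,HIGH" (milliseconds) into a pair of floats.

    Hypothetical argparse ``type=`` callable; raising ArgumentTypeError
    makes argparse print a usage error instead of a traceback.
    """
    parts = [p.strip() for p in value.split(",")]
    if len(parts) != 2:
        raise argparse.ArgumentTypeError(
            f"expected 2 comma-separated values, got {len(parts)}: {value!r}"
        )
    try:
        low, high = (float(p) for p in parts)
    except ValueError as e:
        raise argparse.ArgumentTypeError(f"non-numeric threshold in {value!r}") from e
    return low, high


parser = argparse.ArgumentParser()
parser.add_argument(
    "--timeline-itl-thresholds",
    type=parse_itl_thresholds,
    # argparse also runs string defaults through `type`.
    default="25,50",
    help="Two comma-separated ITL thresholds in milliseconds.",
)

print(parser.parse_args([]).timeline_itl_thresholds)  # (25.0, 50.0)
args = parser.parse_args(["--timeline-itl-thresholds", "2,5"])
print(args.timeline_itl_thresholds)  # (2.0, 5.0)
```

One trade-off: because argparse applies `type` to string defaults and to every supplied value, this approach validates the thresholds even when `--plot-timeline` is not set, whereas the commit only validates them when plotting is requested.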
@@ -1906,7 +1917,9 @@ async def main_async(args: argparse.Namespace) -> dict[str, Any]:
 timeline_path = Path(file_name).with_suffix(".timeline.html")
 # Convert thresholds from milliseconds to seconds
-itl_thresholds_sec = [t / 1000.0 for t in args.timeline_itl_thresholds]
+itl_thresholds_sec = [
+    float(t) / 1000.0 for t in args.timeline_itl_thresholds.split(",")
+]
 generate_timeline_plot(
     per_request_data, timeline_path, itl_thresholds=itl_thresholds_sec
 )
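The conversion step divides the millisecond thresholds by 1000 before passing them to `generate_timeline_plot`, which suggests the plot compares them against latencies measured in seconds. As a minimal sketch of how such thresholds could be applied, the snippet below derives inter-token latencies from per-token arrival times (in seconds) and counts how many fall into each color band from the help text. The input format and the helper `count_itl_bands` are hypothetical; they are not the data structures used by `generate_timeline_plot`.

```python
from bisect import bisect_right

BANDS = ("green", "orange", "red")


def count_itl_bands(
    token_times_s: list[float], thresholds_ms: str = "25,50"
) -> dict[str, int]:
    """Count inter-token latencies (ITLs) per color band for one request.

    Hypothetical helper: `token_times_s` is assumed to hold the arrival time,
    in seconds, of each output token, in order. `thresholds_ms` uses the same
    "LOW,HIGH" millisecond format as --timeline-itl-thresholds and is assumed
    to be in ascending order (bisect requires a sorted list).
    """
    # Same conversion as in the diff: milliseconds -> seconds.
    thresholds_s = [float(t) / 1000.0 for t in thresholds_ms.split(",")]
    # An ITL is the gap between two consecutive token arrivals.
    itls_s = [b - a for a, b in zip(token_times_s, token_times_s[1:])]
    counts = dict.fromkeys(BANDS, 0)
    for itl in itls_s:
        # bisect_right returns 0, 1 or 2; a value equal to a threshold
        # lands in the higher (slower) band.
        counts[BANDS[bisect_right(thresholds_s, itl)]] += 1
    return counts


if __name__ == "__main__":
    # ITLs here are roughly 10, 30, 60 and 20 ms.
    print(count_itl_bands([0.000, 0.010, 0.040, 0.100, 0.120]))
    # {'green': 2, 'orange': 1, 'red': 1}
```

Using `bisect_right` keeps the band lookup independent of how many thresholds are configured, although the CLI itself requires exactly two.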