---
id: result-summary
---

# Result Summary

## Introduction

This tool generates a readable summary report from the raw benchmark results of a single machine or multiple machines.

## Usage

1. [Install SuperBench](../getting-started/installation.mdx) on the local machine.

2. Prepare the raw data and rule file on the local machine.

3. Generate the result summary automatically using the `sb result summary` command. The detailed command reference can be found in [SuperBench CLI](../cli.md).

  ```bash
  sb result summary --data-file ./results-summary.jsonl --rule-file ./rule.yaml --output-file-format md --output-dir ${output-dir}
  ```

4. Find the output result file named `results-summary.md` under `${output-dir}`.

## Input

The input consists of two files:

- **Raw Data**: a JSONL file containing multiple nodes' results, generated automatically by the SuperBench runner.

:::tip Tips
The raw data file can be found at `${output-dir}/results-summary.jsonl` after each successful run.
:::

- **Rule File**: a YAML file that defines how to generate the result summary, including how to classify the metrics and which statistical methods (P50, mean, etc.) are applied.

### Rule File

This section describes how to write rules in **rule file**.

The conventions are the same as in the [SuperBench Config File](../superbench-config.mdx); please review it first.

Here is an overview of the rule file structure:

```yaml title="Scheme"
version: string
superbench:
  rules:
    ${rule_name}:
      statistics:
        - ${statistic_name}
      categories: string
      aggregate: (optional)[bool|string]
      metrics:
        - ${benchmark_name}/regex
        - ${benchmark_name}/regex
```

```yaml title="Example"
# SuperBench rules
version: v0.12
superbench:
  rules:
    kernel_launch:
      statistics:
        - mean
        - p90
        - min
        - max
      aggregate: True
      categories: KernelLaunch
      metrics:
        - kernel-launch/e2e_latency_us
        - kernel-launch/host_dispatch_us
        - kernel-launch/launch_throughput_mkps
        - kernel-launch/device_launch_us
    nccl:
      statistics: mean
      categories: NCCL
      metrics:
        - nccl-bw/allreduce_8388608_busbw
    ib-loopback:
      statistics: mean
      categories: RDMA
      metrics:
        - ib-loopback/IB_write_8388608_Avg_\d+
      aggregate: ib-loopback/IB_write_.*_Avg_(\d+)
```

This rule file describes the rules used for the result summary.

Rules are organized by rule name, and each rule consists of the following elements:

#### `metrics`

The list of metrics for this rule. Each metric is in the format `${benchmark_name}/regex`; a regular expression may be used after the first `/`, but note that the benchmark name itself cannot be a regex.
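As an illustration of this matching convention, the following sketch compares the benchmark name literally and treats the remainder as a regex (this is an assumption about the matching semantics, not SuperBench's internal implementation; `metric_matches` is a hypothetical helper):

```python
import re


def metric_matches(rule_metric: str, metric_name: str) -> bool:
    """Return True if a full metric name matches a ${benchmark_name}/regex entry.

    The part before the first '/' is compared literally (the benchmark name
    cannot be a regex); the remainder is treated as a regular expression.
    Illustrative sketch only, not SuperBench's internal code.
    """
    benchmark, _, pattern = rule_metric.partition("/")
    name_benchmark, _, name_suffix = metric_name.partition("/")
    return benchmark == name_benchmark and re.fullmatch(pattern, name_suffix) is not None


print(metric_matches(r"ib-loopback/IB_write_8388608_Avg_\d+",
                     "ib-loopback/IB_write_8388608_Avg_0"))   # True
print(metric_matches(r"ib-loopback/IB_write_8388608_Avg_\d+",
                     "nccl-bw/allreduce_8388608_busbw"))      # False
```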

#### `categories`

A user-defined category name (string) for the rule, used to classify and organize the metrics.

#### `aggregate`

This item determines whether to aggregate the benchmark results from multiple devices into one collection.
For example, the kernel-launch overhead results from 8 GPU devices can be aggregated into one collection.

The value of this item should be a bool or a regex pattern string:

- bool:
  - `False` (default): no aggregation.
  - `True`: aggregate the results of multiple ranks. In detail, metric names in `metrics` of the form `metric:\d+` are aggregated into `metric`; this applies to most micro-benchmark metrics.
- regex pattern string: aggregate the results using the pattern string, which is matched against the metric names in `metrics`. In detail, the part of the metric name matched by the parenthesized group `()` in the pattern is replaced with `*`, while the rest of the metric name remains unchanged.
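The pattern-string behavior described above can be sketched as follows (an illustrative example under the stated semantics, not SuperBench's actual code; `aggregate_name` is a hypothetical helper):

```python
import re


def aggregate_name(metric: str, pattern: str) -> str:
    """Replace the part of `metric` captured by the pattern's () group with '*'.

    Metrics that do not match the pattern are returned unchanged.
    Illustrative sketch of the documented aggregation rule, not SuperBench's code.
    """
    m = re.match(pattern, metric)
    if not m:
        return metric
    start, end = m.span(1)  # span of the first capture group
    return metric[:start] + "*" + metric[end:]


print(aggregate_name("ib-loopback/IB_write_8388608_Avg_3",
                     r"ib-loopback/IB_write_.*_Avg_(\d+)"))
# ib-loopback/IB_write_8388608_Avg_*
```

With this rule, the per-device metrics `...Avg_0` through `...Avg_7` all map to the same aggregated name and are summarized as one collection.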

#### `statistics`

A list of statistical functions applied by this rule to compute statistics over the results from multiple nodes/ranks.

The supported statistical functions are:
- `count`
- `max`
- `mean`
- `min`
- `p${value}`: percentile, where ${value} can be 1-99 (e.g., p50, p90)
- `std`
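As a sketch of how these functions could be applied to one metric's values collected from multiple nodes/ranks (illustrative only; `summarize` is a hypothetical helper, and the nearest-rank percentile convention used here may differ from SuperBench's exact implementation):

```python
import statistics


def summarize(values: list, funcs: list) -> dict:
    """Apply the statistical functions listed in a rule to one metric's values.

    Supports the function names documented above: count, max, mean, min,
    p1..p99, std. Illustrative sketch, not SuperBench's internal code.
    """
    results = {}
    for f in funcs:
        if f == "count":
            results[f] = len(values)
        elif f == "max":
            results[f] = max(values)
        elif f == "min":
            results[f] = min(values)
        elif f == "mean":
            results[f] = statistics.mean(values)
        elif f == "std":
            results[f] = statistics.stdev(values)
        elif f.startswith("p"):
            pct = int(f[1:])  # p1..p99
            # Nearest-rank percentile; one common convention among several.
            data = sorted(values)
            idx = max(0, round(pct / 100 * len(data)) - 1)
            results[f] = data[idx]
    return results


print(summarize([2.0, 1.0, 4.0, 3.0], ["count", "mean", "p50", "max"]))
# {'count': 4, 'mean': 2.5, 'p50': 2.0, 'max': 4.0}
```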

## Output

We support different output formats for the result summary, including Markdown, HTML, etc.
The output includes the metrics grouped by category and their values obtained by applying statistical methods to all raw results.