Commit fd2bc9e0 authored by guoshzhao, committed by GitHub

Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283)

**Description**
Add 50th, 90th, 95th, 99th, and 99.9th percentile metrics for the ORT and PyTorch inference benchmarks: latency for both, plus throughput for the PyTorch model benchmarks.
parent f7ffc545
......@@ -133,11 +133,15 @@ Inference performance of the torchvision models using ONNXRuntime. Currently the
> resnext101_32x8d, wide_resnet50_2, wide_resnet101_2, shufflenet_v2_x0_5, shufflenet_v2_x1_0,
> squeezenet1_0, squeezenet1_1, vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19_bn, vgg19
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|-----------------------------------------------|-----------|-----------------------------------------------------------|
| ort-inference/{precision}_{model}_time | time (ms) | The mean latency to execute one batch of inference. |
| Name | Unit | Description |
|---------------------------------------------------------|-----------|-----------------------------------------------------------------------------|
| ort-inference/{precision}_{model}_time | time (ms) | The mean latency to execute one batch of inference. |
| ort-inference/{precision}_{model}_time_{percentile} | time (ms) | The {percentile}th percentile latency to execute one batch of inference. |
## Communication Benchmarks
......
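The `{percentile}` placeholder in the ORT metric table above expands to one key per supported percentile. A minimal sketch of that expansion, assuming the `'{}_{}'.format(metric, percentile)` naming used elsewhere in this commit; the helper name `expand_percentile_metrics` and the `fp16` / `resnet50` values are hypothetical and chosen only for illustration:

```python
# Supported percentiles, kept as strings so that '99.9' appears verbatim in the key.
PERCENTILES = ['50', '90', '95', '99', '99.9']


def expand_percentile_metrics(metric):
    """Return the base metric name followed by one name per percentile."""
    return [metric] + ['{}_{}'.format(metric, p) for p in PERCENTILES]


# Hypothetical precision and model values, for illustration only.
metric = '{}_{}_time'.format('fp16', 'resnet50')
for name in expand_percentile_metrics(metric):
    print(name)
```

Each benchmark run therefore reports six values per latency metric: the mean under the base name, and five percentile values under the suffixed names.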
......@@ -12,57 +12,60 @@ id: model-benchmarks
Run training or inference tasks with single or half precision for GPT models,
including gpt2-small, gpt2-medium, gpt2-large and gpt2-xl.
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|------------------------------------------------------------|------------------------|---------------------------------------------|
| gpt_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | Train step time with single precision. |
| gpt_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | Train throughput with single precision. |
| gpt_models/pytorch-${model_name}/fp32_inference_step_time | time (ms) | Inference step time with single precision. |
| gpt_models/pytorch-${model_name}/fp32_inference_throughput | throughput (samples/s) | Inference throughput with single precision. |
| gpt_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | Train step time with half precision. |
| gpt_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | Train throughput with half precision. |
| gpt_models/pytorch-${model_name}/fp16_inference_step_time | time (ms) | Inference step time with half precision. |
| gpt_models/pytorch-${model_name}/fp16_inference_throughput | throughput (samples/s) | Inference throughput with half precision. |
| Name | Unit | Description |
|-------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| gpt_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | The average training step time with single precision. |
| gpt_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | The average training throughput with single precision. |
| gpt_models/pytorch-${model_name}/fp32_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with single precision. |
| gpt_models/pytorch-${model_name}/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| gpt_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | The average training step time with half precision. |
| gpt_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | The average training throughput with half precision. |
| gpt_models/pytorch-${model_name}/fp16_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with half precision. |
| gpt_models/pytorch-${model_name}/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision. |
### `bert_models`
#### Introduction
Run training or inference tasks with single or half precision for BERT models, including bert-base and bert-large.
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|-------------------------------------------------------------|------------------------|---------------------------------------------|
| bert_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | Train step time with single precision. |
| bert_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | Train throughput with single precision. |
| bert_models/pytorch-${model_name}/fp32_inference_step_time | time (ms) | Inference step time with single precision. |
| bert_models/pytorch-${model_name}/fp32_inference_throughput | throughput (samples/s) | Inference throughput with single precision. |
| bert_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | Train step time with half precision. |
| bert_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | Train throughput with half precision. |
| bert_models/pytorch-${model_name}/fp16_inference_step_time | time (ms) | Inference step time with half precision. |
| bert_models/pytorch-${model_name}/fp16_inference_throughput | throughput (samples/s) | Inference throughput with half precision. |
| Name | Unit | Description |
|--------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| bert_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | The average training step time with single precision. |
| bert_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | The average training throughput with single precision. |
| bert_models/pytorch-${model_name}/fp32_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with single precision. |
| bert_models/pytorch-${model_name}/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| bert_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | The average training step time with half precision. |
| bert_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | The average training throughput with half precision. |
| bert_models/pytorch-${model_name}/fp16_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with half precision. |
| bert_models/pytorch-${model_name}/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision. |
### `lstm_models`
#### Introduction
Run training or inference tasks with single or half precision for one bidirectional LSTM model.
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|----------------------------------------------------|------------------------|---------------------------------------------|
| lstm_models/pytorch-lstm/fp32_train_step_time | time (ms) | Train step time with single precision. |
| lstm_models/pytorch-lstm/fp32_train_throughput | throughput (samples/s) | Train throughput with single precision. |
| lstm_models/pytorch-lstm/fp32_inference_step_time | time (ms) | Inference step time with single precision. |
| lstm_models/pytorch-lstm/fp32_inference_throughput | throughput (samples/s) | Inference throughput with single precision. |
| lstm_models/pytorch-lstm/fp16_train_step_time | time (ms) | Train step time with half precision. |
| lstm_models/pytorch-lstm/fp16_train_throughput | throughput (samples/s) | Train throughput with half precision. |
| lstm_models/pytorch-lstm/fp16_inference_step_time | time (ms) | Inference step time with half precision. |
| lstm_models/pytorch-lstm/fp16_inference_throughput | throughput (samples/s) | Inference throughput with half precision. |
| Name | Unit | Description |
|-----------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| lstm_models/pytorch-lstm/fp32_train_step_time | time (ms) | The average training step time with single precision. |
| lstm_models/pytorch-lstm/fp32_train_throughput | throughput (samples/s) | The average training throughput with single precision. |
| lstm_models/pytorch-lstm/fp32_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with single precision. |
| lstm_models/pytorch-lstm/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| lstm_models/pytorch-lstm/fp16_train_step_time | time (ms) | The average training step time with half precision. |
| lstm_models/pytorch-lstm/fp16_train_throughput | throughput (samples/s) | The average training throughput with half precision. |
| lstm_models/pytorch-lstm/fp16_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with half precision. |
| lstm_models/pytorch-lstm/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision. |
### `cnn_models`
......@@ -80,16 +83,17 @@ Run training or inference tasks with single or half precision for CNN models lis
* shufflenet: shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5, shufflenet_v2_x2_0
* squeezenet: squeezenet1_0, squeezenet1_1
* others: alexnet, googlenet, inception_v3
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|------------------------------------------------------------|------------------------|---------------------------------------------|
| cnn_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | Train step time with single precision. |
| cnn_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | Train throughput with single precision. |
| cnn_models/pytorch-${model_name}/fp32_inference_step_time | time (ms) | Inference step time with single precision. |
| cnn_models/pytorch-${model_name}/fp32_inference_throughput | throughput (samples/s) | Inference throughput with single precision. |
| cnn_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | Train step time with half precision. |
| cnn_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | Train throughput with half precision. |
| cnn_models/pytorch-${model_name}/fp16_inference_step_time | time (ms) | Inference step time with half precision. |
| cnn_models/pytorch-${model_name}/fp16_inference_throughput | throughput (samples/s) | Inference throughput with half precision. |
| Name | Unit | Description |
|-------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| cnn_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | The average training step time with single precision. |
| cnn_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | The average training throughput with single precision. |
| cnn_models/pytorch-${model_name}/fp32_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with single precision. |
| cnn_models/pytorch-${model_name}/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| cnn_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | The average training step time with half precision. |
| cnn_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | The average training throughput with half precision. |
| cnn_models/pytorch-${model_name}/fp16_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with half precision. |
| cnn_models/pytorch-${model_name}/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision. |
......@@ -142,6 +142,7 @@ setup(
'knack>=0.7.2',
'matplotlib>=3.0.0',
'natsort>=7.1.1',
'numpy>=1.19.2',
'openpyxl>=3.0.7',
'omegaconf==2.0.6',
'pandas>=1.1.5',
......
......@@ -9,6 +9,8 @@ from datetime import datetime
from operator import attrgetter
from abc import ABC, abstractmethod
import numpy as np
from superbench.common.utils import logger
from superbench.benchmarks import BenchmarkType, ReturnCode
from superbench.benchmarks.result import BenchmarkResult
......@@ -246,6 +248,22 @@ class Benchmark(ABC):
        return True

    def _process_percentile_result(self, metric, result, reduce_type=None):
        """Function to process the percentile results.

        Args:
            metric (str): metric name which is the key.
            result (List[numbers.Number]): numerical result.
            reduce_type (ReduceType): The type of reduce function.
        """
        if len(result) > 0:
            percentile_list = ['50', '90', '95', '99', '99.9']
            for percentile in percentile_list:
                self._result.add_result(
                    '{}_{}'.format(metric, percentile),
                    np.percentile(result, float(percentile), interpolation='nearest'), reduce_type
                )

    def print_env_info(self):
        """Print environments or dependencies information."""
        # TODO: will implement it when add real benchmarks in the future.
......
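`_process_percentile_result` calls `np.percentile` with `interpolation='nearest'`, so every reported percentile is an observed sample rather than an interpolated value. The selection can be sketched in plain Python as below. This is an illustration, not the NumPy implementation: the helper name `nearest_percentile` is hypothetical, and the sketch assumes exact ties are rounded half to even, which agrees with the values asserted in this commit's unit test for `[1, 3, 4, 2, 6, 5]`.

```python
def nearest_percentile(values, q):
    """Return the sample closest to the q-th percentile (0 <= q <= 100)."""
    if not values:
        raise ValueError('values must not be empty')
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted samples.
    position = (len(ordered) - 1) * q / 100
    # Built-in round() rounds exact halves to the nearest even integer.
    return ordered[round(position)]


# Same samples as the unit test added in this commit.
latencies = [1, 3, 4, 2, 6, 5]
for q in ['50', '90', '95', '99', '99.9']:
    print('metric1_{}'.format(q), nearest_percentile(latencies, float(q)))
```

With only six samples, the 95th, 99th, and 99.9th percentiles all collapse to the maximum, so tail percentiles are only informative when a run collects many steps. Note also that NumPy 1.22 and later deprecate the `interpolation` argument in favor of `method`; the `numpy>=1.19.2` requirement in this commit admits versions on both sides of that change.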
......@@ -49,13 +49,14 @@ class MicroBenchmark(Benchmark):
"""
pass
def _process_numeric_result(self, metric, result, reduce_type=None):
def _process_numeric_result(self, metric, result, reduce_type=None, cal_percentile=False):
"""Function to save the numerical results.
Args:
metric (str): metric name which is the key.
result (List[numbers.Number]): numerical result.
reduce_type (ReduceType): The type of reduce function.
cal_percentile (bool): Whether to calculate the percentile results.
Return:
True if result list is not empty.
......@@ -70,6 +71,8 @@ class MicroBenchmark(Benchmark):
self._result.add_raw_data(metric, result)
self._result.add_result(metric, statistics.mean(result), reduce_type)
if cal_percentile:
self._process_percentile_result(metric, result, reduce_type)
return True
......
......@@ -156,7 +156,7 @@ class ORTInferenceBenchmark(MicroBenchmark):
else:
precision = self._args.precision.value
metric = '{}_{}_time'.format(precision, model)
if not self._process_numeric_result(metric, elapse_times):
if not self._process_numeric_result(metric, elapse_times, cal_percentile=True):
return False
logger.info(
......
......@@ -377,17 +377,21 @@ class ModelBenchmark(Benchmark):
if precision.value in precision_metric.keys():
precision = precision_metric[precision.value]
metric = '{}_{}_step_time'.format(precision, model_action)
reduce_type = ReduceType.MAX if model_action is ModelAction.TRAIN else None
self._result.add_raw_data(metric, step_times)
avg = statistics.mean(step_times)
self._result.add_result(metric, avg, reduce_type=ReduceType.MAX if model_action is ModelAction.TRAIN else None)
self._result.add_result(metric, statistics.mean(step_times), reduce_type=reduce_type)
if model_action == ModelAction.INFERENCE:
self._process_percentile_result(metric, step_times, reduce_type=reduce_type)
# The unit of step time is millisecond, use it to calculate the throughput with the unit samples/sec.
millisecond_per_second = 1000
throughput = [millisecond_per_second / step_time * self._args.batch_size for step_time in step_times]
metric = '{}_{}_throughput'.format(precision, model_action)
reduce_type = ReduceType.MIN if model_action is ModelAction.TRAIN else None
self._result.add_raw_data(metric, throughput)
avg = statistics.mean(throughput)
self._result.add_result(metric, avg, reduce_type=ReduceType.MIN if model_action is ModelAction.TRAIN else None)
self._result.add_result(metric, statistics.mean(throughput), reduce_type=reduce_type)
if model_action == ModelAction.INFERENCE:
self._process_percentile_result(metric, throughput, reduce_type=reduce_type)
return True
......
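In the hunk above, throughput is derived per step from the step time, and the percentiles are then taken over the throughput samples themselves. A minimal sketch under that reading; the function name `step_times_to_throughput` and the batch size of 32 are assumptions for illustration (32 is consistent with the 4.0 ms and 8000.0 samples/s pair in this commit's test data):

```python
MILLISECONDS_PER_SECOND = 1000


def step_times_to_throughput(step_times_ms, batch_size):
    """Convert per-step latencies in milliseconds to samples per second."""
    return [MILLISECONDS_PER_SECOND / t * batch_size for t in step_times_ms]


# Hypothetical step times in milliseconds.
step_times = [4.0, 5.0, 8.0, 10.0]
throughput = step_times_to_throughput(step_times, batch_size=32)
print(throughput)  # [8000.0, 6400.0, 4000.0, 3200.0]
```

Because throughput is inversely related to step time, a high throughput percentile corresponds to the fastest steps, whereas a high step-time percentile corresponds to the slowest ones. A `..._throughput_99` value is therefore closer to a best case than to a worst case.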
......@@ -96,6 +96,17 @@ def test_micro_benchmark_base():
assert (benchmark.result['metric1'] == [3.5])
assert (benchmark.raw_data['metric1'] == [[1, 2, 3, 4, 5, 6]])
benchmark._result._BenchmarkResult__result = dict()
benchmark._result._BenchmarkResult__raw_data = dict()
benchmark._process_numeric_result('metric1', [1, 3, 4, 2, 6, 5], cal_percentile=True)
assert (benchmark.result['metric1'] == [3.5])
assert (benchmark.result['metric1_50'] == [3])
assert (benchmark.result['metric1_90'] == [5])
assert (benchmark.result['metric1_95'] == [6])
assert (benchmark.result['metric1_99'] == [6])
assert (benchmark.result['metric1_99.9'] == [6])
assert (benchmark.raw_data['metric1'] == [[1, 3, 4, 2, 6, 5]])
def test_micro_benchmark_with_invoke_base():
"""Test MicroBenchmarkWithInvoke."""
......
......@@ -252,9 +252,21 @@ def test_inference():
'"start_time": null, "end_time": null, "raw_data": {'
'"fp16_inference_step_time": [[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]], '
'"fp16_inference_throughput": [[8000.0, 8000.0, 8000.0, 8000.0, 8000.0, 8000.0, 8000.0, 8000.0]]}, '
'"result": {"return_code": [0], '
'"fp16_inference_step_time": [4.0], "fp16_inference_throughput": [8000.0]}, '
'"reduce_op": {"return_code": null, "fp16_inference_step_time": null, "fp16_inference_throughput": null}}'
'"result": {"return_code": [0], "fp16_inference_step_time": [4.0], '
'"fp16_inference_step_time_50": [4.0], "fp16_inference_step_time_90": [4.0], '
'"fp16_inference_step_time_95": [4.0], "fp16_inference_step_time_99": [4.0], '
'"fp16_inference_step_time_99.9": [4.0], '
'"fp16_inference_throughput": [8000.0], '
'"fp16_inference_throughput_50": [8000.0], "fp16_inference_throughput_90": [8000.0], '
'"fp16_inference_throughput_95": [8000.0], "fp16_inference_throughput_99": [8000.0], '
'"fp16_inference_throughput_99.9": [8000.0]}, '
'"reduce_op": {"return_code": null, "fp16_inference_step_time": null, '
'"fp16_inference_step_time_50": null, "fp16_inference_step_time_90": null, '
'"fp16_inference_step_time_95": null, "fp16_inference_step_time_99": null, '
'"fp16_inference_step_time_99.9": null, "fp16_inference_throughput": null, '
'"fp16_inference_throughput_50": null, "fp16_inference_throughput_90": null, '
'"fp16_inference_throughput_95": null, "fp16_inference_throughput_99": null, '
'"fp16_inference_throughput_99.9": null}}'
)
assert (benchmark._preprocess())
assert (benchmark._ModelBenchmark__inference(Precision.FLOAT16))
......