Unverified commit fd2bc9e0 authored by guoshzhao, committed by GitHub

Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283)

**Description**
Add 50th, 90th, 95th, 99th, and 99.9th percentile latency metrics for ORT and PyTorch inference benchmarks.
parent f7ffc545
......@@ -133,11 +133,15 @@ Inference performance of the torchvision models using ONNXRuntime. Currently the
> resnext101_32x8d, wide_resnet50_2, wide_resnet101_2, shufflenet_v2_x0_5, shufflenet_v2_x1_0,
> squeezenet1_0, squeezenet1_1, vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19_bn, vgg19
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|-----------------------------------------------|-----------|-----------------------------------------------------------|
| ort-inference/{precision}_{model}_time | time (ms) | The mean latency to execute one batch of inference. |
| Name | Unit | Description |
|---------------------------------------------------------|-----------|-----------------------------------------------------------------------------|
| ort-inference/{precision}_{model}_time | time (ms) | The mean latency to execute one batch of inference. |
| ort-inference/{precision}_{model}_time_{percentile} | time (ms) | The {percentile}th percentile latency to execute one batch of inference. |
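The metric-naming scheme above can be sketched in a few lines. This is a minimal illustration, not SuperBench's implementation; `build_latency_metrics` and the sample latencies are hypothetical, and newer NumPy spells the percentile keyword `method` (the benchmark code itself uses the older `interpolation` name).

```python
import numpy as np


def build_latency_metrics(precision, model, latencies_ms):
    """Build a flat metric dict matching the table above: one mean-latency
    entry plus one entry per supported percentile."""
    prefix = 'ort-inference/{}_{}_time'.format(precision, model)
    metrics = {prefix: float(np.mean(latencies_ms))}
    for percentile in ('50', '90', '95', '99', '99.9'):
        # Nearest-rank interpolation returns an actual observed latency.
        metrics['{}_{}'.format(prefix, percentile)] = float(
            np.percentile(latencies_ms, float(percentile), method='nearest')
        )
    return metrics


# Hypothetical per-batch latencies in milliseconds.
metrics = build_latency_metrics('fp16', 'resnet50', [4.0, 4.2, 4.1, 4.3, 9.5])
```

The tail percentiles surface outliers (here the 9.5 ms batch) that the mean alone smooths over.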
## Communication Benchmarks
......
......@@ -12,57 +12,60 @@ id: model-benchmarks
Run training or inference tasks with single or half precision for GPT models,
including gpt2-small, gpt2-medium, gpt2-large, and gpt2-xl.
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|------------------------------------------------------------|------------------------|---------------------------------------------|
| gpt_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | Train step time with single precision. |
| gpt_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | Train throughput with single precision. |
| gpt_models/pytorch-${model_name}/fp32_inference_step_time | time (ms) | Inference step time with single precision. |
| gpt_models/pytorch-${model_name}/fp32_inference_throughput | throughput (samples/s) | Inference throughput with single precision. |
| gpt_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | Train step time with half precision. |
| gpt_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | Train throughput with half precision. |
| gpt_models/pytorch-${model_name}/fp16_inference_step_time | time (ms) | Inference step time with half precision. |
| gpt_models/pytorch-${model_name}/fp16_inference_throughput | throughput (samples/s) | Inference throughput with half precision. |
| Name | Unit | Description |
|-------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| gpt_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | The average training step time with single precision. |
| gpt_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | The average training throughput with single precision. |
| gpt_models/pytorch-${model_name}/fp32_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with single precision. |
| gpt_models/pytorch-${model_name}/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| gpt_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | The average training step time with half precision. |
| gpt_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | The average training throughput with half precision. |
| gpt_models/pytorch-${model_name}/fp16_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with half precision. |
| gpt_models/pytorch-${model_name}/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision. |
### `bert_models`
#### Introduction
Run training or inference tasks with single or half precision for BERT models, including bert-base and bert-large.
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|-------------------------------------------------------------|------------------------|---------------------------------------------|
| bert_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | Train step time with single precision. |
| bert_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | Train throughput with single precision. |
| bert_models/pytorch-${model_name}/fp32_inference_step_time | time (ms) | Inference step time with single precision. |
| bert_models/pytorch-${model_name}/fp32_inference_throughput | throughput (samples/s) | Inference throughput with single precision. |
| bert_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | Train step time with half precision. |
| bert_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | Train throughput with half precision. |
| bert_models/pytorch-${model_name}/fp16_inference_step_time | time (ms) | Inference step time with half precision. |
| bert_models/pytorch-${model_name}/fp16_inference_throughput | throughput (samples/s) | Inference throughput with half precision. |
| Name | Unit | Description |
|--------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| bert_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | The average training step time with single precision. |
| bert_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | The average training throughput with single precision. |
| bert_models/pytorch-${model_name}/fp32_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with single precision. |
| bert_models/pytorch-${model_name}/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| bert_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | The average training step time with half precision. |
| bert_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | The average training throughput with half precision. |
| bert_models/pytorch-${model_name}/fp16_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with half precision. |
| bert_models/pytorch-${model_name}/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision. |
### `lstm_models`
#### Introduction
Run training or inference tasks with single or half precision for one bidirectional LSTM model.
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|----------------------------------------------------|------------------------|---------------------------------------------|
| lstm_models/pytorch-lstm/fp32_train_step_time | time (ms) | Train step time with single precision. |
| lstm_models/pytorch-lstm/fp32_train_throughput | throughput (samples/s) | Train throughput with single precision. |
| lstm_models/pytorch-lstm/fp32_inference_step_time | time (ms) | Inference step time with single precision. |
| lstm_models/pytorch-lstm/fp32_inference_throughput | throughput (samples/s) | Inference throughput with single precision. |
| lstm_models/pytorch-lstm/fp16_train_step_time | time (ms) | Train step time with half precision. |
| lstm_models/pytorch-lstm/fp16_train_throughput | throughput (samples/s) | Train throughput with half precision. |
| lstm_models/pytorch-lstm/fp16_inference_step_time | time (ms) | Inference step time with half precision. |
| lstm_models/pytorch-lstm/fp16_inference_throughput | throughput (samples/s) | Inference throughput with half precision. |
| Name | Unit | Description |
|-----------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| lstm_models/pytorch-lstm/fp32_train_step_time | time (ms) | The average training step time with single precision. |
| lstm_models/pytorch-lstm/fp32_train_throughput | throughput (samples/s) | The average training throughput with single precision. |
| lstm_models/pytorch-lstm/fp32_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with single precision. |
| lstm_models/pytorch-lstm/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| lstm_models/pytorch-lstm/fp16_train_step_time | time (ms) | The average training step time with half precision. |
| lstm_models/pytorch-lstm/fp16_train_throughput | throughput (samples/s) | The average training throughput with half precision. |
| lstm_models/pytorch-lstm/fp16_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with half precision. |
| lstm_models/pytorch-lstm/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision. |
### `cnn_models`
......@@ -80,16 +83,17 @@ Run training or inference tasks with single or half precision for CNN models lis
* shufflenet: shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5, shufflenet_v2_x2_0
* squeezenet: squeezenet1_0, squeezenet1_1
* others: alexnet, googlenet, inception_v3
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics
| Name | Unit | Description |
|------------------------------------------------------------|------------------------|---------------------------------------------|
| cnn_models/pytorch-${model_name}/fp32_train_step_time | time (ms) | Train step time with single precision. |
| cnn_models/pytorch-${model_name}/fp32_train_throughput | throughput (samples/s) | Train throughput with single precision. |
| cnn_models/pytorch-${model_name}/fp32_inference_step_time | time (ms) | Inference step time with single precision. |
| cnn_models/pytorch-${model_name}/fp32_inference_throughput | throughput (samples/s) | Inference throughput with single precision. |
| cnn_models/pytorch-${model_name}/fp16_train_step_time | time (ms) | Train step time with half precision. |
| cnn_models/pytorch-${model_name}/fp16_train_throughput | throughput (samples/s) | Train throughput with half precision. |
| cnn_models/pytorch-${model_name}/fp16_inference_step_time | time (ms) | Inference step time with half precision. |
| cnn_models/pytorch-${model_name}/fp16_inference_throughput | throughput (samples/s) | Inference throughput with half precision. |
| Name | Unit | Description |
|-------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| cnn_models/pytorch-${model_name}/fp32_train_step_time                   | time (ms)              | The average training step time with single precision.                     |
| cnn_models/pytorch-${model_name}/fp32_train_throughput                  | throughput (samples/s) | The average training throughput with single precision.                    |
| cnn_models/pytorch-${model_name}/fp32_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with single precision. |
| cnn_models/pytorch-${model_name}/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| cnn_models/pytorch-${model_name}/fp16_train_step_time                   | time (ms)              | The average training step time with half precision.                       |
| cnn_models/pytorch-${model_name}/fp16_train_throughput                  | throughput (samples/s) | The average training throughput with half precision.                      |
| cnn_models/pytorch-${model_name}/fp16_inference_step_time_{percentile} | time (ms) | The {percentile}th percentile inference step time with half precision. |
| cnn_models/pytorch-${model_name}/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision. |
......@@ -142,6 +142,7 @@ def run(self):
'knack>=0.7.2',
'matplotlib>=3.0.0',
'natsort>=7.1.1',
'numpy>=1.19.2',
'openpyxl>=3.0.7',
'omegaconf==2.0.6',
'pandas>=1.1.5',
......
......@@ -9,6 +9,8 @@
from operator import attrgetter
from abc import ABC, abstractmethod
import numpy as np
from superbench.common.utils import logger
from superbench.benchmarks import BenchmarkType, ReturnCode
from superbench.benchmarks.result import BenchmarkResult
......@@ -246,6 +248,22 @@ def __check_raw_data(self):
return True
def _process_percentile_result(self, metric, result, reduce_type=None):
"""Function to process the percentile results.
Args:
metric (str): metric name which is the key.
result (List[numbers.Number]): numerical result.
reduce_type (ReduceType): The type of reduce function.
"""
if len(result) > 0:
percentile_list = ['50', '90', '95', '99', '99.9']
for percentile in percentile_list:
self._result.add_result(
'{}_{}'.format(metric, percentile),
np.percentile(result, float(percentile), interpolation='nearest'), reduce_type
)
def print_env_info(self):
"""Print environments or dependencies information."""
# TODO: will implement it when add real benchmarks in the future.
......
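Outside of the benchmark class, the percentile logic in `_process_percentile_result` can be reproduced standalone; `add_result` is mimicked with a plain dict here, and the newer `method=` keyword is used where the diff uses the older `interpolation=` name.

```python
import numpy as np


def percentile_results(metric, result):
    """Mimic _process_percentile_result: emit one '<metric>_<percentile>'
    entry per supported percentile, using nearest-rank interpolation so
    every reported value is an actual observed measurement."""
    results = {}
    if len(result) > 0:
        for percentile in ['50', '90', '95', '99', '99.9']:
            key = '{}_{}'.format(metric, percentile)
            results[key] = np.percentile(result, float(percentile), method='nearest')
    return results
```

An empty `result` list yields no entries at all, matching the guard in the diff.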
......@@ -49,13 +49,14 @@ def _benchmark(self):
"""
pass
def _process_numeric_result(self, metric, result, reduce_type=None):
def _process_numeric_result(self, metric, result, reduce_type=None, cal_percentile=False):
"""Function to save the numerical results.
Args:
metric (str): metric name which is the key.
result (List[numbers.Number]): numerical result.
reduce_type (ReduceType): The type of reduce function.
cal_percentile (bool): Whether to calculate the percentile results.
Return:
True if result list is not empty.
......@@ -70,6 +71,8 @@ def _process_numeric_result(self, metric, result, reduce_type=None):
self._result.add_raw_data(metric, result)
self._result.add_result(metric, statistics.mean(result), reduce_type)
if cal_percentile:
self._process_percentile_result(metric, result, reduce_type)
return True
......
......@@ -156,7 +156,7 @@ def _benchmark(self):
else:
precision = self._args.precision.value
metric = '{}_{}_time'.format(precision, model)
if not self._process_numeric_result(metric, elapse_times):
if not self._process_numeric_result(metric, elapse_times, cal_percentile=True):
return False
logger.info(
......
......@@ -377,17 +377,21 @@ def __process_model_result(self, model_action, precision, step_times):
if precision.value in precision_metric.keys():
precision = precision_metric[precision.value]
metric = '{}_{}_step_time'.format(precision, model_action)
reduce_type = ReduceType.MAX if model_action is ModelAction.TRAIN else None
self._result.add_raw_data(metric, step_times)
avg = statistics.mean(step_times)
self._result.add_result(metric, avg, reduce_type=ReduceType.MAX if model_action is ModelAction.TRAIN else None)
self._result.add_result(metric, statistics.mean(step_times), reduce_type=reduce_type)
if model_action == ModelAction.INFERENCE:
self._process_percentile_result(metric, step_times, reduce_type=reduce_type)
# The unit of step time is millisecond, use it to calculate the throughput with the unit samples/sec.
millisecond_per_second = 1000
throughput = [millisecond_per_second / step_time * self._args.batch_size for step_time in step_times]
metric = '{}_{}_throughput'.format(precision, model_action)
reduce_type = ReduceType.MIN if model_action is ModelAction.TRAIN else None
self._result.add_raw_data(metric, throughput)
avg = statistics.mean(throughput)
self._result.add_result(metric, avg, reduce_type=ReduceType.MIN if model_action is ModelAction.TRAIN else None)
self._result.add_result(metric, statistics.mean(throughput), reduce_type=reduce_type)
if model_action == ModelAction.INFERENCE:
self._process_percentile_result(metric, throughput, reduce_type=reduce_type)
return True
......
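The step-time-to-throughput conversion used above is simply samples per second derived from a per-batch millisecond latency. A minimal sketch, with a hypothetical `batch_size`:

```python
import statistics

MILLISECOND_PER_SECOND = 1000


def step_times_to_throughput(step_times_ms, batch_size):
    """Convert per-step latencies in milliseconds to per-step throughput
    in samples/s, as __process_model_result does before reducing."""
    return [MILLISECOND_PER_SECOND / t * batch_size for t in step_times_ms]


# 4 ms per batch of 32 samples -> 8000 samples/s per step.
throughput = step_times_to_throughput([4.0, 4.0, 4.0], batch_size=32)
avg = statistics.mean(throughput)
```

Because throughput is inversely related to step time, the reduce ops flip accordingly: MAX for training step time, MIN for training throughput.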
......@@ -96,6 +96,17 @@ def test_micro_benchmark_base():
assert (benchmark.result['metric1'] == [3.5])
assert (benchmark.raw_data['metric1'] == [[1, 2, 3, 4, 5, 6]])
benchmark._result._BenchmarkResult__result = dict()
benchmark._result._BenchmarkResult__raw_data = dict()
benchmark._process_numeric_result('metric1', [1, 3, 4, 2, 6, 5], cal_percentile=True)
assert (benchmark.result['metric1'] == [3.5])
assert (benchmark.result['metric1_50'] == [3])
assert (benchmark.result['metric1_90'] == [5])
assert (benchmark.result['metric1_95'] == [6])
assert (benchmark.result['metric1_99'] == [6])
assert (benchmark.result['metric1_99.9'] == [6])
assert (benchmark.raw_data['metric1'] == [[1, 3, 4, 2, 6, 5]])
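The expected values in the assertions above follow from nearest-rank interpolation over the sorted data [1, 2, 3, 4, 5, 6]: the virtual index is (n-1)*q/100, rounded half to even, so the 90th percentile lands on round(4.5) = index 4, i.e. the value 5. A quick check (using the newer `method=` keyword where the diff uses `interpolation=`):

```python
import numpy as np

data = [1, 3, 4, 2, 6, 5]
# Percentile -> expected nearest-rank value, matching the unit test.
expected = {'50': 3, '90': 5, '95': 6, '99': 6, '99.9': 6}
for p, want in expected.items():
    got = np.percentile(data, float(p), method='nearest')
    assert got == want, (p, got, want)
```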
def test_micro_benchmark_with_invoke_base():
"""Test MicroBenchmarkWithInvoke."""
......
......@@ -252,9 +252,21 @@ def test_inference():
'"start_time": null, "end_time": null, "raw_data": {'
'"fp16_inference_step_time": [[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]], '
'"fp16_inference_throughput": [[8000.0, 8000.0, 8000.0, 8000.0, 8000.0, 8000.0, 8000.0, 8000.0]]}, '
'"result": {"return_code": [0], '
'"fp16_inference_step_time": [4.0], "fp16_inference_throughput": [8000.0]}, '
'"reduce_op": {"return_code": null, "fp16_inference_step_time": null, "fp16_inference_throughput": null}}'
'"result": {"return_code": [0], "fp16_inference_step_time": [4.0], '
'"fp16_inference_step_time_50": [4.0], "fp16_inference_step_time_90": [4.0], '
'"fp16_inference_step_time_95": [4.0], "fp16_inference_step_time_99": [4.0], '
'"fp16_inference_step_time_99.9": [4.0], '
'"fp16_inference_throughput": [8000.0], '
'"fp16_inference_throughput_50": [8000.0], "fp16_inference_throughput_90": [8000.0], '
'"fp16_inference_throughput_95": [8000.0], "fp16_inference_throughput_99": [8000.0], '
'"fp16_inference_throughput_99.9": [8000.0]}, '
'"reduce_op": {"return_code": null, "fp16_inference_step_time": null, '
'"fp16_inference_step_time_50": null, "fp16_inference_step_time_90": null, '
'"fp16_inference_step_time_95": null, "fp16_inference_step_time_99": null, '
'"fp16_inference_step_time_99.9": null, "fp16_inference_throughput": null, '
'"fp16_inference_throughput_50": null, "fp16_inference_throughput_90": null, '
'"fp16_inference_throughput_95": null, "fp16_inference_throughput_99": null, '
'"fp16_inference_throughput_99.9": null}}'
)
assert (benchmark._preprocess())
assert (benchmark._ModelBenchmark__inference(Precision.FLOAT16))
......