Unverified Commit fd2bc9e0 authored by guoshzhao's avatar guoshzhao Committed by GitHub

Benchmarks: Add Feature - Add percentile metrics for ort and pytorch inference benchmarks (#283)

**Description**
Add 50th, 90th, 95th, 99th, and 99.9th percentile latency metrics for ORT and PyTorch inference benchmarks.
parent f7ffc545
@@ -133,11 +133,15 @@ Inference performance of the torchvision models using ONNXRuntime. Currently the
> resnext101_32x8d, wide_resnet50_2, wide_resnet101_2, shufflenet_v2_x0_5, shufflenet_v2_x1_0,
> squeezenet1_0, squeezenet1_1, vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19_bn, vgg19
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics

| Name                                                | Unit      | Description                                                              |
|-----------------------------------------------------|-----------|--------------------------------------------------------------------------|
| ort-inference/{precision}_{model}_time              | time (ms) | The mean latency to execute one batch of inference.                      |
| ort-inference/{precision}_{model}_time_{percentile} | time (ms) | The {percentile}th percentile latency to execute one batch of inference. |
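For illustration only (`fp16` and `resnet50` below are assumed example values, not fixed names), the percentile suffix is simply appended to the base metric name:

```python
# Illustrative expansion of the metric-name pattern from the table above.
precision, model = 'fp16', 'resnet50'  # assumed example values
base = 'ort-inference/{}_{}_time'.format(precision, model)
names = [base] + ['{}_{}'.format(base, p) for p in ('50', '90', '95', '99', '99.9')]
print(names[-1])  # ort-inference/fp16_resnet50_time_99.9
```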
## Communication Benchmarks

...
@@ -12,57 +12,60 @@ id: model-benchmarks
Run training or inference tasks with single or half precision for GPT models,
including gpt2-small, gpt2-medium, gpt2-large and gpt2-xl.
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics

| Name                                                                    | Unit                   | Description                                                               |
|-------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| gpt_models/pytorch-${model_name}/fp32_train_step_time                   | time (ms)              | The average training step time with single precision.                     |
| gpt_models/pytorch-${model_name}/fp32_train_throughput                  | throughput (samples/s) | The average training throughput with single precision.                    |
| gpt_models/pytorch-${model_name}/fp32_inference_step_time_{percentile}  | time (ms)              | The {percentile}th percentile inference step time with single precision.  |
| gpt_models/pytorch-${model_name}/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| gpt_models/pytorch-${model_name}/fp16_train_step_time                   | time (ms)              | The average training step time with half precision.                       |
| gpt_models/pytorch-${model_name}/fp16_train_throughput                  | throughput (samples/s) | The average training throughput with half precision.                      |
| gpt_models/pytorch-${model_name}/fp16_inference_step_time_{percentile}  | time (ms)              | The {percentile}th percentile inference step time with half precision.    |
| gpt_models/pytorch-${model_name}/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision.   |
### `bert_models`

#### Introduction

Run training or inference tasks with single or half precision for BERT models, including bert-base and bert-large.
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics

| Name                                                                     | Unit                   | Description                                                               |
|--------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| bert_models/pytorch-${model_name}/fp32_train_step_time                   | time (ms)              | The average training step time with single precision.                     |
| bert_models/pytorch-${model_name}/fp32_train_throughput                  | throughput (samples/s) | The average training throughput with single precision.                    |
| bert_models/pytorch-${model_name}/fp32_inference_step_time_{percentile}  | time (ms)              | The {percentile}th percentile inference step time with single precision.  |
| bert_models/pytorch-${model_name}/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| bert_models/pytorch-${model_name}/fp16_train_step_time                   | time (ms)              | The average training step time with half precision.                       |
| bert_models/pytorch-${model_name}/fp16_train_throughput                  | throughput (samples/s) | The average training throughput with half precision.                      |
| bert_models/pytorch-${model_name}/fp16_inference_step_time_{percentile}  | time (ms)              | The {percentile}th percentile inference step time with half precision.    |
| bert_models/pytorch-${model_name}/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision.   |
### `lstm_models`

#### Introduction

Run training or inference tasks with single or half precision for one bidirectional LSTM model.
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics

| Name                                                            | Unit                   | Description                                                               |
|-----------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| lstm_models/pytorch-lstm/fp32_train_step_time                   | time (ms)              | The average training step time with single precision.                     |
| lstm_models/pytorch-lstm/fp32_train_throughput                  | throughput (samples/s) | The average training throughput with single precision.                    |
| lstm_models/pytorch-lstm/fp32_inference_step_time_{percentile}  | time (ms)              | The {percentile}th percentile inference step time with single precision.  |
| lstm_models/pytorch-lstm/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| lstm_models/pytorch-lstm/fp16_train_step_time                   | time (ms)              | The average training step time with half precision.                       |
| lstm_models/pytorch-lstm/fp16_train_throughput                  | throughput (samples/s) | The average training throughput with half precision.                      |
| lstm_models/pytorch-lstm/fp16_inference_step_time_{percentile}  | time (ms)              | The {percentile}th percentile inference step time with half precision.    |
| lstm_models/pytorch-lstm/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision.   |
### `cnn_models`

@@ -80,16 +83,17 @@ Run training or inference tasks with single or half precision for CNN models lis
* shufflenet: shufflenet_v2_x0_5, shufflenet_v2_x1_0, shufflenet_v2_x1_5, shufflenet_v2_x2_0
* squeezenet: squeezenet1_0, squeezenet1_1
* others: alexnet, googlenet, inception_v3
The supported percentiles are 50, 90, 95, 99, and 99.9.
#### Metrics

| Name                                                                    | Unit                   | Description                                                               |
|-------------------------------------------------------------------------|------------------------|---------------------------------------------------------------------------|
| cnn_models/pytorch-${model_name}/fp32_train_step_time                   | time (ms)              | The average training step time with single precision.                     |
| cnn_models/pytorch-${model_name}/fp32_train_throughput                  | throughput (samples/s) | The average training throughput with single precision.                    |
| cnn_models/pytorch-${model_name}/fp32_inference_step_time_{percentile}  | time (ms)              | The {percentile}th percentile inference step time with single precision.  |
| cnn_models/pytorch-${model_name}/fp32_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with single precision. |
| cnn_models/pytorch-${model_name}/fp16_train_step_time                   | time (ms)              | The average training step time with half precision.                       |
| cnn_models/pytorch-${model_name}/fp16_train_throughput                  | throughput (samples/s) | The average training throughput with half precision.                      |
| cnn_models/pytorch-${model_name}/fp16_inference_step_time_{percentile}  | time (ms)              | The {percentile}th percentile inference step time with half precision.    |
| cnn_models/pytorch-${model_name}/fp16_inference_throughput_{percentile} | throughput (samples/s) | The {percentile}th percentile inference throughput with half precision.   |
@@ -142,6 +142,7 @@ def run(self):
        'knack>=0.7.2',
        'matplotlib>=3.0.0',
        'natsort>=7.1.1',
        'numpy>=1.19.2',
        'openpyxl>=3.0.7',
        'omegaconf==2.0.6',
        'pandas>=1.1.5',

...
@@ -9,6 +9,8 @@
from operator import attrgetter
from abc import ABC, abstractmethod

import numpy as np

from superbench.common.utils import logger
from superbench.benchmarks import BenchmarkType, ReturnCode
from superbench.benchmarks.result import BenchmarkResult
@@ -246,6 +248,22 @@ def __check_raw_data(self):
        return True
    def _process_percentile_result(self, metric, result, reduce_type=None):
        """Function to process the percentile results.

        Args:
            metric (str): metric name which is the key.
            result (List[numbers.Number]): numerical result.
            reduce_type (ReduceType): The type of reduce function.
        """
        if len(result) > 0:
            percentile_list = ['50', '90', '95', '99', '99.9']
            for percentile in percentile_list:
                self._result.add_result(
                    '{}_{}'.format(metric, percentile),
                    np.percentile(result, float(percentile), interpolation='nearest'), reduce_type
                )
    def print_env_info(self):
        """Print environments or dependencies information."""
        # TODO: will implement it when adding real benchmarks in the future.

...
@@ -49,13 +49,14 @@ def _benchmark(self):
        """
        pass

    def _process_numeric_result(self, metric, result, reduce_type=None, cal_percentile=False):
        """Function to save the numerical results.

        Args:
            metric (str): metric name which is the key.
            result (List[numbers.Number]): numerical result.
            reduce_type (ReduceType): The type of reduce function.
            cal_percentile (bool): Whether to calculate the percentile results.

        Return:
            True if result list is not empty.
@@ -70,6 +71,8 @@ def _process_numeric_result(self, metric, result, reduce_type=None):
        self._result.add_raw_data(metric, result)
        self._result.add_result(metric, statistics.mean(result), reduce_type)
        if cal_percentile:
            self._process_percentile_result(metric, result, reduce_type)

        return True

...
@@ -156,7 +156,7 @@ def _benchmark(self):
            else:
                precision = self._args.precision.value
            metric = '{}_{}_time'.format(precision, model)
            if not self._process_numeric_result(metric, elapse_times, cal_percentile=True):
                return False

            logger.info(

...
@@ -377,17 +377,21 @@ def __process_model_result(self, model_action, precision, step_times):
        if precision.value in precision_metric.keys():
            precision = precision_metric[precision.value]
        metric = '{}_{}_step_time'.format(precision, model_action)
        reduce_type = ReduceType.MAX if model_action is ModelAction.TRAIN else None
        self._result.add_raw_data(metric, step_times)
        self._result.add_result(metric, statistics.mean(step_times), reduce_type=reduce_type)
        if model_action == ModelAction.INFERENCE:
            self._process_percentile_result(metric, step_times, reduce_type=reduce_type)

        # The unit of step time is millisecond, use it to calculate the throughput with the unit samples/sec.
        millisecond_per_second = 1000
        throughput = [millisecond_per_second / step_time * self._args.batch_size for step_time in step_times]
        metric = '{}_{}_throughput'.format(precision, model_action)
        reduce_type = ReduceType.MIN if model_action is ModelAction.TRAIN else None
        self._result.add_raw_data(metric, throughput)
        self._result.add_result(metric, statistics.mean(throughput), reduce_type=reduce_type)
        if model_action == ModelAction.INFERENCE:
            self._process_percentile_result(metric, throughput, reduce_type=reduce_type)

        return True

...
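Taken alone, the millisecond-to-throughput conversion in that hunk works out like this (a batch size of 32 is assumed here; it is consistent with the 4.0 ms steps and 8000 samples/s figures used in the inference tests):

```python
# Convert per-step latencies in milliseconds to throughput in samples/s.
millisecond_per_second = 1000
batch_size = 32  # assumed example value
step_times = [4.0, 4.0, 4.0]
throughput = [millisecond_per_second / step_time * batch_size for step_time in step_times]
print(throughput)  # [8000.0, 8000.0, 8000.0]
```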
@@ -96,6 +96,17 @@ def test_micro_benchmark_base():
    assert (benchmark.result['metric1'] == [3.5])
    assert (benchmark.raw_data['metric1'] == [[1, 2, 3, 4, 5, 6]])

    benchmark._result._BenchmarkResult__result = dict()
    benchmark._result._BenchmarkResult__raw_data = dict()
    benchmark._process_numeric_result('metric1', [1, 3, 4, 2, 6, 5], cal_percentile=True)
    assert (benchmark.result['metric1'] == [3.5])
    assert (benchmark.result['metric1_50'] == [3])
    assert (benchmark.result['metric1_90'] == [5])
    assert (benchmark.result['metric1_95'] == [6])
    assert (benchmark.result['metric1_99'] == [6])
    assert (benchmark.result['metric1_99.9'] == [6])
    assert (benchmark.raw_data['metric1'] == [[1, 3, 4, 2, 6, 5]])
def test_micro_benchmark_with_invoke_base():
    """Test MicroBenchmarkWithInvoke."""

...
@@ -252,9 +252,21 @@ def test_inference():
        '"start_time": null, "end_time": null, "raw_data": {'
        '"fp16_inference_step_time": [[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]], '
        '"fp16_inference_throughput": [[8000.0, 8000.0, 8000.0, 8000.0, 8000.0, 8000.0, 8000.0, 8000.0]]}, '
        '"result": {"return_code": [0], "fp16_inference_step_time": [4.0], '
        '"fp16_inference_step_time_50": [4.0], "fp16_inference_step_time_90": [4.0], '
        '"fp16_inference_step_time_95": [4.0], "fp16_inference_step_time_99": [4.0], '
        '"fp16_inference_step_time_99.9": [4.0], '
        '"fp16_inference_throughput": [8000.0], '
        '"fp16_inference_throughput_50": [8000.0], "fp16_inference_throughput_90": [8000.0], '
        '"fp16_inference_throughput_95": [8000.0], "fp16_inference_throughput_99": [8000.0], '
        '"fp16_inference_throughput_99.9": [8000.0]}, '
        '"reduce_op": {"return_code": null, "fp16_inference_step_time": null, '
        '"fp16_inference_step_time_50": null, "fp16_inference_step_time_90": null, '
        '"fp16_inference_step_time_95": null, "fp16_inference_step_time_99": null, '
        '"fp16_inference_step_time_99.9": null, "fp16_inference_throughput": null, '
        '"fp16_inference_throughput_50": null, "fp16_inference_throughput_90": null, '
        '"fp16_inference_throughput_95": null, "fp16_inference_throughput_99": null, '
        '"fp16_inference_throughput_99.9": null}}'
    )
    assert (benchmark._preprocess())
    assert (benchmark._ModelBenchmark__inference(Precision.FLOAT16))

...