Unverified Commit 6681c720 authored by Yifan Xiong's avatar Yifan Xiong Committed by GitHub

Release - SuperBench v0.5.0 (#350)



**Description**

Cherry-pick bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
parent 712eafc3
@@ -15,7 +15,7 @@
 __SuperBench__ is a validation and profiling tool for AI infrastructure.
 
-📢 [v0.4.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.4.0) has been released!
+📢 [v0.5.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.5.0) has been released!
 
 ## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._
...
@@ -61,7 +61,7 @@ You can clone the source from GitHub and build it.
 :::note Note
 You should checkout corresponding tag to use release version, for example,
-`git clone -b v0.4.0 https://github.com/microsoft/superbenchmark`
+`git clone -b v0.5.0 https://github.com/microsoft/superbenchmark`
 :::
 
 ```bash
...
@@ -27,7 +27,7 @@ sb deploy -f remote.ini --host-password [password]
 :::note Note
 You should deploy corresponding Docker image to use release version, for example,
-`sb deploy -f local.ini -i superbench/superbench:v0.4.0-cuda11.1.1`
+`sb deploy -f local.ini -i superbench/superbench:v0.5.0-cuda11.1.1`
 You should note that version of git repo only determines version of sb CLI, and not the sb container. You should define the container version even if you specified a release version for the git clone.
...
@@ -70,7 +70,7 @@ superbench:
 <TabItem value='example'>
 
 ```yaml
-version: v0.4
+version: v0.5
 superbench:
   enable: benchmark_1
   monitor:
...
@@ -28,7 +28,8 @@ available tags are listed below for all stable versions.
 <TabItem value='cuda'>
 
 | Tag               | Description                        |
-| ----------------- | ---------------------------------- |
+|-------------------|------------------------------------|
+| v0.5.0-cuda11.1.1 | SuperBench v0.5.0 with CUDA 11.1.1 |
 | v0.4.0-cuda11.1.1 | SuperBench v0.4.0 with CUDA 11.1.1 |
 | v0.3.0-cuda11.1.1 | SuperBench v0.3.0 with CUDA 11.1.1 |
 | v0.2.1-cuda11.1.1 | SuperBench v0.2.1 with CUDA 11.1.1 |
@@ -38,7 +39,11 @@ available tags are listed below for all stable versions.
 <TabItem value='rocm'>
 
 | Tag                           | Description                                      |
-| --------------------------- | ---------------------------------------------- |
+|-------------------------------|--------------------------------------------------|
+| v0.5.0-rocm5.0.1-pytorch1.9.0 | SuperBench v0.5.0 with ROCm 5.0.1, PyTorch 1.9.0 |
+| v0.5.0-rocm5.0-pytorch1.9.0   | SuperBench v0.5.0 with ROCm 5.0, PyTorch 1.9.0   |
+| v0.5.0-rocm4.2-pytorch1.7.0   | SuperBench v0.5.0 with ROCm 4.2, PyTorch 1.7.0   |
+| v0.5.0-rocm4.0-pytorch1.7.0   | SuperBench v0.5.0 with ROCm 4.0, PyTorch 1.7.0   |
 | v0.4.0-rocm4.2-pytorch1.7.0   | SuperBench v0.4.0 with ROCm 4.2, PyTorch 1.7.0   |
 | v0.4.0-rocm4.0-pytorch1.7.0   | SuperBench v0.4.0 with ROCm 4.0, PyTorch 1.7.0   |
 | v0.3.0-rocm4.2-pytorch1.7.0   | SuperBench v0.3.0 with ROCm 4.2, PyTorch 1.7.0   |
...
@@ -65,7 +65,7 @@ superbench:
 example:
 
 ```yaml
 # SuperBench rules
-version: v0.4
+version: v0.5
 superbench:
   rules:
     failure-rule:
...
@@ -20,14 +20,12 @@ This tool is to generate a readable summary report based on the raw benchmark results.
 sb result summary --data-file ./results-summary.jsonl --rule-file ./rule.yaml --output-file-format md --output-dir ${output-dir}
 ```
-4. Find the output result file named 'results_summary.md' under ${output_dir}.
+4. Find the output result file named 'results-summary.md' under ${output_dir}.
 
 ## Input
 
 The input includes 2 files:
 
 - **Raw Data**: jsonl file including multiple nodes' results automatically generated by SuperBench runner.
 :::tip Tips
...
@@ -60,7 +58,7 @@ superbench:
 ```yaml title="Example"
 # SuperBench rules
-version: v0.4
+version: v0.5
 superbench:
   rules:
     kernel_launch:
...
@@ -122,3 +120,8 @@ The following illustrates all statistical functions:
 - `min`
 - `p${value}`: ${value} can be 1-99. For example, p50, p90, etc.
 - `std`
+
+## Output
+
+We support different output formats for the result summary, including markdown, html, etc.
+The output includes the metrics grouped by category and their values obtained by applying statistical methods to all raw results.
@@ -173,7 +173,7 @@ def run(self):
     'nvidia': ['py3nvml>=0.2.6'],
     'ort': [
         'onnx>=1.10.2',
-        'onnxruntime-gpu>=1.9.0',
+        'onnxruntime-gpu==1.10.0',
     ],
     'torch': [
         'torch>=1.7.0a0',
...
@@ -6,5 +6,5 @@
 Provide hardware and software benchmarks for AI systems.
 """
 
-__version__ = '0.4.0'
+__version__ = '0.5.0'
 __author__ = 'Microsoft'
@@ -84,19 +84,23 @@ def _parse_rules(self, rules):
             logger.error('ResultSummary: parse rules failed - {}'.format(str(e)))
             return False
 
-    def _format_summary_of_rule(self, category, summary_df_of_rule):
+    def _format_summary_of_rule(self, category, summary_df_of_rule, statistics):
         """Format summary_df of a rule into list of lines.
 
         Args:
             category (str): category in the rule
             summary_df_of_rule ([type]): summary df of a rule, the columns are metrics, the index are statistics
+            statistics (list): statistics in the rule
 
         Returns:
             list: list of summary lines like [category, metric, statistic, value]
         """
         summary = []
         metrics = summary_df_of_rule.columns
+        if metrics.empty is True:
+            for statistic in statistics:
+                summary.append([category, '', statistic, ''])
         for metric in metrics:
-            for statistic in summary_df_of_rule.index:
+            for statistic in statistics:
                 summary.append([category, metric, statistic, summary_df_of_rule.loc[statistic, metric]])
         return summary
@@ -132,6 +136,10 @@ def _generate_summary(self, round):
             metrics = list(self._sb_rules[rule]['metrics'].keys())
             category = self._sb_rules[rule]['categories']
             data_df_of_rule = self._raw_data_df[metrics]
+            statistics = self._sb_rules[rule]['statistics']
+            summary_df_of_rule = pd.DataFrame()
+            # skip metrics aggregation and statistics calculation for the rule with no matched metrics
+            if len(metrics) != 0:
                 if self._sb_rules[rule]['aggregate']:
                     # if aggregate is True, aggregate in ranks
                     if self._sb_rules[rule]['aggregate'] is True:
@@ -139,7 +147,6 @@ def _generate_summary(self, round):
                     # if aggregate is not empty and is a pattern in regex, aggregate according to pattern
                     else:
                         data_df_of_rule = data_analysis.aggregate(data_df_of_rule, self._sb_rules[rule]['aggregate'])
-                statistics = self._sb_rules[rule]['statistics']
                 summary_df_of_rule = pd.DataFrame(columns=sorted(data_df_of_rule.columns))
                 for statistic_name in statistics:
                     # get SummaryOp and calculate statistics
@@ -157,7 +164,7 @@ def _generate_summary(self, round):
                 summary_df_of_rule, round, list(summary_df_of_rule.columns)
             )
             # format summary_df of a rule to list of lines
-            summary_lines_of_rule = self._format_summary_of_rule(category, summary_df_of_rule)
+            summary_lines_of_rule = self._format_summary_of_rule(category, summary_df_of_rule, statistics)
             summary[category] = summary_lines_of_rule
         return summary
@@ -233,15 +240,15 @@ def run(self, raw_data_file, rule_file, output_dir, output_format, round=2):
         # output result summary to file
         output_path = ''
         if output_format == 'excel':
-            output_path = str(Path(output_dir) / 'results_summary.xlsx')
+            output_path = str(Path(output_dir) / 'results-summary.xlsx')
             summary_df = self._merge_summary(summary)
             self.output_summary_in_excel(self._raw_data_df, summary_df, output_path)
         elif output_format == 'md':
-            output_path = str(Path(output_dir) / 'results_summary.md')
+            output_path = str(Path(output_dir) / 'results-summary.md')
             lines = self.generate_md_lines(summary)
             file_handler.output_lines_in_md(lines, output_path)
         elif output_format == 'html':
-            output_path = str(Path(output_dir) / 'results_summary.html')
+            output_path = str(Path(output_dir) / 'results-summary.html')
             lines = self.generate_md_lines(summary)
             file_handler.output_lines_in_html(lines, output_path)
         else:
...
@@ -32,6 +32,9 @@ def _get_metrics_by_benchmarks(self, metrics_list):
                 logger.warning('RuleBase: get_metrics_by_benchmarks - {} does not have benchmark_name'.format(metric))
             else:
                 benchmark = metric.split('/')[0]
+                # support annotations in benchmark naming
+                if ':' in benchmark:
+                    benchmark = metric.split(':')[0]
                 if benchmark not in benchmarks_metrics:
                     benchmarks_metrics[benchmark] = set()
                 benchmarks_metrics[benchmark].add(metric)
...
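The annotation handling added above can be illustrated in isolation. A small sketch (hypothetical metric names, not the actual RuleBase code) of how a benchmark name is recovered from a metric, with or without a `:` annotation:

```python
def benchmark_of(metric):
    """Recover the benchmark name from a metric name.

    Metrics look like '<benchmark>/<metric>'; the benchmark part may carry an
    annotation after ':', e.g. 'bert_models:0/fp32_train_step_time'.
    """
    benchmark = metric.split('/')[0]
    if ':' in benchmark:
        # strip the annotation so metrics group under the plain benchmark name
        benchmark = metric.split(':')[0]
    return benchmark

print(benchmark_of('kernel-launch/event_overhead'))        # kernel-launch
print(benchmark_of('bert_models:0/fp32_train_step_time'))  # bert_models
```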
@@ -35,6 +35,7 @@ def __init__(self, name, parameters=''):
         self._benchmark_type = BenchmarkType.MODEL
         self._world_size = 1
         self._local_rank = None
+        self._global_rank = None
         self._dataset = None
         self._dataloader = None
         self._model = None
@@ -242,7 +243,8 @@ def __train(self, precision):
         # The unit of step time should be millisecond.
         step_times = self._train_step(precision)
-        if not self.__process_model_result(ModelAction.TRAIN, precision, step_times):
+        step_times = self.__process_model_result(ModelAction.TRAIN, precision, step_times)
+        if not step_times:
             self._result.set_return_code(ReturnCode.INVALID_BENCHMARK_RESULT)
             return False
@@ -266,7 +268,8 @@ def __inference(self, precision):
         self._create_model(precision)
         # The unit of step time should be millisecond.
         step_times = self._inference_step(precision)
-        if not self.__process_model_result(ModelAction.INFERENCE, precision, step_times):
+        step_times = self.__process_model_result(ModelAction.INFERENCE, precision, step_times)
+        if not step_times:
             self._result.set_return_code(ReturnCode.INVALID_BENCHMARK_RESULT)
             return False
@@ -369,9 +372,9 @@ def _sync_result(self, result):
             result (list): The result data to sync.
 
         Return:
-            True if reduce result data successfully.
+            Result if reduce result data successfully, otherwise None.
         """
-        return True
+        return result
 
     def __process_model_result(self, model_action, precision, step_times):
         """Function to process raw results and save the summarized results.
@@ -382,7 +385,7 @@ def __process_model_result(self, model_action, precision, step_times):
             step_times (list): The step time list of every training/inference step, unit is millisecond.
 
         Return:
-            True if step_times list is not empty.
+            step_times if step_times list is not empty, otherwise None.
         """
         if len(step_times) == 0:
             logger.error(
@@ -390,7 +393,7 @@ def __process_model_result(self, model_action, precision, step_times):
                     self._curr_run_index, self._name, model_action, precision
                 )
             )
-            return False
+            return None
 
         precision_metric = {'float16': 'fp16', 'float32': 'fp32', 'float64': 'fp64', 'bfloat16': 'bf16'}
         if precision.value in precision_metric.keys():
@@ -404,9 +407,10 @@ def __process_model_result(self, model_action, precision, step_times):
             self._result.add_raw_data(metric_t, throughput, self._args.log_raw_data)
 
         if model_action == ModelAction.TRAIN:
-            if not self._sync_result(step_times):
-                return False
-            if self._local_rank is None or self._local_rank == 0:
+            step_times = self._sync_result(step_times)
+            if not step_times:
+                return None
+            if self._local_rank is None or self._global_rank == 0:
                 self._result.add_result(metric_s, statistics.mean(step_times))
                 throughput = [millisecond_per_second / step_time * self._args.batch_size for step_time in step_times]
                 self._result.add_result(metric_t, statistics.mean(throughput))
@@ -416,7 +420,7 @@ def __process_model_result(self, model_action, precision, step_times):
             self._process_percentile_result(metric_s, step_times)
             self._process_percentile_result(metric_t, throughput)
 
-        return True
+        return step_times
 
     @abstractmethod
     def _cal_params_count(self):
...
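The throughput conversion used in `__process_model_result` above turns per-step latencies in milliseconds into samples per second. A minimal standalone sketch, with assumed step times and batch size:

```python
millisecond_per_second = 1000

def throughput_from_step_times(step_times_ms, batch_size):
    """Samples/second for each step: batch_size / (step time in seconds)."""
    return [millisecond_per_second / t * batch_size for t in step_times_ms]

# A 10 ms step with batch size 32 processes 3200 samples per second.
print(throughput_from_step_times([10.0, 20.0], 32))  # [3200.0, 1600.0]
```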
@@ -5,6 +5,7 @@
 import os
 from datetime import timedelta
+import time
 
 import torch
 import transformers
@@ -60,6 +61,7 @@ def _init_distributed_setting(self):
             hvd.init()
             self._world_size = int(hvd.size())
             self._local_rank = int(hvd.local_rank())
+            self._global_rank = int(hvd.rank())
         elif self._args.distributed_impl == DistributedImpl.DDP:
             if os.environ.get('WORLD_SIZE') is None or os.environ.get('LOCAL_RANK') is None:
                 logger.error(
@@ -70,17 +72,17 @@ def _init_distributed_setting(self):
             # torch >= 1.9.0a0 torch.distributed.elastic is used by default
             port = int(os.environ['MASTER_PORT']) + 1
             addr = os.environ['MASTER_ADDR']
-            global_rank = int(os.environ['RANK'])
+            self._global_rank = int(os.environ['RANK'])
             self._local_rank = int(os.environ['LOCAL_RANK'])
             self._world_size = int(os.environ['WORLD_SIZE'])
-            logger.debug('ip:{},port:{},rank:{},world:{}'.format(addr, port, global_rank, self._world_size))
+            logger.debug('ip:{},port:{},rank:{},world:{}'.format(addr, port, self._global_rank, self._world_size))
             store = PrefixStore(
-                self._name, TCPStore(addr, port, self._world_size, global_rank == 0, timedelta(seconds=300))
+                self._name, TCPStore(addr, port, self._world_size, self._global_rank == 0, timedelta(seconds=300))
             )
             torch.distributed.init_process_group(
                 backend=self._args.distributed_backend.value,
                 timeout=timedelta(seconds=300),
-                rank=global_rank,
+                rank=self._global_rank,
                 world_size=self._world_size,
                 store=store
             )
@@ -188,6 +190,33 @@ def _create_optimizer(self):
         return True
 
+    def _is_finished(self, curr_step, curr_time, check_frequency=100):
+        """Judge whether the benchmarking should be stopped early or not.
+
+        Args:
+            curr_step (int): the current benchmarking step.
+            curr_time (float): the current time in seconds got from time.time().
+            check_frequency (int): the frequency (step numbers) to check if benchmark should be stopped.
+
+        Return:
+            True if the benchmarking should be stopped.
+        """
+        is_finished = int(super()._is_finished(curr_step, curr_time))
+        if self._args.duration > 0:
+            if curr_step % check_frequency == 0:
+                # sync is_finished in distributed mode
+                # if any rank is_finished is True, all ranks should be finished
+                if self._args.distributed_impl == DistributedImpl.DDP:
+                    tensor = torch.IntTensor([is_finished])
+                    if self._args.distributed_backend == DistributedBackend.NCCL:
+                        tensor = tensor.cuda()
+                    torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.MAX)
+                    is_finished = tensor.tolist()[0]
+            else:
+                is_finished = 0
+        return (is_finished == 1)
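The early-stop synchronization above reduces each rank's finish flag with MAX, so the whole job stops as soon as any rank hits the duration limit. The reduction semantics can be sketched without `torch.distributed` (hypothetical flag values):

```python
def sync_finish_flags(per_rank_flags):
    """Emulate all_reduce(MAX) on per-rank is_finished flags: after the
    reduce, every rank holds the same value, 1 if any rank has finished."""
    return max(per_rank_flags)

# Rank 2 reached the duration limit first; after the reduce, all ranks stop
# at the same step, so no rank hangs waiting in a later collective.
print(sync_finish_flags([0, 0, 1, 0]))  # 1
print(sync_finish_flags([0, 0, 0, 0]))  # 0
```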
     def _sync_result(self, result):
         """Function to reduce the result to rank 0.
 
@@ -195,10 +224,11 @@ def _sync_result(self, result):
             result (list): The result data to sync.
 
         Return:
-            True if reduce result data successfully.
+            Result if reduce result data successfully, otherwise None.
         """
-        if not super()._sync_result(result):
-            return False
+        result = super()._sync_result(result)
+        if not result:
+            return None
 
         try:
             if self._args.distributed_impl == DistributedImpl.DDP:
@@ -206,7 +236,7 @@ def _sync_result(self, result):
                     tensor = torch.as_tensor(result).cuda()
                 else:
                     tensor = torch.as_tensor(result)
-                torch.distributed.reduce(tensor, 0, op=torch.distributed.ReduceOp.MAX)
+                torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.MAX)
                 result = tensor.tolist()
         except BaseException as e:
             logger.error(
@@ -214,9 +244,9 @@ def _sync_result(self, result):
                     self._name, self._args.distributed_impl, str(e)
                 )
             )
-            return False
+            return None
 
-        return True
+        return result
 
     def _postprocess(self):
         """Postprocess/cleanup operations after the benchmarking.
@@ -257,3 +287,16 @@ def _cal_params_count(self):
             The count of trainable parameters.
         """
         return sum(p.numel() for p in self._model.parameters() if p.requires_grad)
+
+    def _timer(self):
+        """Returns the current time which ensures all previous CUDA events have been finished.
+
+        If there is no GPU present, this defaults to `time.time()`; otherwise it will
+        synchronize CUDA before measuring the time.
+
+        Returns:
+            Current time in second.
+        """
+        if self._gpu_available:
+            torch.cuda.synchronize()
+        return time.time()
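The `_timer` helper above exists because CUDA kernels run asynchronously: without a synchronize, `time.time()` can return before queued GPU work finishes and under-report the step time. A standalone sketch of the same pattern, falling back to plain wall-clock when no GPU is present (assumes `torch` is installed only when a GPU is used):

```python
import time

def gpu_safe_timer(gpu_available=False):
    """Wall-clock time that first drains pending CUDA work when a GPU is used."""
    if gpu_available:
        import torch  # assumption: torch is available whenever gpu_available is True
        torch.cuda.synchronize()  # block until all queued kernels complete
    return time.time()

start = gpu_safe_timer()
time.sleep(0.01)  # stand-in for one training/inference step
elapsed_ms = (gpu_safe_timer() - start) * 1000  # step time in milliseconds
```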
@@ -3,8 +3,6 @@
 """Module of the Pytorch BERT model."""
 
-import time
-
 import torch
 from transformers import BertModel, BertConfig
@@ -137,9 +135,10 @@ def _train_step(self, precision):
         """
         duration = []
         curr_step = 0
+        check_frequency = 100
         while True:
             for idx, sample in enumerate(self._dataloader):
-                start = time.time()
+                start = self._timer()
                 if self._gpu_available:
                     sample = sample.cuda()
                 self._optimizer.zero_grad()
@@ -147,12 +146,12 @@ def _train_step(self, precision):
                 loss = self._loss_fn(output, self._target)
                 loss.backward()
                 self._optimizer.step()
-                end = time.time()
+                end = self._timer()
                 curr_step += 1
                 if curr_step > self._args.num_warmup:
                     # Save the step time of every training/inference step, unit is millisecond.
                     duration.append((end - start) * 1000)
-                    if self._is_finished(curr_step, end):
+                    if self._is_finished(curr_step, end, check_frequency):
                         return duration
 
     def _inference_step(self, precision):
@@ -171,13 +170,11 @@ def _inference_step(self, precision):
         self._model.eval()
         while True:
             for idx, sample in enumerate(self._dataloader):
-                start = time.time()
+                start = self._timer()
                 if self._gpu_available:
                     sample = sample.cuda()
                 self._model(sample)
-                if self._gpu_available:
-                    torch.cuda.synchronize()
-                end = time.time()
+                end = self._timer()
                 curr_step += 1
                 if curr_step > self._args.num_warmup:
                     # Save the step time of every training/inference step, unit is millisecond.
...
@@ -3,8 +3,6 @@
 """Module of the Pytorch CNN models."""
 
-import time
-
 import torch
 from torchvision import models
@@ -99,10 +97,11 @@ def _train_step(self, precision):
         """
         duration = []
         curr_step = 0
+        check_frequency = 100
         while True:
             for idx, sample in enumerate(self._dataloader):
                 sample = sample.to(dtype=getattr(torch, precision.value))
-                start = time.time()
+                start = self._timer()
                 if self._gpu_available:
                     sample = sample.cuda()
                 self._optimizer.zero_grad()
@@ -110,12 +109,12 @@ def _train_step(self, precision):
                 loss = self._loss_fn(output, self._target)
                 loss.backward()
                 self._optimizer.step()
-                end = time.time()
+                end = self._timer()
                 curr_step += 1
                 if curr_step > self._args.num_warmup:
                     # Save the step time of every training/inference step, unit is millisecond.
                     duration.append((end - start) * 1000)
-                    if self._is_finished(curr_step, end):
+                    if self._is_finished(curr_step, end, check_frequency):
                         return duration
 
     def _inference_step(self, precision):
@@ -135,13 +134,11 @@ def _inference_step(self, precision):
         while True:
             for idx, sample in enumerate(self._dataloader):
                 sample = sample.to(dtype=getattr(torch, precision.value))
-                start = time.time()
+                start = self._timer()
                 if self._gpu_available:
                     sample = sample.cuda()
                 self._model(sample)
-                if self._gpu_available:
-                    torch.cuda.synchronize()
-                end = time.time()
+                end = self._timer()
                 curr_step += 1
                 if curr_step > self._args.num_warmup:
                     # Save the step time of every training/inference step, unit is millisecond.
...
@@ -3,8 +3,6 @@
 """Module of the Pytorch GPT2 model."""
 
-import time
-
 import torch
 from transformers import GPT2Model, GPT2Config
@@ -131,9 +129,10 @@ def _train_step(self, precision):
         """
         duration = []
         curr_step = 0
+        check_frequency = 100
         while True:
             for idx, sample in enumerate(self._dataloader):
-                start = time.time()
+                start = self._timer()
                 if self._gpu_available:
                     sample = sample.cuda()
                 self._optimizer.zero_grad()
@@ -141,12 +140,12 @@ def _train_step(self, precision):
                 loss = self._loss_fn(output[range(self._args.batch_size), -1], self._target)
                 loss.backward()
                 self._optimizer.step()
-                end = time.time()
+                end = self._timer()
                 curr_step += 1
                 if curr_step > self._args.num_warmup:
                     # Save the step time of every training/inference step, unit is millisecond.
                     duration.append((end - start) * 1000)
-                    if self._is_finished(curr_step, end):
+                    if self._is_finished(curr_step, end, check_frequency):
                         return duration
 
     def _inference_step(self, precision):
@@ -165,13 +164,11 @@ def _inference_step(self, precision):
         self._model.eval()
         while True:
             for idx, sample in enumerate(self._dataloader):
-                start = time.time()
+                start = self._timer()
                 if self._gpu_available:
                     sample = sample.cuda()
                 self._model(sample)
-                if self._gpu_available:
-                    torch.cuda.synchronize()
-                end = time.time()
+                end = self._timer()
                 curr_step += 1
                 if curr_step > self._args.num_warmup:
                     # Save the step time of every training/inference step, unit is millisecond.
...
@@ -3,8 +3,6 @@
 """Module of the Pytorch LSTM model."""
 
-import time
-
 import torch
 from superbench.common.utils import logger
@@ -139,10 +137,11 @@ def _train_step(self, precision):
         """
         duration = []
         curr_step = 0
+        check_frequency = 100
         while True:
             for idx, sample in enumerate(self._dataloader):
                 sample = sample.to(dtype=getattr(torch, precision.value))
-                start = time.time()
+                start = self._timer()
                 if self._gpu_available:
                     sample = sample.cuda()
                 self._optimizer.zero_grad()
@@ -150,12 +149,12 @@ def _train_step(self, precision):
                 loss = self._loss_fn(output, self._target)
                 loss.backward()
                 self._optimizer.step()
-                end = time.time()
+                end = self._timer()
                 curr_step += 1
                 if curr_step > self._args.num_warmup:
                     # Save the step time of every training/inference step, unit is millisecond.
                     duration.append((end - start) * 1000)
-                    if self._is_finished(curr_step, end):
+                    if self._is_finished(curr_step, end, check_frequency):
                         return duration
 
     def _inference_step(self, precision):
@@ -175,13 +174,11 @@ def _inference_step(self, precision):
         while True:
             for idx, sample in enumerate(self._dataloader):
                 sample = sample.to(dtype=getattr(torch, precision.value))
-                start = time.time()
+                start = self._timer()
                 if self._gpu_available:
                     sample = sample.cuda()
                 self._model(sample)
-                if self._gpu_available:
-                    torch.cuda.synchronize()
-                end = time.time()
+                end = self._timer()
                 curr_step += 1
                 if curr_step > self._args.num_warmup:
                     # Save the step time of every training/inference step, unit is millisecond.
...
@@ -3,7 +3,7 @@
 # Server:
 # - Product: HPE Apollo 6500
 
-version: v0.4
+version: v0.5
 superbench:
   enable: null
   var:
...
@@ -4,7 +4,7 @@
 # - Product: G482-Z53
 # - Link: https://www.gigabyte.cn/FileUpload/Global/MicroSite/553/G482-Z53.html
 
-version: v0.4
+version: v0.5
 superbench:
   enable: null
   var:
...
-version: v0.4
+version: v0.5
 superbench:
   enable: null
   monitor:
...