Commit 6681c720 authored by Yifan Xiong, committed by GitHub

Release - SuperBench v0.5.0 (#350)



**Description**

Cherry-pick bug fixes from v0.5.0 to main.

**Major Revisions**

* Bug - Force to fix ort version as '1.10.0' (#343)
* Bug - Support no matching rules and unify the output name in result_summary (#345)
* Analyzer - Support regex in annotations of benchmark naming for metrics in rules (#344)
* Bug - Fix bugs in sync results on root rank for e2e model benchmarks (#342)
* Bug - Fix bug of duration feature for model benchmarks in distributed mode (#347)
* Docs - Upgrade version and release note (#348)
Co-authored-by: Yuting Jiang <v-yutjiang@microsoft.com>
parent 712eafc3
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
__SuperBench__ is a validation and profiling tool for AI infrastructure. __SuperBench__ is a validation and profiling tool for AI infrastructure.
📢 [v0.4.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.4.0) has been released! 📢 [v0.5.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.5.0) has been released!
## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._ ## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._
......
...@@ -61,7 +61,7 @@ You can clone the source from GitHub and build it. ...@@ -61,7 +61,7 @@ You can clone the source from GitHub and build it.
:::note Note :::note Note
You should checkout corresponding tag to use release version, for example, You should checkout corresponding tag to use release version, for example,
`git clone -b v0.4.0 https://github.com/microsoft/superbenchmark` `git clone -b v0.5.0 https://github.com/microsoft/superbenchmark`
::: :::
```bash ```bash
......
...@@ -27,7 +27,7 @@ sb deploy -f remote.ini --host-password [password] ...@@ -27,7 +27,7 @@ sb deploy -f remote.ini --host-password [password]
:::note Note :::note Note
You should deploy corresponding Docker image to use release version, for example, You should deploy corresponding Docker image to use release version, for example,
`sb deploy -f local.ini -i superbench/superbench:v0.4.0-cuda11.1.1` `sb deploy -f local.ini -i superbench/superbench:v0.5.0-cuda11.1.1`
You should note that version of git repo only determines version of sb CLI, and not the sb container. You should define the container version even if you specified a release version for the git clone. You should note that version of git repo only determines version of sb CLI, and not the sb container. You should define the container version even if you specified a release version for the git clone.
......
...@@ -70,7 +70,7 @@ superbench: ...@@ -70,7 +70,7 @@ superbench:
<TabItem value='example'> <TabItem value='example'>
```yaml ```yaml
version: v0.4 version: v0.5
superbench: superbench:
enable: benchmark_1 enable: benchmark_1
monitor: monitor:
......
...@@ -28,7 +28,8 @@ available tags are listed below for all stable versions. ...@@ -28,7 +28,8 @@ available tags are listed below for all stable versions.
<TabItem value='cuda'> <TabItem value='cuda'>
| Tag | Description | | Tag | Description |
| ----------------- | ---------------------------------- | |-------------------|------------------------------------|
| v0.5.0-cuda11.1.1 | SuperBench v0.5.0 with CUDA 11.1.1 |
| v0.4.0-cuda11.1.1 | SuperBench v0.4.0 with CUDA 11.1.1 | | v0.4.0-cuda11.1.1 | SuperBench v0.4.0 with CUDA 11.1.1 |
| v0.3.0-cuda11.1.1 | SuperBench v0.3.0 with CUDA 11.1.1 | | v0.3.0-cuda11.1.1 | SuperBench v0.3.0 with CUDA 11.1.1 |
| v0.2.1-cuda11.1.1 | SuperBench v0.2.1 with CUDA 11.1.1 | | v0.2.1-cuda11.1.1 | SuperBench v0.2.1 with CUDA 11.1.1 |
...@@ -38,7 +39,11 @@ available tags are listed below for all stable versions. ...@@ -38,7 +39,11 @@ available tags are listed below for all stable versions.
<TabItem value='rocm'> <TabItem value='rocm'>
| Tag | Description | | Tag | Description |
| --------------------------- | ---------------------------------------------- | |-------------------------------|--------------------------------------------------|
| v0.5.0-rocm5.0.1-pytorch1.9.0 | SuperBench v0.5.0 with ROCm 5.0.1, PyTorch 1.9.0 |
| v0.5.0-rocm5.0-pytorch1.9.0 | SuperBench v0.5.0 with ROCm 5.0, PyTorch 1.9.0 |
| v0.5.0-rocm4.2-pytorch1.7.0 | SuperBench v0.5.0 with ROCm 4.2, PyTorch 1.7.0 |
| v0.5.0-rocm4.0-pytorch1.7.0 | SuperBench v0.5.0 with ROCm 4.0, PyTorch 1.7.0 |
| v0.4.0-rocm4.2-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.2, PyTorch 1.7.0 | | v0.4.0-rocm4.2-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.2, PyTorch 1.7.0 |
| v0.4.0-rocm4.0-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.0, PyTorch 1.7.0 | | v0.4.0-rocm4.0-pytorch1.7.0 | SuperBench v0.4.0 with ROCm 4.0, PyTorch 1.7.0 |
| v0.3.0-rocm4.2-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.2, PyTorch 1.7.0 | | v0.3.0-rocm4.2-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.2, PyTorch 1.7.0 |
......
...@@ -65,7 +65,7 @@ superbench: ...@@ -65,7 +65,7 @@ superbench:
example: example:
```yaml ```yaml
# SuperBench rules # SuperBench rules
version: v0.4 version: v0.5
superbench: superbench:
rules: rules:
failure-rule: failure-rule:
......
...@@ -20,14 +20,12 @@ This tool is to generate a readable summary report based on the raw benchmark re ...@@ -20,14 +20,12 @@ This tool is to generate a readable summary report based on the raw benchmark re
sb result summary --data-file ./results-summary.jsonl --rule-file ./rule.yaml --output-file-format md --output-dir ${output-dir} sb result summary --data-file ./results-summary.jsonl --rule-file ./rule.yaml --output-file-format md --output-dir ${output-dir}
``` ```
4. Find the output result file named 'results_summary.md' under ${output_dir}. 4. Find the output result file named 'results-summary.md' under ${output_dir}.
## Input ## Input
The input includes 2 files: The input includes 2 files:
- **Raw Data**: jsonl file including multiple nodes' results automatically generated by SuperBench runner. - **Raw Data**: jsonl file including multiple nodes' results automatically generated by SuperBench runner.
:::tip Tips :::tip Tips
...@@ -60,7 +58,7 @@ superbench: ...@@ -60,7 +58,7 @@ superbench:
```yaml title="Example" ```yaml title="Example"
# SuperBench rules # SuperBench rules
version: v0.4 version: v0.5
superbench: superbench:
rules: rules:
kernel_launch: kernel_launch:
...@@ -122,3 +120,8 @@ The following illustrates all statistical functions: ...@@ -122,3 +120,8 @@ The following illustrates all statistical functions:
- `min` - `min`
- `p${value}`: ${value} can be 1-99. For example, p50, p90, etc. - `p${value}`: ${value} can be 1-99. For example, p50, p90, etc.
- `std` - `std`
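As a rough illustration of the `p${value}` naming above, the sketch below parses a statistic name such as `p50` and computes the matching percentile with the Python standard library. The helper name `percentile_from_name` and the choice of the inclusive interpolation method are assumptions made for this example; the diff does not show how SuperBench itself computes percentiles.

```python
import re
import statistics


def percentile_from_name(name, data):
    """Return the percentile named by `name` (for example 'p90') over `data`.

    Hypothetical helper for illustration only, not part of SuperBench.
    """
    # Accept p1 .. p99, matching the documented range of ${value}.
    match = re.fullmatch(r'p([1-9][0-9]?)', name)
    if match is None:
        raise ValueError('unsupported statistic name: {}'.format(name))
    value = int(match.group(1))
    # quantiles(n=100) returns 99 cut points; cut point k is the k-th percentile.
    cut_points = statistics.quantiles(data, n=100, method='inclusive')
    return cut_points[value - 1]


samples = list(range(101))    # 0, 1, ..., 100
p50 = percentile_from_name('p50', samples)
p90 = percentile_from_name('p90', samples)
```

Other interpolation methods would give slightly different values on small samples, so the numbers from this sketch should not be compared directly with SuperBench output.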
## Output
We support different output formats for the result summary, including markdown, html, etc.
The output includes the metrics grouped by category and their values obtained by applying statistical methods to all raw results.
...@@ -173,7 +173,7 @@ setup( ...@@ -173,7 +173,7 @@ setup(
'nvidia': ['py3nvml>=0.2.6'], 'nvidia': ['py3nvml>=0.2.6'],
'ort': [ 'ort': [
'onnx>=1.10.2', 'onnx>=1.10.2',
'onnxruntime-gpu>=1.9.0', 'onnxruntime-gpu==1.10.0',
], ],
'torch': [ 'torch': [
'torch>=1.7.0a0', 'torch>=1.7.0a0',
......
...@@ -6,5 +6,5 @@ ...@@ -6,5 +6,5 @@
Provide hardware and software benchmarks for AI systems. Provide hardware and software benchmarks for AI systems.
""" """
__version__ = '0.4.0' __version__ = '0.5.0'
__author__ = 'Microsoft' __author__ = 'Microsoft'
...@@ -84,19 +84,23 @@ class ResultSummary(RuleBase): ...@@ -84,19 +84,23 @@ class ResultSummary(RuleBase):
logger.error('ResultSummary: parse rules failed - {}'.format(str(e))) logger.error('ResultSummary: parse rules failed - {}'.format(str(e)))
return False return False
def _format_summary_of_rule(self, category, summary_df_of_rule): def _format_summary_of_rule(self, category, summary_df_of_rule, statistics):
"""Format summary_df of a rule into list of lines. """Format summary_df of a rule into list of lines.
Args: Args:
category (str): category in the rule category (str): category in the rule
summary_df_of_rule ([type]): summary df of a rule, the columns are metrics, the index are statistics summary_df_of_rule ([type]): summary df of a rule, the columns are metrics, the index are statistics
statistics (list): statistics in the rule
Returns: Returns:
list: list of summary lines like [category, metric, statistic, value] list: list of summary lines like [category, metric, statistic, value]
""" """
summary = [] summary = []
metrics = summary_df_of_rule.columns metrics = summary_df_of_rule.columns
if metrics.empty is True:
for statistic in statistics:
summary.append([category, '', statistic, ''])
for metric in metrics: for metric in metrics:
for statistic in summary_df_of_rule.index: for statistic in statistics:
summary.append([category, metric, statistic, summary_df_of_rule.loc[statistic, metric]]) summary.append([category, metric, statistic, summary_df_of_rule.loc[statistic, metric]])
return summary return summary
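The effect of the new `statistics` argument can be sketched without pandas. In the example below a plain dict (`metric -> {statistic: value}`) stands in for `summary_df_of_rule`; that substitution, the function name, and the sample category and metric names are assumptions for illustration. The point it shows is that a rule with no matching metrics still produces one placeholder row per statistic instead of disappearing from the summary.

```python
def format_summary_of_rule(category, summary, statistics):
    """Format one rule's summary into [category, metric, statistic, value] rows.

    `summary` is a dict used here as a stand-in for the DataFrame in the diff.
    """
    lines = []
    if not summary:
        # No metric matched the rule: keep the category visible with empty cells.
        for statistic in statistics:
            lines.append([category, '', statistic, ''])
    for metric in summary:
        for statistic in statistics:
            lines.append([category, metric, statistic, summary[metric][statistic]])
    return lines


stats = ['mean', 'p90']
empty_rows = format_summary_of_rule('KernelLaunch', {}, stats)
filled_rows = format_summary_of_rule(
    'KernelLaunch', {'kernel-launch/event_time': {'mean': 0.0055, 'p90': 0.006}}, stats
)
```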
...@@ -132,6 +136,10 @@ class ResultSummary(RuleBase): ...@@ -132,6 +136,10 @@ class ResultSummary(RuleBase):
metrics = list(self._sb_rules[rule]['metrics'].keys()) metrics = list(self._sb_rules[rule]['metrics'].keys())
category = self._sb_rules[rule]['categories'] category = self._sb_rules[rule]['categories']
data_df_of_rule = self._raw_data_df[metrics] data_df_of_rule = self._raw_data_df[metrics]
statistics = self._sb_rules[rule]['statistics']
summary_df_of_rule = pd.DataFrame()
# skip metrics aggregation and statistics calculation for the rule with no matched metrics
if len(metrics) != 0:
if self._sb_rules[rule]['aggregate']: if self._sb_rules[rule]['aggregate']:
# if aggregate is True, aggregate in ranks # if aggregate is True, aggregate in ranks
if self._sb_rules[rule]['aggregate'] is True: if self._sb_rules[rule]['aggregate'] is True:
...@@ -139,7 +147,6 @@ class ResultSummary(RuleBase): ...@@ -139,7 +147,6 @@ class ResultSummary(RuleBase):
# if aggregate is not empty and is a pattern in regex, aggregate according to pattern # if aggregate is not empty and is a pattern in regex, aggregate according to pattern
else: else:
data_df_of_rule = data_analysis.aggregate(data_df_of_rule, self._sb_rules[rule]['aggregate']) data_df_of_rule = data_analysis.aggregate(data_df_of_rule, self._sb_rules[rule]['aggregate'])
statistics = self._sb_rules[rule]['statistics']
summary_df_of_rule = pd.DataFrame(columns=sorted(data_df_of_rule.columns)) summary_df_of_rule = pd.DataFrame(columns=sorted(data_df_of_rule.columns))
for statistic_name in statistics: for statistic_name in statistics:
# get SummaryOp and calculate statistics # get SummaryOp and calculate statistics
...@@ -157,7 +164,7 @@ class ResultSummary(RuleBase): ...@@ -157,7 +164,7 @@ class ResultSummary(RuleBase):
summary_df_of_rule, round, list(summary_df_of_rule.columns) summary_df_of_rule, round, list(summary_df_of_rule.columns)
) )
# format summary_df of a rule to list of lines # format summary_df of a rule to list of lines
summary_lines_of_rule = self._format_summary_of_rule(category, summary_df_of_rule) summary_lines_of_rule = self._format_summary_of_rule(category, summary_df_of_rule, statistics)
summary[category] = summary_lines_of_rule summary[category] = summary_lines_of_rule
return summary return summary
...@@ -233,15 +240,15 @@ class ResultSummary(RuleBase): ...@@ -233,15 +240,15 @@ class ResultSummary(RuleBase):
# output result summary to file # output result summary to file
output_path = '' output_path = ''
if output_format == 'excel': if output_format == 'excel':
output_path = str(Path(output_dir) / 'results_summary.xlsx') output_path = str(Path(output_dir) / 'results-summary.xlsx')
summary_df = self._merge_summary(summary) summary_df = self._merge_summary(summary)
self.output_summary_in_excel(self._raw_data_df, summary_df, output_path) self.output_summary_in_excel(self._raw_data_df, summary_df, output_path)
elif output_format == 'md': elif output_format == 'md':
output_path = str(Path(output_dir) / 'results_summary.md') output_path = str(Path(output_dir) / 'results-summary.md')
lines = self.generate_md_lines(summary) lines = self.generate_md_lines(summary)
file_handler.output_lines_in_md(lines, output_path) file_handler.output_lines_in_md(lines, output_path)
elif output_format == 'html': elif output_format == 'html':
output_path = str(Path(output_dir) / 'results_summary.html') output_path = str(Path(output_dir) / 'results-summary.html')
lines = self.generate_md_lines(summary) lines = self.generate_md_lines(summary)
file_handler.output_lines_in_html(lines, output_path) file_handler.output_lines_in_html(lines, output_path)
else: else:
......
...@@ -32,6 +32,9 @@ class RuleBase(): ...@@ -32,6 +32,9 @@ class RuleBase():
logger.warning('RuleBase: get_metrics_by_benchmarks - {} does not have benchmark_name'.format(metric)) logger.warning('RuleBase: get_metrics_by_benchmarks - {} does not have benchmark_name'.format(metric))
else: else:
benchmark = metric.split('/')[0] benchmark = metric.split('/')[0]
# support annotations in benchmark naming
if ':' in benchmark:
benchmark = metric.split(':')[0]
if benchmark not in benchmarks_metrics: if benchmark not in benchmarks_metrics:
benchmarks_metrics[benchmark] = set() benchmarks_metrics[benchmark] = set()
benchmarks_metrics[benchmark].add(metric) benchmarks_metrics[benchmark].add(metric)
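A minimal standalone sketch of the grouping logic in this hunk follows. It assumes that a metric without a `/` is the case the diff logs a warning for (the actual condition is outside the shown context), and the metric names are made up for the example. An annotated benchmark such as `nccl-bw:default` is grouped under its base name `nccl-bw`.

```python
def get_metrics_by_benchmarks(metrics):
    """Group metric names by benchmark name, ignoring any ':annotation' suffix."""
    benchmarks_metrics = {}
    for metric in metrics:
        if '/' not in metric:
            # Assumed skip condition; the diff logs a warning in this branch.
            continue
        benchmark = metric.split('/')[0]
        # Support annotations in benchmark naming, e.g. 'nccl-bw:default'.
        if ':' in benchmark:
            benchmark = benchmark.split(':')[0]
        benchmarks_metrics.setdefault(benchmark, set()).add(metric)
    return benchmarks_metrics


grouped = get_metrics_by_benchmarks([
    'kernel-launch/event_time',
    'nccl-bw:default/allreduce_busbw',
    'nccl-bw:ib/allreduce_busbw',
    'metric_without_benchmark',
])
```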
......
...@@ -35,6 +35,7 @@ class ModelBenchmark(Benchmark): ...@@ -35,6 +35,7 @@ class ModelBenchmark(Benchmark):
self._benchmark_type = BenchmarkType.MODEL self._benchmark_type = BenchmarkType.MODEL
self._world_size = 1 self._world_size = 1
self._local_rank = None self._local_rank = None
self._global_rank = None
self._dataset = None self._dataset = None
self._dataloader = None self._dataloader = None
self._model = None self._model = None
...@@ -242,7 +243,8 @@ class ModelBenchmark(Benchmark): ...@@ -242,7 +243,8 @@ class ModelBenchmark(Benchmark):
# The unit of step time should be millisecond. # The unit of step time should be millisecond.
step_times = self._train_step(precision) step_times = self._train_step(precision)
if not self.__process_model_result(ModelAction.TRAIN, precision, step_times): step_times = self.__process_model_result(ModelAction.TRAIN, precision, step_times)
if not step_times:
self._result.set_return_code(ReturnCode.INVALID_BENCHMARK_RESULT) self._result.set_return_code(ReturnCode.INVALID_BENCHMARK_RESULT)
return False return False
...@@ -266,7 +268,8 @@ class ModelBenchmark(Benchmark): ...@@ -266,7 +268,8 @@ class ModelBenchmark(Benchmark):
self._create_model(precision) self._create_model(precision)
# The unit of step time should be millisecond. # The unit of step time should be millisecond.
step_times = self._inference_step(precision) step_times = self._inference_step(precision)
if not self.__process_model_result(ModelAction.INFERENCE, precision, step_times): step_times = self.__process_model_result(ModelAction.INFERENCE, precision, step_times)
if not step_times:
self._result.set_return_code(ReturnCode.INVALID_BENCHMARK_RESULT) self._result.set_return_code(ReturnCode.INVALID_BENCHMARK_RESULT)
return False return False
...@@ -369,9 +372,9 @@ class ModelBenchmark(Benchmark): ...@@ -369,9 +372,9 @@ class ModelBenchmark(Benchmark):
result (list): The result data to sync. result (list): The result data to sync.
Return: Return:
True if reduce result data successfully. Result if reduce result data successfully, otherwise None.
""" """
return True return result
def __process_model_result(self, model_action, precision, step_times): def __process_model_result(self, model_action, precision, step_times):
"""Function to process raw results and save the summarized results. """Function to process raw results and save the summarized results.
...@@ -382,7 +385,7 @@ class ModelBenchmark(Benchmark): ...@@ -382,7 +385,7 @@ class ModelBenchmark(Benchmark):
step_times (list): The step time list of every training/inference step, unit is millisecond. step_times (list): The step time list of every training/inference step, unit is millisecond.
Return: Return:
True if step_times list is not empty. step_times if step_times list is not empty, otherwise None.
""" """
if len(step_times) == 0: if len(step_times) == 0:
logger.error( logger.error(
...@@ -390,7 +393,7 @@ class ModelBenchmark(Benchmark): ...@@ -390,7 +393,7 @@ class ModelBenchmark(Benchmark):
self._curr_run_index, self._name, model_action, precision self._curr_run_index, self._name, model_action, precision
) )
) )
return False return None
precision_metric = {'float16': 'fp16', 'float32': 'fp32', 'float64': 'fp64', 'bfloat16': 'bf16'} precision_metric = {'float16': 'fp16', 'float32': 'fp32', 'float64': 'fp64', 'bfloat16': 'bf16'}
if precision.value in precision_metric.keys(): if precision.value in precision_metric.keys():
...@@ -404,9 +407,10 @@ class ModelBenchmark(Benchmark): ...@@ -404,9 +407,10 @@ class ModelBenchmark(Benchmark):
self._result.add_raw_data(metric_t, throughput, self._args.log_raw_data) self._result.add_raw_data(metric_t, throughput, self._args.log_raw_data)
if model_action == ModelAction.TRAIN: if model_action == ModelAction.TRAIN:
if not self._sync_result(step_times): step_times = self._sync_result(step_times)
return False if not step_times:
if self._local_rank is None or self._local_rank == 0: return None
if self._local_rank is None or self._global_rank == 0:
self._result.add_result(metric_s, statistics.mean(step_times)) self._result.add_result(metric_s, statistics.mean(step_times))
throughput = [millisecond_per_second / step_time * self._args.batch_size for step_time in step_times] throughput = [millisecond_per_second / step_time * self._args.batch_size for step_time in step_times]
self._result.add_result(metric_t, statistics.mean(throughput)) self._result.add_result(metric_t, statistics.mean(throughput))
...@@ -416,7 +420,7 @@ class ModelBenchmark(Benchmark): ...@@ -416,7 +420,7 @@ class ModelBenchmark(Benchmark):
self._process_percentile_result(metric_s, step_times) self._process_percentile_result(metric_s, step_times)
self._process_percentile_result(metric_t, throughput) self._process_percentile_result(metric_t, throughput)
return True return step_times
@abstractmethod @abstractmethod
def _cal_params_count(self): def _cal_params_count(self):
......
...@@ -5,6 +5,7 @@ ...@@ -5,6 +5,7 @@
import os import os
from datetime import timedelta from datetime import timedelta
import time
import torch import torch
import transformers import transformers
...@@ -60,6 +61,7 @@ class PytorchBase(ModelBenchmark): ...@@ -60,6 +61,7 @@ class PytorchBase(ModelBenchmark):
hvd.init() hvd.init()
self._world_size = int(hvd.size()) self._world_size = int(hvd.size())
self._local_rank = int(hvd.local_rank()) self._local_rank = int(hvd.local_rank())
self._global_rank = int(hvd.rank())
elif self._args.distributed_impl == DistributedImpl.DDP: elif self._args.distributed_impl == DistributedImpl.DDP:
if os.environ.get('WORLD_SIZE') is None or os.environ.get('LOCAL_RANK') is None: if os.environ.get('WORLD_SIZE') is None or os.environ.get('LOCAL_RANK') is None:
logger.error( logger.error(
...@@ -70,17 +72,17 @@ class PytorchBase(ModelBenchmark): ...@@ -70,17 +72,17 @@ class PytorchBase(ModelBenchmark):
# torch >= 1.9.0a0 torch.distributed.elastic is used by default # torch >= 1.9.0a0 torch.distributed.elastic is used by default
port = int(os.environ['MASTER_PORT']) + 1 port = int(os.environ['MASTER_PORT']) + 1
addr = os.environ['MASTER_ADDR'] addr = os.environ['MASTER_ADDR']
global_rank = int(os.environ['RANK']) self._global_rank = int(os.environ['RANK'])
self._local_rank = int(os.environ['LOCAL_RANK']) self._local_rank = int(os.environ['LOCAL_RANK'])
self._world_size = int(os.environ['WORLD_SIZE']) self._world_size = int(os.environ['WORLD_SIZE'])
logger.debug('ip:{},port:{},rank:{},world:{}'.format(addr, port, global_rank, self._world_size)) logger.debug('ip:{},port:{},rank:{},world:{}'.format(addr, port, self._global_rank, self._world_size))
store = PrefixStore( store = PrefixStore(
self._name, TCPStore(addr, port, self._world_size, global_rank == 0, timedelta(seconds=300)) self._name, TCPStore(addr, port, self._world_size, self._global_rank == 0, timedelta(seconds=300))
) )
torch.distributed.init_process_group( torch.distributed.init_process_group(
backend=self._args.distributed_backend.value, backend=self._args.distributed_backend.value,
timeout=timedelta(seconds=300), timeout=timedelta(seconds=300),
rank=global_rank, rank=self._global_rank,
world_size=self._world_size, world_size=self._world_size,
store=store store=store
) )
...@@ -188,6 +190,33 @@ class PytorchBase(ModelBenchmark): ...@@ -188,6 +190,33 @@ class PytorchBase(ModelBenchmark):
return True return True
def _is_finished(self, curr_step, curr_time, check_frequency=100):
"""Judge whether the benchmarking should be stopped early or not.
Args:
curr_step (int): the current benchmarking step.
curr_time (float): the current time in seconds got from time.time().
check_frequency (int): the frequency (step numbers) to check if benchmark should be stopped.
Return:
True if the benchmarking should be stopped.
"""
is_finished = int(super()._is_finished(curr_step, curr_time))
if self._args.duration > 0:
if curr_step % check_frequency == 0:
# sync is_finished in distributed mode
# if any rank is_finished is True, all ranks should be finished
if self._args.distributed_impl == DistributedImpl.DDP:
tensor = torch.IntTensor([is_finished])
if self._args.distributed_backend == DistributedBackend.NCCL:
tensor = tensor.cuda()
torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.MAX)
is_finished = tensor.tolist()[0]
else:
is_finished = 0
return (is_finished == 1)
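The duration check above can be simulated in pure Python to see why a MAX reduction keeps ranks in step. In this sketch a list holds one local 0/1 flag per rank and `max()` stands in for `torch.distributed.all_reduce` with `ReduceOp.MAX`; the function name and the list-based model are assumptions for illustration, and only the `duration > 0` path is covered.

```python
def sync_is_finished(local_flags, curr_step, check_frequency=100):
    """Return the stop decision each simulated rank acts on at `curr_step`."""
    if curr_step % check_frequency != 0:
        # Between checkpoints no rank stops, even if its own timer has expired.
        return [False] * len(local_flags)
    # Stand-in for all_reduce(MAX): if any rank is finished, all ranks finish.
    agreed = max(local_flags)
    return [agreed == 1] * len(local_flags)


at_checkpoint = sync_is_finished([0, 1, 0], curr_step=200)
between_checkpoints = sync_is_finished([0, 1, 0], curr_step=150)
nobody_done = sync_is_finished([0, 0], curr_step=100)
```

Because every rank sees the same decision at the same step, no rank leaves the training loop while others are still waiting in a collective call.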
def _sync_result(self, result): def _sync_result(self, result):
"""Function to reduce the result to rank 0. """Function to reduce the result to rank 0.
...@@ -195,10 +224,11 @@ class PytorchBase(ModelBenchmark): ...@@ -195,10 +224,11 @@ class PytorchBase(ModelBenchmark):
result (list): The result data to sync. result (list): The result data to sync.
Return: Return:
True if reduce result data successfully. Result if reduce result data successfully, otherwise None.
""" """
if not super()._sync_result(result): result = super()._sync_result(result)
return False if not result:
return None
try: try:
if self._args.distributed_impl == DistributedImpl.DDP: if self._args.distributed_impl == DistributedImpl.DDP:
...@@ -206,7 +236,7 @@ class PytorchBase(ModelBenchmark): ...@@ -206,7 +236,7 @@ class PytorchBase(ModelBenchmark):
tensor = torch.as_tensor(result).cuda() tensor = torch.as_tensor(result).cuda()
else: else:
tensor = torch.as_tensor(result) tensor = torch.as_tensor(result)
torch.distributed.reduce(tensor, 0, op=torch.distributed.ReduceOp.MAX) torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.MAX)
result = tensor.tolist() result = tensor.tolist()
except BaseException as e: except BaseException as e:
logger.error( logger.error(
...@@ -214,9 +244,9 @@ class PytorchBase(ModelBenchmark): ...@@ -214,9 +244,9 @@ class PytorchBase(ModelBenchmark):
self._name, self._args.distributed_impl, str(e) self._name, self._args.distributed_impl, str(e)
) )
) )
return False return None
return True return result
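The switch from `reduce` to `all_reduce` can be pictured with a small simulation. Here each inner list is one rank's step times, and an element-wise `max` stands in for `torch.distributed.all_reduce` with `ReduceOp.MAX`; the function name and data are hypothetical. Returning the reduced list (or `None` on failure) mirrors the new return contract in the diff, where callers use the returned values instead of a boolean.

```python
def all_reduce_max(per_rank_results):
    """Simulate an element-wise MAX all-reduce over per-rank step-time lists.

    Returns one reduced copy per rank, or None if any rank has no data.
    """
    if not per_rank_results or any(not result for result in per_rank_results):
        return None
    reduced = [max(values) for values in zip(*per_rank_results)]
    # Unlike reduce-to-rank-0, every rank ends up holding the reduced values.
    return [list(reduced) for _ in per_rank_results]


synced = all_reduce_max([[10.0, 12.0], [11.0, 9.0]])
failed = all_reduce_max([[], [1.0]])
```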
def _postprocess(self): def _postprocess(self):
"""Postprocess/cleanup operations after the benchmarking. """Postprocess/cleanup operations after the benchmarking.
...@@ -257,3 +287,16 @@ class PytorchBase(ModelBenchmark): ...@@ -257,3 +287,16 @@ class PytorchBase(ModelBenchmark):
The count of trainable parameters. The count of trainable parameters.
""" """
return sum(p.numel() for p in self._model.parameters() if p.requires_grad) return sum(p.numel() for p in self._model.parameters() if p.requires_grad)
def _timer(self):
"""Returns the current time which ensures all previous CUDA events have been finished.
If there is no GPU present, this defaults to `time.time()`; otherwise it will
synchronize CUDA before measuring the time.
Returns:
Current time in second.
"""
if self._gpu_available:
torch.cuda.synchronize()
return time.time()
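The idea behind `_timer` can be sketched without a GPU by injecting the synchronization step. In the example below `make_timer` and its `synchronize` parameter are hypothetical names; in the diff the synchronization call is `torch.cuda.synchronize()` and it runs only when `self._gpu_available` is true.

```python
import time


def make_timer(synchronize=None):
    """Build a timer that waits for pending device work before reading the clock."""
    def timer():
        if synchronize is not None:
            # Stand-in for torch.cuda.synchronize(): drain queued work first.
            synchronize()
        return time.time()
    return timer


sync_calls = []
gpu_like_timer = make_timer(lambda: sync_calls.append('sync'))
cpu_timer = make_timer()

start = gpu_like_timer()
end = gpu_like_timer()
cpu_now = cpu_timer()
```

Synchronizing at both the start and the end of a step keeps asynchronous work from an earlier step out of the measured interval, which appears to be why the per-model `time.time()` calls are replaced throughout this commit.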
...@@ -3,8 +3,6 @@ ...@@ -3,8 +3,6 @@
"""Module of the Pytorch BERT model.""" """Module of the Pytorch BERT model."""
import time
import torch import torch
from transformers import BertModel, BertConfig from transformers import BertModel, BertConfig
...@@ -137,9 +135,10 @@ class PytorchBERT(PytorchBase): ...@@ -137,9 +135,10 @@ class PytorchBERT(PytorchBase):
""" """
duration = [] duration = []
curr_step = 0 curr_step = 0
check_frequency = 100
while True: while True:
for idx, sample in enumerate(self._dataloader): for idx, sample in enumerate(self._dataloader):
start = time.time() start = self._timer()
if self._gpu_available: if self._gpu_available:
sample = sample.cuda() sample = sample.cuda()
self._optimizer.zero_grad() self._optimizer.zero_grad()
...@@ -147,12 +146,12 @@ class PytorchBERT(PytorchBase): ...@@ -147,12 +146,12 @@ class PytorchBERT(PytorchBase):
loss = self._loss_fn(output, self._target) loss = self._loss_fn(output, self._target)
loss.backward() loss.backward()
self._optimizer.step() self._optimizer.step()
end = time.time() end = self._timer()
curr_step += 1 curr_step += 1
if curr_step > self._args.num_warmup: if curr_step > self._args.num_warmup:
# Save the step time of every training/inference step, unit is millisecond. # Save the step time of every training/inference step, unit is millisecond.
duration.append((end - start) * 1000) duration.append((end - start) * 1000)
if self._is_finished(curr_step, end): if self._is_finished(curr_step, end, check_frequency):
return duration return duration
def _inference_step(self, precision): def _inference_step(self, precision):
...@@ -171,13 +170,11 @@ class PytorchBERT(PytorchBase): ...@@ -171,13 +170,11 @@ class PytorchBERT(PytorchBase):
self._model.eval() self._model.eval()
while True: while True:
for idx, sample in enumerate(self._dataloader): for idx, sample in enumerate(self._dataloader):
start = time.time() start = self._timer()
if self._gpu_available: if self._gpu_available:
sample = sample.cuda() sample = sample.cuda()
self._model(sample) self._model(sample)
if self._gpu_available: end = self._timer()
torch.cuda.synchronize()
end = time.time()
curr_step += 1 curr_step += 1
if curr_step > self._args.num_warmup: if curr_step > self._args.num_warmup:
# Save the step time of every training/inference step, unit is millisecond. # Save the step time of every training/inference step, unit is millisecond.
......
...@@ -3,8 +3,6 @@ ...@@ -3,8 +3,6 @@
"""Module of the Pytorch CNN models.""" """Module of the Pytorch CNN models."""
import time
import torch import torch
from torchvision import models from torchvision import models
...@@ -99,10 +97,11 @@ class PytorchCNN(PytorchBase): ...@@ -99,10 +97,11 @@ class PytorchCNN(PytorchBase):
""" """
duration = [] duration = []
curr_step = 0 curr_step = 0
check_frequency = 100
while True: while True:
for idx, sample in enumerate(self._dataloader): for idx, sample in enumerate(self._dataloader):
sample = sample.to(dtype=getattr(torch, precision.value)) sample = sample.to(dtype=getattr(torch, precision.value))
start = time.time() start = self._timer()
if self._gpu_available: if self._gpu_available:
sample = sample.cuda() sample = sample.cuda()
self._optimizer.zero_grad() self._optimizer.zero_grad()
...@@ -110,12 +109,12 @@ class PytorchCNN(PytorchBase): ...@@ -110,12 +109,12 @@ class PytorchCNN(PytorchBase):
loss = self._loss_fn(output, self._target) loss = self._loss_fn(output, self._target)
loss.backward() loss.backward()
self._optimizer.step() self._optimizer.step()
end = time.time() end = self._timer()
curr_step += 1 curr_step += 1
if curr_step > self._args.num_warmup: if curr_step > self._args.num_warmup:
# Save the step time of every training/inference step, unit is millisecond. # Save the step time of every training/inference step, unit is millisecond.
duration.append((end - start) * 1000) duration.append((end - start) * 1000)
if self._is_finished(curr_step, end): if self._is_finished(curr_step, end, check_frequency):
return duration return duration
def _inference_step(self, precision): def _inference_step(self, precision):
...@@ -135,13 +134,11 @@ class PytorchCNN(PytorchBase): ...@@ -135,13 +134,11 @@ class PytorchCNN(PytorchBase):
while True: while True:
for idx, sample in enumerate(self._dataloader): for idx, sample in enumerate(self._dataloader):
sample = sample.to(dtype=getattr(torch, precision.value)) sample = sample.to(dtype=getattr(torch, precision.value))
start = time.time() start = self._timer()
if self._gpu_available: if self._gpu_available:
sample = sample.cuda() sample = sample.cuda()
self._model(sample) self._model(sample)
if self._gpu_available: end = self._timer()
torch.cuda.synchronize()
end = time.time()
curr_step += 1 curr_step += 1
if curr_step > self._args.num_warmup: if curr_step > self._args.num_warmup:
# Save the step time of every training/inference step, unit is millisecond. # Save the step time of every training/inference step, unit is millisecond.
......
...@@ -3,8 +3,6 @@ ...@@ -3,8 +3,6 @@
"""Module of the Pytorch GPT2 model.""" """Module of the Pytorch GPT2 model."""
import time
import torch import torch
from transformers import GPT2Model, GPT2Config from transformers import GPT2Model, GPT2Config
...@@ -131,9 +129,10 @@ class PytorchGPT2(PytorchBase): ...@@ -131,9 +129,10 @@ class PytorchGPT2(PytorchBase):
""" """
duration = [] duration = []
curr_step = 0 curr_step = 0
check_frequency = 100
while True: while True:
for idx, sample in enumerate(self._dataloader): for idx, sample in enumerate(self._dataloader):
start = time.time() start = self._timer()
if self._gpu_available: if self._gpu_available:
sample = sample.cuda() sample = sample.cuda()
self._optimizer.zero_grad() self._optimizer.zero_grad()
...@@ -141,12 +140,12 @@ class PytorchGPT2(PytorchBase): ...@@ -141,12 +140,12 @@ class PytorchGPT2(PytorchBase):
loss = self._loss_fn(output[range(self._args.batch_size), -1], self._target) loss = self._loss_fn(output[range(self._args.batch_size), -1], self._target)
loss.backward() loss.backward()
self._optimizer.step() self._optimizer.step()
end = time.time() end = self._timer()
curr_step += 1 curr_step += 1
if curr_step > self._args.num_warmup: if curr_step > self._args.num_warmup:
# Save the step time of every training/inference step, unit is millisecond. # Save the step time of every training/inference step, unit is millisecond.
duration.append((end - start) * 1000) duration.append((end - start) * 1000)
if self._is_finished(curr_step, end): if self._is_finished(curr_step, end, check_frequency):
return duration return duration
def _inference_step(self, precision): def _inference_step(self, precision):
...@@ -165,13 +164,11 @@ class PytorchGPT2(PytorchBase): ...@@ -165,13 +164,11 @@ class PytorchGPT2(PytorchBase):
self._model.eval() self._model.eval()
while True: while True:
for idx, sample in enumerate(self._dataloader): for idx, sample in enumerate(self._dataloader):
start = time.time() start = self._timer()
if self._gpu_available: if self._gpu_available:
sample = sample.cuda() sample = sample.cuda()
self._model(sample) self._model(sample)
if self._gpu_available: end = self._timer()
torch.cuda.synchronize()
end = time.time()
curr_step += 1 curr_step += 1
if curr_step > self._args.num_warmup: if curr_step > self._args.num_warmup:
# Save the step time of every training/inference step, unit is millisecond. # Save the step time of every training/inference step, unit is millisecond.
......
...@@ -3,8 +3,6 @@ ...@@ -3,8 +3,6 @@
"""Module of the Pytorch LSTM model.""" """Module of the Pytorch LSTM model."""
import time
import torch import torch
from superbench.common.utils import logger from superbench.common.utils import logger
...@@ -139,10 +137,11 @@ class PytorchLSTM(PytorchBase): ...@@ -139,10 +137,11 @@ class PytorchLSTM(PytorchBase):
""" """
duration = [] duration = []
curr_step = 0 curr_step = 0
check_frequency = 100
while True: while True:
for idx, sample in enumerate(self._dataloader): for idx, sample in enumerate(self._dataloader):
sample = sample.to(dtype=getattr(torch, precision.value)) sample = sample.to(dtype=getattr(torch, precision.value))
start = time.time() start = self._timer()
if self._gpu_available: if self._gpu_available:
sample = sample.cuda() sample = sample.cuda()
self._optimizer.zero_grad() self._optimizer.zero_grad()
...@@ -150,12 +149,12 @@ class PytorchLSTM(PytorchBase): ...@@ -150,12 +149,12 @@ class PytorchLSTM(PytorchBase):
loss = self._loss_fn(output, self._target) loss = self._loss_fn(output, self._target)
loss.backward() loss.backward()
self._optimizer.step() self._optimizer.step()
end = time.time() end = self._timer()
curr_step += 1 curr_step += 1
if curr_step > self._args.num_warmup: if curr_step > self._args.num_warmup:
# Save the step time of every training/inference step, unit is millisecond. # Save the step time of every training/inference step, unit is millisecond.
duration.append((end - start) * 1000) duration.append((end - start) * 1000)
if self._is_finished(curr_step, end): if self._is_finished(curr_step, end, check_frequency):
return duration return duration
def _inference_step(self, precision): def _inference_step(self, precision):
...@@ -175,13 +174,11 @@ class PytorchLSTM(PytorchBase): ...@@ -175,13 +174,11 @@ class PytorchLSTM(PytorchBase):
while True: while True:
for idx, sample in enumerate(self._dataloader): for idx, sample in enumerate(self._dataloader):
sample = sample.to(dtype=getattr(torch, precision.value)) sample = sample.to(dtype=getattr(torch, precision.value))
start = time.time() start = self._timer()
if self._gpu_available: if self._gpu_available:
sample = sample.cuda() sample = sample.cuda()
self._model(sample) self._model(sample)
if self._gpu_available: end = self._timer()
torch.cuda.synchronize()
end = time.time()
curr_step += 1 curr_step += 1
if curr_step > self._args.num_warmup: if curr_step > self._args.num_warmup:
# Save the step time of every training/inference step, unit is millisecond. # Save the step time of every training/inference step, unit is millisecond.
......
...@@ -3,7 +3,7 @@ ...@@ -3,7 +3,7 @@
# Server: # Server:
# - Product: HPE Apollo 6500 # - Product: HPE Apollo 6500
version: v0.4 version: v0.5
superbench: superbench:
enable: null enable: null
var: var:
......
...@@ -4,7 +4,7 @@ ...@@ -4,7 +4,7 @@
# - Product: G482-Z53 # - Product: G482-Z53
# - Link: https://www.gigabyte.cn/FileUpload/Global/MicroSite/553/G482-Z53.html # - Link: https://www.gigabyte.cn/FileUpload/Global/MicroSite/553/G482-Z53.html
version: v0.4 version: v0.5
superbench: superbench:
enable: null enable: null
var: var:
......
version: v0.4 version: v0.5
superbench: superbench:
enable: null enable: null
monitor: monitor:
......