Commit d4051602 authored by one's avatar one

Migrate gpu-stream to BabelStream v5.0

parent 1a57f2d6
@@ -267,20 +267,42 @@ For measurements of peer-to-peer communication performance between AMD GPUs, GPU
#### Introduction
Measure the memory bandwidth of the GPU using the BabelStream (`hip-stream`) backend.
The benchmark executes the copy, mul (scale), add, triad, and dot kernels.
The `array_size` parameter is the number of elements per array, not a size in bytes.
Each benchmark run measures the GPU visible to the current process.
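As a rough guide (not part of the benchmark itself), the device memory needed for a given `array_size` can be estimated from BabelStream's three working arrays; `babelstream_footprint_bytes` below is a hypothetical helper:

```python
def babelstream_footprint_bytes(array_size: int, precision: str = 'double') -> int:
    """Estimate device memory used by BabelStream's three arrays (a, b, c)."""
    element_size = 8 if precision == 'double' else 4
    return 3 * array_size * element_size

# The default array_size of 268435456 doubles needs about 6 GiB of device memory.
print(babelstream_footprint_bytes(268435456))  # 6442450944
```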
#### Metrics
| Metric Name | Unit | Description |
|-------------------------------------------------------------------|------------------|------------------------------------------------------------------------------------------------|
| STREAM\_INIT\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Initialization phase bandwidth for the current benchmark run and one array size. |
| STREAM\_INIT\_[float\|double]\_array\_[0-9]+\_time | time (s) | Initialization phase runtime for the current benchmark run and one array size. |
| STREAM\_READ\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Read phase bandwidth for the current benchmark run and one array size. |
| STREAM\_READ\_[float\|double]\_array\_[0-9]+\_time | time (s) | Read phase runtime for the current benchmark run and one array size. |
| STREAM\_COPY\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum copy bandwidth for the current benchmark run and one array size. |
| STREAM\_COPY\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum copy runtime for the current benchmark run and one array size. |
| STREAM\_COPY\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum copy runtime for the current benchmark run and one array size. |
| STREAM\_COPY\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average copy runtime for the current benchmark run and one array size. |
| STREAM\_MUL\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum mul bandwidth for the current benchmark run and one array size. |
| STREAM\_MUL\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum mul runtime for the current benchmark run and one array size. |
| STREAM\_MUL\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum mul runtime for the current benchmark run and one array size. |
| STREAM\_MUL\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average mul runtime for the current benchmark run and one array size. |
| STREAM\_ADD\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum add bandwidth for the current benchmark run and one array size. |
| STREAM\_ADD\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum add runtime for the current benchmark run and one array size. |
| STREAM\_ADD\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum add runtime for the current benchmark run and one array size. |
| STREAM\_ADD\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average add runtime for the current benchmark run and one array size. |
| STREAM\_TRIAD\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum triad bandwidth for the current benchmark run and one array size. |
| STREAM\_TRIAD\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum triad runtime for the current benchmark run and one array size. |
| STREAM\_TRIAD\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum triad runtime for the current benchmark run and one array size. |
| STREAM\_TRIAD\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average triad runtime for the current benchmark run and one array size. |
| STREAM\_DOT\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum dot bandwidth for the current benchmark run and one array size. |
| STREAM\_DOT\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum dot runtime for the current benchmark run and one array size. |
| STREAM\_DOT\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum dot runtime for the current benchmark run and one array size. |
| STREAM\_DOT\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average dot runtime for the current benchmark run and one array size. |
`gpu-stream` now reports `phase` (init/read) and `function` (copy/mul/add/triad/dot) metrics; the former `_ratio` and `block_*` metrics are removed.
Bandwidth metrics are converted from BabelStream's `max_mbytes_per_sec` field using `GB/s = MB/s / 1000`.
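For illustration, a minimal sketch of the naming and unit conversion applied to one BabelStream CSV function row (the row values are taken from the sample log added in this change):

```python
import csv
import io

# One function row from BabelStream's --csv output.
row_text = (
    'function,num_times,n_elements,sizeof,max_mbytes_per_sec,min_runtime,max_runtime,avg_runtime\n'
    'Copy,100,268435456,8,1.34522e+06,0.00319277,0.00320985,0.00319879\n'
)
row = next(csv.DictReader(io.StringIO(row_text)))

# Metric name pattern: STREAM_<FUNCTION>_<precision>_array_<n_elements>_<suffix>.
metric_name = 'STREAM_{}_double_array_{}_bw'.format(row['function'].upper(), row['n_elements'])
bw_gbps = float(row['max_mbytes_per_sec']) / 1000  # GB/s = MB/s / 1000

print(metric_name)  # STREAM_COPY_double_array_268435456_bw
print(bw_gbps)      # 1345.22
```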
### `ib-loopback`
......
@@ -12,13 +12,12 @@
if __name__ == '__main__':
    context = BenchmarkRegistry.create_benchmark_context(
        'gpu-stream', platform=Platform.CUDA, parameters='--array_size 268435456 --num_loops 10 --precision double'
    )
    # For ROCm environment, please specify the benchmark name and the platform as the following.
    # context = BenchmarkRegistry.create_benchmark_context(
    #     'gpu-stream', platform=Platform.ROCM, parameters='--array_size 268435456 --num_loops 10 --precision float'
    # )
    benchmark = BenchmarkRegistry.launch_benchmark(context)
    if benchmark:
......
@@ -3,6 +3,8 @@
"""Module of the GPU Stream Performance benchmark."""

import csv
import io
import os

from superbench.common.utils import logger
@@ -12,6 +14,18 @@
class GpuStreamBenchmark(MicroBenchmarkWithInvoke):
    """The GPU stream performance benchmark class."""

    _function_metric_map = {
        'Copy': 'COPY',
        'Mul': 'MUL',
        'Add': 'ADD',
        'Triad': 'TRIAD',
        'Dot': 'DOT',
    }
    _phase_metric_map = {
        'Init': 'INIT',
        'Read': 'READ',
    }
    def __init__(self, name, parameters=''):
        """Constructor.
@@ -21,26 +35,19 @@ def __init__(self, name, parameters=''):
        """
        super().__init__(name, parameters)

        self._bin_name = 'hip-stream'

    def add_parser_arguments(self):
        """Add the specified arguments."""
        super().add_parser_arguments()

        self._parser.add_argument(
            '--array_size',
            type=int,
            default=268435456,
            required=False,
            help='Number of elements in array.',
        )
        self._parser.add_argument(
@@ -48,13 +55,16 @@ def add_parser_arguments(self):
            type=int,
            default=100,
            required=False,
            help='Number of benchmark runs, mapping to --numtimes in BabelStream.',
        )
        self._parser.add_argument(
            '--precision',
            type=str,
            default='double',
            choices=['double', 'float'],
            required=False,
            help='Data type for benchmark.',
        )
    def _preprocess(self):
@@ -68,17 +78,138 @@ def _preprocess(self):
        self.__bin_path = os.path.join(self._args.bin_dir, self._bin_name)

        args = f'--arraysize {self._args.array_size} --numtimes {self._args.num_loops} --csv'
        if self._args.precision == 'float':
            args += ' --float'
        self._commands = ['{} {}'.format(self.__bin_path, args)]

        return True
    def _get_device_name(self, raw_output):
        """Extract device name from BabelStream output when available."""
        for line in raw_output.splitlines():
            if line.startswith('Using HIP device '):
                return line[len('Using HIP device '):].strip()
        return 'Unknown'

    @staticmethod
    def _mbps_to_gbps(value):
        """Convert MB/s to GB/s."""
        return float(value) / 1000

    def _parse_csv_phase_rows(self, raw_output):
        """Extract phase rows from BabelStream CSV output."""
        lines = [line.strip() for line in raw_output.strip().splitlines() if line.strip()]
        header = 'phase,n_elements,sizeof,max_mbytes_per_sec,runtime'
        if header not in lines:
            raise ValueError('No phase CSV header found in output.')
        start_idx = lines.index(header)
        csv_content = '\n'.join(lines[start_idx:])
        reader = csv.DictReader(io.StringIO(csv_content))
        phase_rows = []
        for row in reader:
            phase_name = row.get('phase', '').strip()
            if phase_name in self._phase_metric_map:
                metric_tag = self._phase_metric_map[phase_name]
                array_size = int(row['n_elements'])
                phase_rows.append({
                    'metric_name': self._get_phase_bw_metric_name(metric_tag, array_size),
                    'value': self._mbps_to_gbps(row['max_mbytes_per_sec']),
                })
                phase_rows.append({
                    'metric_name': self._get_phase_time_metric_name(metric_tag, array_size),
                    'value': float(row['runtime']),
                })
        if not phase_rows:
            raise ValueError('No valid phase rows found in CSV output.')
        return phase_rows

    def _parse_csv_function_rows(self, raw_output):
        """Extract function rows from BabelStream CSV output."""
        lines = [line.strip() for line in raw_output.strip().splitlines() if line.strip()]
        header = 'function,num_times,n_elements,sizeof,max_mbytes_per_sec,min_runtime,max_runtime,avg_runtime'
        if header not in lines:
            raise ValueError('No function CSV header found in output.')
        start_idx = lines.index(header)
        csv_content = '\n'.join(lines[start_idx:])
        reader = csv.DictReader(io.StringIO(csv_content))
        function_rows = []
        for row in reader:
            function_name = row.get('function', '').strip()
            if function_name in self._function_metric_map:
                metric_tag = self._function_metric_map[function_name]
                array_size = int(row['n_elements'])
                function_rows.append({
                    'metric_name': self._get_function_bw_metric_name(metric_tag, array_size),
                    'value': self._mbps_to_gbps(row['max_mbytes_per_sec']),
                })
                function_rows.append({
                    'metric_name': self._get_function_time_metric_name(metric_tag, array_size, 'min'),
                    'value': float(row['min_runtime']),
                })
                function_rows.append({
                    'metric_name': self._get_function_time_metric_name(metric_tag, array_size, 'max'),
                    'value': float(row['max_runtime']),
                })
                function_rows.append({
                    'metric_name': self._get_function_time_metric_name(metric_tag, array_size, 'avg'),
                    'value': float(row['avg_runtime']),
                })
        if not function_rows:
            raise ValueError('No valid function rows found in CSV output.')
        return function_rows

    def _get_phase_bw_metric_name(self, metric_tag, array_size):
        """Build phase bandwidth metric name."""
        return 'STREAM_{}_{}_array_{}_bw'.format(metric_tag, self._args.precision, array_size)

    def _get_phase_time_metric_name(self, metric_tag, array_size):
        """Build phase runtime metric name."""
        return 'STREAM_{}_{}_array_{}_time'.format(metric_tag, self._args.precision, array_size)

    def _get_function_bw_metric_name(self, metric_tag, array_size):
        """Build function bandwidth metric name."""
        return 'STREAM_{}_{}_array_{}_bw'.format(metric_tag, self._args.precision, array_size)

    def _get_function_time_metric_name(self, metric_tag, array_size, metric_type):
        """Build function runtime metric name."""
        return 'STREAM_{}_{}_array_{}_time_{}'.format(metric_tag, self._args.precision, array_size, metric_type)

    def _format_device_output(self, device_name, metrics):
        """Render one device section in a human-readable format."""
        metric_width = max(len(metric['metric_name']) for metric in metrics)
        output_lines = ['Device: {}'.format(device_name)]
        for metric in metrics:
            output_lines.append('{:<{width}} {:.6f}'.format(metric['metric_name'], metric['value'], width=metric_width))
        return output_lines

    def _get_text_output_header(self):
        """Render benchmark metadata in the text output header."""
        return [
            'STREAM Benchmark (BabelStream backend)',
            'Array size(elements): {}'.format(self._args.array_size),
            'Number of loops: {}'.format(self._args.num_loops),
            'Precision: {}'.format(self._args.precision),
            'Bandwidth unit: GB/s (converted from MB/s / 1000)',
        ]

    def _parse_device_output(self, raw_output):
        """Parse one device output and return rendered lines and parsed metrics."""
        device_name = self._get_device_name(raw_output)
        metrics = self._parse_csv_phase_rows(raw_output) + self._parse_csv_function_rows(raw_output)
        rendered_lines = self._get_text_output_header()
        rendered_lines.append('')
        rendered_lines.extend(self._format_device_output(device_name, metrics))
        return rendered_lines, metrics
    def _process_raw_result(self, cmd_idx, raw_output):
        """Function to parse raw results and save the summarized results.
@@ -91,19 +222,11 @@ def _process_raw_result(self, cmd_idx, raw_output):
        Return:
            True if the raw output string is valid and result can be extracted.
        """
        try:
            rendered_lines, metrics = self._parse_device_output(raw_output)
            self._result.add_raw_data('raw_output_' + str(cmd_idx), '\n'.join(rendered_lines), self._args.log_raw_data)
            for metric in metrics:
                self._result.add_result(metric['metric_name'], metric['value'])
        except BaseException as e:
            self._result.set_return_code(ReturnCode.MICROBENCHMARK_RESULT_PARSING_FAILURE)
            logger.error(
......
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

"""Tests for gpu-stream benchmark."""

import unittest
from pathlib import Path

from tests.helper import decorator
from tests.helper.testcase import BenchmarkTestCase
@@ -12,51 +12,50 @@
class GpuStreamBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
    """Test class for gpu-stream benchmark."""

    @classmethod
    def setUpClass(cls):
        """Hook method for setting up class fixture before running tests in the class."""
        super().setUpClass()
        cls.createMockEnvs(cls)
        cls.createMockFiles(cls, ['bin/hip-stream'])

    @staticmethod
    def _load_fixture(filename):
        """Load a raw output fixture from tests/data."""
        return (Path('tests/data') / filename).read_text()

    def _test_gpu_stream_command_generation(self, platform):
        """Test gpu-stream benchmark command generation."""
        benchmark_name = 'gpu-stream'
        (benchmark_class, _) = BenchmarkRegistry._BenchmarkRegistry__select_benchmark(benchmark_name, platform)
        assert (benchmark_class)

        parameters = '--array_size 268435456 --num_loops 20 --precision float'
        benchmark = benchmark_class(benchmark_name, parameters=parameters)
        assert (benchmark)

        ret = benchmark._preprocess()
        assert (ret is True)
        assert (benchmark.return_code == ReturnCode.SUCCESS)
        assert (benchmark.name == benchmark_name)
        assert (benchmark.type == BenchmarkType.MICRO)
        assert (benchmark._args.array_size == 268435456)
        assert (benchmark._args.num_loops == 20)
        assert (benchmark._args.precision == 'float')

        assert (1 == len(benchmark._commands))
        assert (benchmark._commands[0].startswith(benchmark._GpuStreamBenchmark__bin_path))
        assert ('--arraysize 268435456' in benchmark._commands[0])
        assert ('--numtimes 20' in benchmark._commands[0])
        assert ('--csv' in benchmark._commands[0])
        assert ('--float' in benchmark._commands[0])
        assert ('--device' not in benchmark._commands[0])

        benchmark = benchmark_class(benchmark_name, parameters='--array_size 1024 --num_loops 2')
        assert (benchmark._preprocess() is True)
        assert (benchmark._args.precision == 'double')
        assert ('--float' not in benchmark._commands[0])
    @decorator.cuda_test
    def test_gpu_stream_command_generation_cuda(self):
@@ -68,47 +67,61 @@ def test_gpu_stream_command_generation_rocm(self):
        """Test gpu-stream benchmark command generation, ROCm case."""
        self._test_gpu_stream_command_generation(Platform.ROCM)

    def _test_gpu_stream_result_parsing(self, platform):
        """Test gpu-stream benchmark result parsing."""
        benchmark_name = 'gpu-stream'
        (benchmark_class, _) = BenchmarkRegistry._BenchmarkRegistry__select_benchmark(benchmark_name, platform)
        assert (benchmark_class)

        benchmark = benchmark_class(benchmark_name, parameters='--precision double')
        assert (benchmark)
        assert (benchmark._preprocess() is True)
        assert (benchmark.return_code == ReturnCode.SUCCESS)
        assert (benchmark.name == benchmark_name)
        assert (benchmark.type == BenchmarkType.MICRO)

        # Positive case - valid raw output.
        valid_output = self._load_fixture('gpu_stream.log')
        assert (benchmark._process_raw_result(0, valid_output))
        assert (benchmark.return_code == ReturnCode.SUCCESS)
        assert ('raw_output_0' in benchmark.raw_data)
        assert ('Device: BW150' in benchmark.raw_data['raw_output_0'][0])

        expected_metric_values = {
            'STREAM_INIT_double_array_268435456_bw': 6.77961,
            'STREAM_INIT_double_array_268435456_time': 0.950269,
            'STREAM_READ_double_array_268435456_bw': 1255.98,
            'STREAM_READ_double_array_268435456_time': 0.00512943,
            'STREAM_COPY_double_array_268435456_bw': 1345.22,
            'STREAM_COPY_double_array_268435456_time_min': 0.00319277,
            'STREAM_COPY_double_array_268435456_time_max': 0.00320985,
            'STREAM_COPY_double_array_268435456_time_avg': 0.00319879,
            'STREAM_MUL_double_array_268435456_bw': 1370.7,
            'STREAM_MUL_double_array_268435456_time_min': 0.00313342,
            'STREAM_MUL_double_array_268435456_time_max': 0.00314978,
            'STREAM_MUL_double_array_268435456_time_avg': 0.00313862,
            'STREAM_ADD_double_array_268435456_bw': 1292.74,
            'STREAM_ADD_double_array_268435456_time_min': 0.00498358,
            'STREAM_ADD_double_array_268435456_time_max': 0.00499938,
            'STREAM_ADD_double_array_268435456_time_avg': 0.00498747,
            'STREAM_TRIAD_double_array_268435456_bw': 1292.52,
            'STREAM_TRIAD_double_array_268435456_time_min': 0.00498439,
            'STREAM_TRIAD_double_array_268435456_time_max': 0.00499791,
            'STREAM_TRIAD_double_array_268435456_time_avg': 0.00498815,
            'STREAM_DOT_double_array_268435456_bw': 1271.19,
            'STREAM_DOT_double_array_268435456_time_min': 0.00337869,
            'STREAM_DOT_double_array_268435456_time_max': 0.00359398,
            'STREAM_DOT_double_array_268435456_time_avg': 0.0033883,
        }
        for metric_name, expected_value in expected_metric_values.items():
            assert (metric_name in benchmark.result)
            assert (abs(benchmark.result[metric_name][0] - expected_value) < 1e-6)

        assert (all(not metric.endswith('_ratio') for metric in benchmark.result))

        # Negative case - invalid raw output.
        benchmark = benchmark_class(benchmark_name, parameters='--precision double')
        assert (benchmark._preprocess() is True)
        assert (benchmark._process_raw_result(0, 'Invalid raw output') is False)
        assert (benchmark.return_code == ReturnCode.MICROBENCHMARK_RESULT_PARSING_FAILURE)
    @decorator.cuda_test
......
Using HIP device BW150
Driver: 60326045
Memory: DEFAULT
phase,n_elements,sizeof,max_mbytes_per_sec,runtime
Init,268435456,8,6779.61,0.950269
Read,268435456,8,1.25598e+06,0.00512943
function,num_times,n_elements,sizeof,max_mbytes_per_sec,min_runtime,max_runtime,avg_runtime
Copy,100,268435456,8,1.34522e+06,0.00319277,0.00320985,0.00319879
Mul,100,268435456,8,1.3707e+06,0.00313342,0.00314978,0.00313862
Add,100,268435456,8,1.29274e+06,0.00498358,0.00499938,0.00498747
Triad,100,268435456,8,1.29252e+06,0.00498439,0.00499791,0.00498815
Dot,100,268435456,8,1.27119e+06,0.00337869,0.00359398,0.0033883