Commit d4051602 authored by one

Migrate gpu-stream to BabelStream v5.0

parent 1a57f2d6
......@@ -267,20 +267,42 @@ For measurements of peer-to-peer communication performance between AMD GPUs, GPU
#### Introduction
Measure the memory bandwidth of GPU using the STREAM benchmark. The benchmark tests various memory operations including copy, scale, add, and triad for double datatype.
Measure the memory bandwidth of the GPU using the BabelStream (`hip-stream`) backend.
The benchmark executes copy, scale, add, triad, and dot operations.
The `array_size` parameter represents the number of elements.
Each benchmark run measures the GPU visible to the current process.
#### Metrics
| Metric Name | Unit | Description |
|------------------------------------------------------------|------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| STREAM\_COPY\_double\_gpu\_[0-9]\_buffer\_[0-9]+\_block\_[0-9]+\_bw | bandwidth (GB/s) | The fp64 memory bandwidth of the GPU for the copy operation with specified buffer size and block size. |
| STREAM\_SCALE\_double\_gpu\_[0-9]\_buffer\_[0-9]+\_block\_[0-9]+\_bw | bandwidth (GB/s) | The fp64 memory bandwidth of the GPU for the scale operation with specified buffer size and block size. |
| STREAM\_ADD\_double\_gpu\_[0-9]\_buffer\_[0-9]+\_block\_[0-9]+\_bw | bandwidth (GB/s) | The fp64 memory bandwidth of the GPU for the add operation with specified buffer size and block size. |
| STREAM\_TRIAD\_double\_gpu\_[0-9]\_buffer\_[0-9]+\_block\_[0-9]+\_bw | bandwidth (GB/s) | The fp64 memory bandwidth of the GPU for the triad operation with specified buffer size and block size. |
| STREAM\_COPY\_double\_gpu\_[0-9]\_buffer\_[0-9]+\_block\_[0-9]+\_ratio | Efficiency (%) | The fp64 memory bandwidth efficiency of the GPU for the copy operation with specified buffer size and block size. |
| STREAM\_SCALE\_double\_gpu\_[0-9]\_buffer\_[0-9]+\_block\_[0-9]+\_ratio | Efficiency (%) | The fp64 memory bandwidth efficiency of the GPU for the scale operation with specified buffer size and block size. |
| STREAM\_ADD\_double\_gpu\_[0-9]\_buffer\_[0-9]+\_block\_[0-9]+\_ratio | Efficiency (%) | The fp64 memory bandwidth efficiency of the GPU for the add operation with specified buffer size and block size. |
| STREAM\_TRIAD\_double\_gpu\_[0-9]\_buffer\_[0-9]+\_block\_[0-9]+\_ratio | Efficiency (%) | The fp64 memory bandwidth efficiency of the GPU for the triad operation with specified buffer size and block size. |
| Metric Name | Unit | Description |
|-------------------------------------------------------------------|------------------|------------------------------------------------------------------------------------------------|
| STREAM\_INIT\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Initialization phase bandwidth for the current benchmark run and one array size. |
| STREAM\_INIT\_[float\|double]\_array\_[0-9]+\_time | time (s) | Initialization phase runtime for the current benchmark run and one array size. |
| STREAM\_READ\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Read phase bandwidth for the current benchmark run and one array size. |
| STREAM\_READ\_[float\|double]\_array\_[0-9]+\_time | time (s) | Read phase runtime for the current benchmark run and one array size. |
| STREAM\_COPY\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum copy bandwidth for the current benchmark run and one array size. |
| STREAM\_COPY\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum copy runtime for the current benchmark run and one array size. |
| STREAM\_COPY\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum copy runtime for the current benchmark run and one array size. |
| STREAM\_COPY\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average copy runtime for the current benchmark run and one array size. |
| STREAM\_MUL\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum mul bandwidth for the current benchmark run and one array size. |
| STREAM\_MUL\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum mul runtime for the current benchmark run and one array size. |
| STREAM\_MUL\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum mul runtime for the current benchmark run and one array size. |
| STREAM\_MUL\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average mul runtime for the current benchmark run and one array size. |
| STREAM\_ADD\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum add bandwidth for the current benchmark run and one array size. |
| STREAM\_ADD\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum add runtime for the current benchmark run and one array size. |
| STREAM\_ADD\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum add runtime for the current benchmark run and one array size. |
| STREAM\_ADD\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average add runtime for the current benchmark run and one array size. |
| STREAM\_TRIAD\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum triad bandwidth for the current benchmark run and one array size. |
| STREAM\_TRIAD\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum triad runtime for the current benchmark run and one array size. |
| STREAM\_TRIAD\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum triad runtime for the current benchmark run and one array size. |
| STREAM\_TRIAD\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average triad runtime for the current benchmark run and one array size. |
| STREAM\_DOT\_[float\|double]\_array\_[0-9]+\_bw | bandwidth (GB/s) | Maximum dot bandwidth for the current benchmark run and one array size. |
| STREAM\_DOT\_[float\|double]\_array\_[0-9]+\_time\_min | time (s) | Minimum dot runtime for the current benchmark run and one array size. |
| STREAM\_DOT\_[float\|double]\_array\_[0-9]+\_time\_max | time (s) | Maximum dot runtime for the current benchmark run and one array size. |
| STREAM\_DOT\_[float\|double]\_array\_[0-9]+\_time\_avg | time (s) | Average dot runtime for the current benchmark run and one array size. |
`gpu-stream` now reports per-phase (`INIT`, `READ`) and per-function (`COPY`, `MUL`, `ADD`, `TRIAD`, `DOT`) metrics; the former `_ratio` and `block_*` metrics are removed.
Bandwidth metrics are converted from BabelStream's `max_mbytes_per_sec` using `GB/s = MB/s / 1000`.
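The metric naming scheme and unit conversion above can be sketched with two small helpers (hypothetical names for illustration, not the benchmark's actual API):

```python
def metric_name(operation, precision, array_size, suffix):
    # Builds names like STREAM_TRIAD_double_array_268435456_bw.
    return f'STREAM_{operation}_{precision}_array_{array_size}_{suffix}'

def mbps_to_gbps(mbps):
    # BabelStream reports max_mbytes_per_sec; metrics are stored in GB/s.
    return mbps / 1000

print(metric_name('TRIAD', 'double', 268435456, 'bw'))
# STREAM_TRIAD_double_array_268435456_bw
print(mbps_to_gbps(1.29252e+06))
# 1292.52
```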
### `ib-loopback`
......
......@@ -12,13 +12,12 @@
if __name__ == '__main__':
context = BenchmarkRegistry.create_benchmark_context(
'gpu-stream', platform=Platform.CUDA, parameters='--num_warm_up 1 --num_loops 10'
'gpu-stream', platform=Platform.CUDA, parameters='--array_size 268435456 --num_loops 10 --precision double'
)
# For ROCm environment, please specify the benchmark name and the platform as the following.
# context = BenchmarkRegistry.create_benchmark_context(
# 'gpu-stream', platform=Platform.ROCM, parameters='--num_warm_up 1 --num_loops 10'
# 'gpu-stream', platform=Platform.ROCM, parameters='--array_size 268435456 --num_loops 10 --precision float'
# )
# To enable data checking, please add '--check_data'.
benchmark = BenchmarkRegistry.launch_benchmark(context)
if benchmark:
......
......@@ -3,6 +3,8 @@
"""Module of the GPU Stream Performance benchmark."""
import csv
import io
import os
from superbench.common.utils import logger
......@@ -12,6 +14,18 @@
class GpuStreamBenchmark(MicroBenchmarkWithInvoke):
"""The GPU stream performance benchmark class."""
_function_metric_map = {
'Copy': 'COPY',
'Mul': 'MUL',
'Add': 'ADD',
'Triad': 'TRIAD',
'Dot': 'DOT',
}
_phase_metric_map = {
'Init': 'INIT',
'Read': 'READ',
}
def __init__(self, name, parameters=''):
"""Constructor.
......@@ -21,26 +35,19 @@ def __init__(self, name, parameters=''):
"""
super().__init__(name, parameters)
self._bin_name = 'gpu_stream'
self._bin_name = 'hip-stream'
self.__bin_path = None
def add_parser_arguments(self):
"""Add the specified arguments."""
super().add_parser_arguments()
self._parser.add_argument(
'--size',
type=int,
default=4096 * 1024**2,
required=False,
help='Size of data buffer in bytes.',
)
self._parser.add_argument(
'--num_warm_up',
'--array_size',
type=int,
default=20,
default=268435456,
required=False,
help='Number of warm up rounds',
help='Number of elements in array.',
)
self._parser.add_argument(
......@@ -48,13 +55,16 @@ def add_parser_arguments(self):
type=int,
default=100,
required=False,
help='Number of data buffer copies performed.',
help='Number of benchmark runs, mapping to --numtimes in BabelStream.',
)
self._parser.add_argument(
'--check_data',
action='store_true',
help='Enable data checking',
'--precision',
type=str,
default='double',
choices=['double', 'float'],
required=False,
help='Data type for benchmark.',
)
def _preprocess(self):
......@@ -68,17 +78,138 @@ def _preprocess(self):
self.__bin_path = os.path.join(self._args.bin_dir, self._bin_name)
args = '--size %d --num_warm_up %d --num_loops %d ' % (
self._args.size, self._args.num_warm_up, self._args.num_loops
)
if self._args.check_data:
args += ' --check_data'
self._commands = ['%s %s' % (self.__bin_path, args)]
args = f'--arraysize {self._args.array_size} --numtimes {self._args.num_loops} --csv'
if self._args.precision == 'float':
args += ' --float'
self._commands = [f'{self.__bin_path} {args}']
return True
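With the default parameters, the command assembled by `_preprocess` comes out as follows (a standalone reconstruction; the bin path is an assumed example, not a fixed install location):

```python
# Illustrative reconstruction of the command assembly above.
# The bin path is an assumption for demonstration only.
array_size, num_loops, precision = 268435456, 100, 'double'
bin_path = '/opt/superbench/bin/hip-stream'

args = f'--arraysize {array_size} --numtimes {num_loops} --csv'
if precision == 'float':
    args += ' --float'  # BabelStream defaults to double; --float switches precision
command = f'{bin_path} {args}'

print(command)
# /opt/superbench/bin/hip-stream --arraysize 268435456 --numtimes 100 --csv
```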
def _get_device_name(self, raw_output):
"""Extract device name from BabelStream output when available."""
for line in raw_output.splitlines():
if line.startswith('Using HIP device '):
return line[len('Using HIP device '):].strip()
return 'Unknown'
@staticmethod
def _mbps_to_gbps(value):
"""Convert MB/s to GB/s."""
return float(value) / 1000
def _parse_csv_phase_rows(self, raw_output):
"""Extract phase rows from BabelStream CSV output."""
lines = [line.strip() for line in raw_output.strip().splitlines() if line.strip()]
header = 'phase,n_elements,sizeof,max_mbytes_per_sec,runtime'
if header not in lines:
raise ValueError('No phase CSV header found in output.')
start_idx = lines.index(header)
csv_content = '\n'.join(lines[start_idx:])
reader = csv.DictReader(io.StringIO(csv_content))
phase_rows = []
for row in reader:
phase_name = row.get('phase', '').strip()
if phase_name in self._phase_metric_map:
metric_tag = self._phase_metric_map[phase_name]
array_size = int(row['n_elements'])
phase_rows.append({
'metric_name': self._get_phase_bw_metric_name(metric_tag, array_size),
'value': self._mbps_to_gbps(row['max_mbytes_per_sec']),
})
phase_rows.append({
'metric_name': self._get_phase_time_metric_name(metric_tag, array_size),
'value': float(row['runtime']),
})
if not phase_rows:
raise ValueError('No valid phase rows found in CSV output.')
return phase_rows
def _parse_csv_function_rows(self, raw_output):
"""Extract function rows from BabelStream CSV output."""
lines = [line.strip() for line in raw_output.strip().splitlines() if line.strip()]
header = 'function,num_times,n_elements,sizeof,max_mbytes_per_sec,min_runtime,max_runtime,avg_runtime'
if header not in lines:
raise ValueError('No function CSV header found in output.')
start_idx = lines.index(header)
csv_content = '\n'.join(lines[start_idx:])
reader = csv.DictReader(io.StringIO(csv_content))
function_rows = []
for row in reader:
function_name = row.get('function', '').strip()
if function_name in self._function_metric_map:
metric_tag = self._function_metric_map[function_name]
array_size = int(row['n_elements'])
function_rows.append({
'metric_name': self._get_function_bw_metric_name(metric_tag, array_size),
'value': self._mbps_to_gbps(row['max_mbytes_per_sec']),
})
function_rows.append({
'metric_name': self._get_function_time_metric_name(metric_tag, array_size, 'min'),
'value': float(row['min_runtime']),
})
function_rows.append({
'metric_name': self._get_function_time_metric_name(metric_tag, array_size, 'max'),
'value': float(row['max_runtime']),
})
function_rows.append({
'metric_name': self._get_function_time_metric_name(metric_tag, array_size, 'avg'),
'value': float(row['avg_runtime']),
})
if not function_rows:
raise ValueError('No valid function rows found in CSV output.')
return function_rows
def _get_phase_bw_metric_name(self, metric_tag, array_size):
"""Build phase bandwidth metric name."""
return 'STREAM_{}_{}_array_{}_bw'.format(metric_tag, self._args.precision, array_size)
def _get_phase_time_metric_name(self, metric_tag, array_size):
"""Build phase runtime metric name."""
return 'STREAM_{}_{}_array_{}_time'.format(metric_tag, self._args.precision, array_size)
def _get_function_bw_metric_name(self, metric_tag, array_size):
"""Build function bandwidth metric name."""
return 'STREAM_{}_{}_array_{}_bw'.format(metric_tag, self._args.precision, array_size)
def _get_function_time_metric_name(self, metric_tag, array_size, metric_type):
"""Build function runtime metric name."""
return 'STREAM_{}_{}_array_{}_time_{}'.format(metric_tag, self._args.precision, array_size, metric_type)
def _format_device_output(self, device_name, metrics):
"""Render one device section in a human-readable format."""
metric_width = max(len(metric['metric_name']) for metric in metrics)
output_lines = ['Device: {}'.format(device_name)]
for metric in metrics:
output_lines.append('{:<{width}} {:.6f}'.format(metric['metric_name'], metric['value'], width=metric_width))
return output_lines
def _get_text_output_header(self):
"""Render benchmark metadata in the text output header."""
return [
'STREAM Benchmark (BabelStream backend)',
'Array size(elements): {}'.format(self._args.array_size),
'Number of loops: {}'.format(self._args.num_loops),
'Precision: {}'.format(self._args.precision),
'Bandwidth unit: GB/s (max_mbytes_per_sec / 1000)',
]
def _parse_device_output(self, raw_output):
"""Parse one device output and return rendered lines and parsed metrics."""
device_name = self._get_device_name(raw_output)
metrics = self._parse_csv_phase_rows(raw_output) + self._parse_csv_function_rows(raw_output)
rendered_lines = self._get_text_output_header()
rendered_lines.append('')
rendered_lines.extend(self._format_device_output(device_name, metrics))
return rendered_lines, metrics
def _process_raw_result(self, cmd_idx, raw_output):
"""Function to parse raw results and save the summarized results.
......@@ -91,19 +222,11 @@ def _process_raw_result(self, cmd_idx, raw_output):
Return:
True if the raw output string is valid and result can be extracted.
"""
self._result.add_raw_data('raw_output_' + str(cmd_idx), raw_output, self._args.log_raw_data)
try:
output_lines = [x.strip() for x in raw_output.strip().splitlines()]
count = 0
for output_line in output_lines:
if output_line.startswith('STREAM_'):
count += 1
tag, bw_str, ratio = output_line.split()
self._result.add_result(tag + '_bw', float(bw_str))
self._result.add_result(tag + '_ratio', float(ratio))
if count == 0:
raise BaseException('No valid results found.')
rendered_lines, metrics = self._parse_device_output(raw_output)
self._result.add_raw_data('raw_output_' + str(cmd_idx), '\n'.join(rendered_lines), self._args.log_raw_data)
for metric in metrics:
self._result.add_result(metric['metric_name'], metric['value'])
except BaseException as e:
self._result.set_return_code(ReturnCode.MICROBENCHMARK_RESULT_PARSING_FAILURE)
logger.error(
......
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
"""Tests for gpu_stream benchmark."""
"""Tests for gpu-stream benchmark."""
import numbers
import unittest
from pathlib import Path
from tests.helper import decorator
from tests.helper.testcase import BenchmarkTestCase
......@@ -12,51 +12,50 @@
class GpuStreamBenchmarkTest(BenchmarkTestCase, unittest.TestCase):
"""Test class for gpu_stream benchmark."""
"""Test class for gpu-stream benchmark."""
@classmethod
def setUpClass(cls):
"""Hook method for setting up class fixture before running tests in the class."""
super().setUpClass()
cls.createMockEnvs(cls)
cls.createMockFiles(cls, ['bin/gpu_stream'])
cls.createMockFiles(cls, ['bin/hip-stream'])
@staticmethod
def _load_fixture(filename):
return (Path('tests/data') / filename).read_text()
def _test_gpu_stream_command_generation(self, platform):
"""Test gpu-stream benchmark command generation."""
benchmark_name = 'gpu-stream'
(benchmark_class,
predefine_params) = BenchmarkRegistry._BenchmarkRegistry__select_benchmark(benchmark_name, platform)
(benchmark_class, _) = BenchmarkRegistry._BenchmarkRegistry__select_benchmark(benchmark_name, platform)
assert (benchmark_class)
num_warm_up = 5
num_loops = 10
size = 25769803776
parameters = '--num_warm_up %d --num_loops %d --size %d ' \
'--check_data' % \
(num_warm_up, num_loops, size)
parameters = '--array_size 268435456 --num_loops 20 --precision float'
benchmark = benchmark_class(benchmark_name, parameters=parameters)
# Check basic information
assert (benchmark)
ret = benchmark._preprocess()
assert (benchmark)
assert (ret is True)
assert (benchmark.return_code == ReturnCode.SUCCESS)
assert (benchmark.name == benchmark_name)
assert (benchmark.type == BenchmarkType.MICRO)
assert (benchmark._args.array_size == 268435456)
assert (benchmark._args.num_loops == 20)
assert (benchmark._args.precision == 'float')
# Check parameters specified in BenchmarkContext.
assert (benchmark._args.size == size)
assert (benchmark._args.num_warm_up == num_warm_up)
assert (benchmark._args.num_loops == num_loops)
assert (benchmark._args.check_data)
# Check command
assert (1 == len(benchmark._commands))
assert (benchmark._commands[0].startswith(benchmark._GpuStreamBenchmark__bin_path))
assert ('--size %d' % size in benchmark._commands[0])
assert ('--num_warm_up %d' % num_warm_up in benchmark._commands[0])
assert ('--num_loops %d' % num_loops in benchmark._commands[0])
assert ('--check_data' in benchmark._commands[0])
assert ('--arraysize 268435456' in benchmark._commands[0])
assert ('--numtimes 20' in benchmark._commands[0])
assert ('--csv' in benchmark._commands[0])
assert ('--float' in benchmark._commands[0])
assert ('--device' not in benchmark._commands[0])
benchmark = benchmark_class(benchmark_name, parameters='--array_size 1024 --num_loops 2')
assert (benchmark._preprocess() is True)
assert (benchmark._args.precision == 'double')
assert ('--float' not in benchmark._commands[0])
@decorator.cuda_test
def test_gpu_stream_command_generation_cuda(self):
......@@ -68,47 +67,61 @@ def test_gpu_stream_command_generation_rocm(self):
"""Test gpu-stream benchmark command generation, ROCm case."""
self._test_gpu_stream_command_generation(Platform.ROCM)
@decorator.load_data('tests/data/gpu_stream.log')
def _test_gpu_stream_result_parsing(self, platform, test_raw_output):
def _test_gpu_stream_result_parsing(self, platform):
"""Test gpu-stream benchmark result parsing."""
benchmark_name = 'gpu-stream'
(benchmark_class,
predefine_params) = BenchmarkRegistry._BenchmarkRegistry__select_benchmark(benchmark_name, platform)
(benchmark_class, _) = BenchmarkRegistry._BenchmarkRegistry__select_benchmark(benchmark_name, platform)
assert (benchmark_class)
benchmark = benchmark_class(benchmark_name, parameters='')
benchmark = benchmark_class(benchmark_name, parameters='--precision double')
assert (benchmark)
ret = benchmark._preprocess()
assert (ret is True)
assert (benchmark._preprocess() is True)
assert (benchmark.return_code == ReturnCode.SUCCESS)
assert (benchmark.name == 'gpu-stream')
assert (benchmark.name == benchmark_name)
assert (benchmark.type == BenchmarkType.MICRO)
# Positive case - valid raw output.
assert (benchmark._process_raw_result(0, test_raw_output))
assert (benchmark.return_code == ReturnCode.SUCCESS)
valid_output = self._load_fixture('gpu_stream.log')
assert (1 == len(benchmark.raw_data))
# print(test_raw_output.splitlines())
test_raw_output_dict = {
x.split()[0]: [float(x.split()[1]), float(x.split()[2])]
for x in test_raw_output.strip().splitlines() if x.startswith('STREAM_')
assert (benchmark._process_raw_result(0, valid_output))
assert (benchmark.return_code == ReturnCode.SUCCESS)
assert ('raw_output_0' in benchmark.raw_data)
assert ('Device: BW150' in benchmark.raw_data['raw_output_0'][0])
expected_metric_values = {
'STREAM_INIT_double_array_268435456_bw': 6.77961,
'STREAM_INIT_double_array_268435456_time': 0.950269,
'STREAM_READ_double_array_268435456_bw': 1255.98,
'STREAM_READ_double_array_268435456_time': 0.00512943,
'STREAM_COPY_double_array_268435456_bw': 1345.22,
'STREAM_COPY_double_array_268435456_time_min': 0.00319277,
'STREAM_COPY_double_array_268435456_time_max': 0.00320985,
'STREAM_COPY_double_array_268435456_time_avg': 0.00319879,
'STREAM_MUL_double_array_268435456_bw': 1370.7,
'STREAM_MUL_double_array_268435456_time_min': 0.00313342,
'STREAM_MUL_double_array_268435456_time_max': 0.00314978,
'STREAM_MUL_double_array_268435456_time_avg': 0.00313862,
'STREAM_ADD_double_array_268435456_bw': 1292.74,
'STREAM_ADD_double_array_268435456_time_min': 0.00498358,
'STREAM_ADD_double_array_268435456_time_max': 0.00499938,
'STREAM_ADD_double_array_268435456_time_avg': 0.00498747,
'STREAM_TRIAD_double_array_268435456_bw': 1292.52,
'STREAM_TRIAD_double_array_268435456_time_min': 0.00498439,
'STREAM_TRIAD_double_array_268435456_time_max': 0.00499791,
'STREAM_TRIAD_double_array_268435456_time_avg': 0.00498815,
'STREAM_DOT_double_array_268435456_bw': 1271.19,
'STREAM_DOT_double_array_268435456_time_min': 0.00337869,
'STREAM_DOT_double_array_268435456_time_max': 0.00359398,
'STREAM_DOT_double_array_268435456_time_avg': 0.0033883,
}
assert (len(test_raw_output_dict) * 2 + benchmark.default_metric_count == len(benchmark.result))
for output_key in benchmark.result:
if output_key == 'return_code':
assert (benchmark.result[output_key] == [0])
else:
assert (len(benchmark.result[output_key]) == 1)
assert (isinstance(benchmark.result[output_key][0], numbers.Number))
if output_key.endswith('_bw'):
assert (output_key.strip('_bw') in test_raw_output_dict)
assert (test_raw_output_dict[output_key.strip('_bw')][0] == benchmark.result[output_key][0])
else:
assert (output_key.strip('_ratio') in test_raw_output_dict)
assert (test_raw_output_dict[output_key.strip('_ratio')][1] == benchmark.result[output_key][0])
# Negative case - invalid raw output.
assert (benchmark._process_raw_result(1, 'Invalid raw output') is False)
for metric_name, expected_value in expected_metric_values.items():
assert (metric_name in benchmark.result)
assert (abs(benchmark.result[metric_name][0] - expected_value) < 1e-6)
assert (all(not metric.endswith('_ratio') for metric in benchmark.result))
benchmark = benchmark_class(benchmark_name, parameters='--precision double')
assert (benchmark._preprocess() is True)
assert (benchmark._process_raw_result(0, 'Invalid raw output') is False)
assert (benchmark.return_code == ReturnCode.MICROBENCHMARK_RESULT_PARSING_FAILURE)
@decorator.cuda_test
......
STREAM Benchmark
Buffer size(bytes): 4294967296
Number of warm up runs: 10
Number of loops: 40
Check data: No
Device 0: "NVIDIA Graphics Device" 152 SMs(10.0) Memory: 4000MHz x 8192-bit = 8192 GB/s PEAK ECC is ON
STREAM_COPY_double_gpu_0_buffer_4294967296_block_128 6711.67 81.93
STREAM_COPY_double_gpu_0_buffer_4294967296_block_256 6549.50 79.95
STREAM_COPY_double_gpu_0_buffer_4294967296_block_512 6195.43 75.63
STREAM_COPY_double_gpu_0_buffer_4294967296_block_1024 5721.52 69.84
STREAM_SCALE_double_gpu_0_buffer_4294967296_block_128 6680.42 81.55
STREAM_SCALE_double_gpu_0_buffer_4294967296_block_256 6515.51 79.54
STREAM_SCALE_double_gpu_0_buffer_4294967296_block_512 6106.69 74.54
STREAM_SCALE_double_gpu_0_buffer_4294967296_block_1024 5626.68 68.69
STREAM_ADD_double_gpu_0_buffer_4294967296_block_128 7379.25 90.08
STREAM_ADD_double_gpu_0_buffer_4294967296_block_256 7407.27 90.42
STREAM_ADD_double_gpu_0_buffer_4294967296_block_512 7309.59 89.23
STREAM_ADD_double_gpu_0_buffer_4294967296_block_1024 6788.64 82.87
STREAM_TRIAD_double_gpu_0_buffer_4294967296_block_128 7378.19 90.07
STREAM_TRIAD_double_gpu_0_buffer_4294967296_block_256 7414.01 90.50
STREAM_TRIAD_double_gpu_0_buffer_4294967296_block_512 7295.50 89.06
STREAM_TRIAD_double_gpu_0_buffer_4294967296_block_1024 6730.42 82.16
Device 1: "NVIDIA Graphics Device" 152 SMs(10.0) Memory: 4000.00MHz x 8192-bit = 8192.00 GB/s PEAK ECC is ON
STREAM_COPY_double_gpu_1_buffer_4294967296_block_128 6708.74 81.89
STREAM_COPY_double_gpu_1_buffer_4294967296_block_256 6549.47 79.95
STREAM_COPY_double_gpu_1_buffer_4294967296_block_512 6195.39 75.63
STREAM_COPY_double_gpu_1_buffer_4294967296_block_1024 5725.07 69.89
STREAM_SCALE_double_gpu_1_buffer_4294967296_block_128 6678.56 81.53
STREAM_SCALE_double_gpu_1_buffer_4294967296_block_256 6514.05 79.52
STREAM_SCALE_double_gpu_1_buffer_4294967296_block_512 6103.80 74.51
STREAM_SCALE_double_gpu_1_buffer_4294967296_block_1024 5630.41 68.73
STREAM_ADD_double_gpu_1_buffer_4294967296_block_128 7377.74 90.06
STREAM_ADD_double_gpu_1_buffer_4294967296_block_256 7410.97 90.47
STREAM_ADD_double_gpu_1_buffer_4294967296_block_512 7310.80 89.24
STREAM_ADD_double_gpu_1_buffer_4294967296_block_1024 6789.91 82.88
STREAM_TRIAD_double_gpu_1_buffer_4294967296_block_128 7379.03 90.08
STREAM_TRIAD_double_gpu_1_buffer_4294967296_block_256 7414.04 90.50
STREAM_TRIAD_double_gpu_1_buffer_4294967296_block_512 7298.26 89.09
STREAM_TRIAD_double_gpu_1_buffer_4294967296_block_1024 6732.15 82.18
\ No newline at end of file
Using HIP device BW150
Driver: 60326045
Memory: DEFAULT
phase,n_elements,sizeof,max_mbytes_per_sec,runtime
Init,268435456,8,6779.61,0.950269
Read,268435456,8,1.25598e+06,0.00512943
function,num_times,n_elements,sizeof,max_mbytes_per_sec,min_runtime,max_runtime,avg_runtime
Copy,100,268435456,8,1.34522e+06,0.00319277,0.00320985,0.00319879
Mul,100,268435456,8,1.3707e+06,0.00313342,0.00314978,0.00313862
Add,100,268435456,8,1.29274e+06,0.00498358,0.00499938,0.00498747
Triad,100,268435456,8,1.29252e+06,0.00498439,0.00499791,0.00498815
Dot,100,268435456,8,1.27119e+06,0.00337869,0.00359398,0.0033883
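As a sanity check, the function block of this CSV fixture parses directly with the standard `csv` module; a minimal sketch independent of the benchmark class (only two rows shown for brevity):

```python
import csv
import io

# Two rows excerpted from the BabelStream CSV fixture above.
fixture = """function,num_times,n_elements,sizeof,max_mbytes_per_sec,min_runtime,max_runtime,avg_runtime
Copy,100,268435456,8,1.34522e+06,0.00319277,0.00320985,0.00319879
Triad,100,268435456,8,1.29252e+06,0.00498439,0.00499791,0.00498815
"""

reader = csv.DictReader(io.StringIO(fixture))
results = {
    row['function']: float(row['max_mbytes_per_sec']) / 1000  # MB/s -> GB/s
    for row in reader
}
print(results)
# {'Copy': 1345.22, 'Triad': 1292.52}
```

These are the same GB/s values the test's `expected_metric_values` dictionary asserts for `STREAM_COPY` and `STREAM_TRIAD`.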