Unverified commit 74421ffe, authored by Ziyue Yang, committed by GitHub

Benchmarks: Add Feature - Add bidirectional test support in gpu_copy benchmark (#285)

**Description**
This commit adds bidirectional tests to the gpu_copy benchmark for both device-host and device-device transfers, and revises the related tests.
parent fd2bc9e0
...@@ -186,11 +186,16 @@ Measure the memory copy bandwidth performed by GPU SM/DMA engine, including devi

#### Metrics

| Name | Unit | Description |
|---|---|---|
| cpu\_to\_gpu[0-9]+\_by\_(sm\|dma)\_under\_numa[0-9]+\_uni\_bw | bandwidth (GB/s) | The unidirectional bandwidth of one GPU reading one NUMA node's host memory using DMA engine or GPU SM. |
| gpu[0-9]+\_to\_cpu\_by\_(sm\|dma)\_under\_numa[0-9]+\_uni\_bw | bandwidth (GB/s) | The unidirectional bandwidth of one GPU writing to one NUMA node's host memory using DMA engine or GPU SM. |
| gpu[0-9]+\_to\_gpu[0-9]+\_by\_(sm\|dma)\_under\_numa[0-9]+\_uni\_bw | bandwidth (GB/s) | The unidirectional bandwidth of one GPU reading or writing its own memory using DMA engine or GPU SM with peer communication enabled. |
| gpu[0-9]+\_to\_gpu[0-9]+\_(read\|write)\_by\_(sm\|dma)\_under\_numa[0-9]+\_uni\_bw | bandwidth (GB/s) | The unidirectional bandwidth of one GPU reading or writing a peer GPU's memory using DMA engine or GPU SM with peer communication enabled. |
| cpu\_to\_gpu[0-9]+\_by\_(sm\|dma)\_under\_numa[0-9]+\_bi\_bw | bandwidth (GB/s) | The bidirectional bandwidth of one GPU reading and writing one NUMA node's host memory using DMA engine or GPU SM. |
| gpu[0-9]+\_to\_cpu\_by\_(sm\|dma)\_under\_numa[0-9]+\_bi\_bw | bandwidth (GB/s) | The bidirectional bandwidth of one GPU reading and writing one NUMA node's host memory using DMA engine or GPU SM. |
| gpu[0-9]+\_to\_gpu[0-9]+\_by\_(sm\|dma)\_under\_numa[0-9]+\_bi\_bw | bandwidth (GB/s) | The bidirectional bandwidth of one GPU reading and writing its own memory using DMA engine or GPU SM with peer communication enabled. |
| gpu[0-9]+\_to\_gpu[0-9]+\_(read\|write)\_by\_(sm\|dma)\_under\_numa[0-9]+\_bi\_bw | bandwidth (GB/s) | The bidirectional bandwidth of one GPU reading and writing a peer GPU's memory using DMA engine or GPU SM with peer communication enabled. |
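The metric-name patterns in the table above can be sanity-checked with a small regex sketch. This is a hypothetical helper, not part of the benchmark; the patterns simply mirror the documented names:

```python
import re

# Regexes mirroring the documented gpu-copy metric-name patterns
# (uni- and bi-directional variants collapsed into one alternation each).
PATTERNS = [
    r'cpu_to_gpu[0-9]+_by_(sm|dma)_under_numa[0-9]+_(uni|bi)_bw',
    r'gpu[0-9]+_to_cpu_by_(sm|dma)_under_numa[0-9]+_(uni|bi)_bw',
    r'gpu[0-9]+_to_gpu[0-9]+_by_(sm|dma)_under_numa[0-9]+_(uni|bi)_bw',
    r'gpu[0-9]+_to_gpu[0-9]+_(read|write)_by_(sm|dma)_under_numa[0-9]+_(uni|bi)_bw',
]


def matches_metric(name):
    """Return True if `name` matches any documented gpu-copy metric pattern."""
    return any(re.fullmatch(p, name) for p in PATTERNS)


print(matches_metric('gpu0_to_gpu1_read_by_dma_under_numa0_uni_bw'))  # True
print(matches_metric('cpu_to_gpu0_by_sm_under_numa1_bi_bw'))          # True
```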
### `ib-loopback`
...
...@@ -18,6 +18,8 @@
# context = BenchmarkRegistry.create_benchmark_context(
#     'gpu-copy-bw', platform=Platform.ROCM, parameters='--mem_type htod dtoh dtod --copy_type sm dma'
# )
# For bidirectional tests, specify parameters as follows:
# parameters='--mem_type htod dtod --copy_type sm dma --bidirectional'
benchmark = BenchmarkRegistry.launch_benchmark(context)
if benchmark:
...
...@@ -61,6 +61,12 @@ def add_parser_arguments(self):
            help='Number of data buffer copies performed.',
        )
        self._parser.add_argument(
            '--bidirectional',
            action='store_true',
            help='Enable bidirectional test.',
        )

    def _preprocess(self):
        """Preprocess/preparation operations before the benchmarking.
...@@ -78,6 +84,9 @@ def _preprocess(self):
        for copy_type in self._args.copy_type:
            args += ' --%s_copy' % copy_type
        if self._args.bidirectional:
            args += ' --bidirectional'
        self._commands = ['%s %s' % (self.__bin_path, args)]
        return True
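The command assembly above can be sketched as a standalone function. This is a minimal sketch, not the actual SuperBench class; the `--htod`-style flag form for memory types is an assumption, while the `--sm_copy`/`--dma_copy` and `--bidirectional` forms come from the diff:

```python
def build_command(bin_path, mem_types, copy_types, bidirectional):
    """Assemble a gpu_copy command line from parsed arguments (sketch)."""
    args = ''
    for mem_type in mem_types:
        args += ' --%s' % mem_type          # e.g. --htod (assumed flag form)
    for copy_type in copy_types:
        args += ' --%s_copy' % copy_type    # e.g. --sm_copy, --dma_copy
    if bidirectional:
        args += ' --bidirectional'          # appended only when enabled
    return '%s%s' % (bin_path, args)


cmd = build_command('gpu_copy', ['htod', 'dtod'], ['sm', 'dma'], True)
print(cmd)  # gpu_copy --htod --dtod --sm_copy --dma_copy --bidirectional
```

Keeping `--bidirectional` as a `store_true` flag means unidirectional runs need no extra arguments, so existing configurations keep working unchanged.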
...
...@@ -32,7 +32,7 @@ def _test_gpu_copy_bw_performance_command_generation(self, platform):
        mem_types = ['htod', 'dtoh', 'dtod']
        copy_types = ['sm', 'dma']
        parameters = '--mem_type %s --copy_type %s --size %d --num_loops %d --bidirectional' % \
            (' '.join(mem_types), ' '.join(copy_types), size, num_loops)
        benchmark = benchmark_class(benchmark_name, parameters=parameters)
...@@ -49,6 +49,7 @@ def _test_gpu_copy_bw_performance_command_generation(self, platform):
        assert (benchmark._args.copy_type == copy_types)
        assert (benchmark._args.size == size)
        assert (benchmark._args.num_loops == num_loops)
        assert (benchmark._args.bidirectional)

        # Check command
        assert (1 == len(benchmark._commands))
...@@ -59,6 +60,7 @@ def _test_gpu_copy_bw_performance_command_generation(self, platform):
            assert ('--%s_copy' % copy_type in benchmark._commands[0])
            assert ('--size %d' % size in benchmark._commands[0])
            assert ('--num_loops %d' % num_loops in benchmark._commands[0])
            assert ('--bidirectional' in benchmark._commands[0])
    @decorator.cuda_test
    def test_gpu_copy_bw_performance_command_generation_cuda(self):
...@@ -70,7 +72,8 @@ def test_gpu_copy_bw_performance_command_generation_rocm(self):
        """Test gpu-copy benchmark command generation, ROCm case."""
        self._test_gpu_copy_bw_performance_command_generation(Platform.ROCM)

    @decorator.load_data('tests/data/gpu_copy_bw_performance.log')
    def _test_gpu_copy_bw_performance_result_parsing(self, platform, test_raw_output):
        """Test gpu-copy benchmark result parsing."""
        benchmark_name = 'gpu-copy-bw'
        (benchmark_class,
...@@ -85,20 +88,6 @@ def _test_gpu_copy_bw_performance_result_parsing(self, platform):
        assert (benchmark.type == BenchmarkType.MICRO)

        # Positive case - valid raw output.
test_raw_output = """
cpu_to_gpu0_by_gpu0_using_sm_under_numa0 26.1755
cpu_to_gpu0_by_gpu0_using_dma_under_numa0 26.1894
gpu0_to_cpu_by_gpu0_using_sm_under_numa0 5.72584
gpu0_to_cpu_by_gpu0_using_dma_under_numa0 26.2623
gpu0_to_gpu0_by_gpu0_using_sm_under_numa0 659.275
gpu0_to_gpu0_by_gpu0_using_dma_under_numa0 636.401
cpu_to_gpu0_by_gpu0_using_sm_under_numa1 26.1589
cpu_to_gpu0_by_gpu0_using_dma_under_numa1 26.18
gpu0_to_cpu_by_gpu0_using_sm_under_numa1 5.07597
gpu0_to_cpu_by_gpu0_using_dma_under_numa1 25.2851
gpu0_to_gpu0_by_gpu0_using_sm_under_numa1 656.825
gpu0_to_gpu0_by_gpu0_using_dma_under_numa1 634.203
"""
        assert (benchmark._process_raw_result(0, test_raw_output))
        assert (benchmark.return_code == ReturnCode.SUCCESS)
...
cpu_to_gpu0_by_sm_under_numa0_uni 26.1736
cpu_to_gpu0_by_dma_under_numa0_uni 26.1878
gpu0_to_cpu_by_sm_under_numa0_uni 5.01589
gpu0_to_cpu_by_dma_under_numa0_uni 21.8659
gpu0_to_gpu0_by_sm_under_numa0_uni 655.759
gpu0_to_gpu0_by_dma_under_numa0_uni 633.325
gpu0_to_gpu1_write_by_sm_under_numa0_uni 250.122
gpu0_to_gpu1_write_by_dma_under_numa0_uni 274.951
gpu0_to_gpu1_read_by_sm_under_numa0_uni 253.563
gpu0_to_gpu1_read_by_dma_under_numa0_uni 264.009
cpu_to_gpu1_by_sm_under_numa0_uni 26.187
cpu_to_gpu1_by_dma_under_numa0_uni 26.207
gpu1_to_cpu_by_sm_under_numa0_uni 5.01132
gpu1_to_cpu_by_dma_under_numa0_uni 21.8635
gpu1_to_gpu0_write_by_sm_under_numa0_uni 249.824
gpu1_to_gpu0_write_by_dma_under_numa0_uni 275.123
gpu1_to_gpu0_read_by_sm_under_numa0_uni 253.469
gpu1_to_gpu0_read_by_dma_under_numa0_uni 264.908
gpu1_to_gpu1_by_sm_under_numa0_uni 658.338
gpu1_to_gpu1_by_dma_under_numa0_uni 631.148
cpu_to_gpu0_by_sm_under_numa1_uni 26.1542
cpu_to_gpu0_by_dma_under_numa1_uni 26.2007
gpu0_to_cpu_by_sm_under_numa1_uni 5.67356
gpu0_to_cpu_by_dma_under_numa1_uni 21.8599
gpu0_to_gpu0_by_sm_under_numa1_uni 656.935
gpu0_to_gpu0_by_dma_under_numa1_uni 631.974
gpu0_to_gpu1_write_by_sm_under_numa1_uni 250.118
gpu0_to_gpu1_write_by_dma_under_numa1_uni 274.778
gpu0_to_gpu1_read_by_sm_under_numa1_uni 253.625
gpu0_to_gpu1_read_by_dma_under_numa1_uni 264.347
cpu_to_gpu1_by_sm_under_numa1_uni 26.1905
cpu_to_gpu1_by_dma_under_numa1_uni 26.2007
gpu1_to_cpu_by_sm_under_numa1_uni 5.67716
gpu1_to_cpu_by_dma_under_numa1_uni 21.8579
gpu1_to_gpu0_write_by_sm_under_numa1_uni 250.064
gpu1_to_gpu0_write_by_dma_under_numa1_uni 274.924
gpu1_to_gpu0_read_by_sm_under_numa1_uni 253.746
gpu1_to_gpu0_read_by_dma_under_numa1_uni 264.256
gpu1_to_gpu1_by_sm_under_numa1_uni 655.623
gpu1_to_gpu1_by_dma_under_numa1_uni 634.062
cpu_to_gpu0_by_sm_under_numa0_bi 8.45975
cpu_to_gpu0_by_dma_under_numa0_bi 36.4282
gpu0_to_gpu0_by_sm_under_numa0_bi 689.063
gpu0_to_gpu0_by_dma_under_numa0_bi 661.7
gpu0_to_gpu1_write_by_sm_under_numa0_bi 427.446
gpu0_to_gpu1_write_by_dma_under_numa0_bi 521.577
gpu0_to_gpu1_read_by_sm_under_numa0_bi 446.835
gpu0_to_gpu1_read_by_dma_under_numa0_bi 503.158
cpu_to_gpu1_by_sm_under_numa0_bi 8.4487
cpu_to_gpu1_by_dma_under_numa0_bi 36.4272
cpu_to_gpu0_by_sm_under_numa1_bi 9.36164
cpu_to_gpu0_by_dma_under_numa1_bi 36.411
gpu0_to_gpu0_by_sm_under_numa1_bi 688.156
gpu0_to_gpu0_by_dma_under_numa1_bi 662.077
gpu0_to_gpu1_write_by_sm_under_numa1_bi 427.033
gpu0_to_gpu1_write_by_dma_under_numa1_bi 521.367
gpu0_to_gpu1_read_by_sm_under_numa1_bi 446.179
gpu0_to_gpu1_read_by_dma_under_numa1_bi 503.843
cpu_to_gpu1_by_sm_under_numa1_bi 9.37368
cpu_to_gpu1_by_dma_under_numa1_bi 36.4128
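The raw output above is a flat list of `metric value` pairs, which can be parsed with a simple sketch. This is a hypothetical helper, not the benchmark's actual `_process_raw_result`; whether the `_bw` suffix seen in the documented metric names is appended during parsing is an assumption left out here:

```python
def parse_gpu_copy_output(raw_output):
    """Parse 'metric value' lines from gpu_copy raw output into a dict (sketch)."""
    results = {}
    for line in raw_output.strip().splitlines():
        parts = line.split()
        if len(parts) == 2:                 # skip anything that is not a pair
            name, value = parts
            results[name] = float(value)    # bandwidth in GB/s
    return results


sample = ('cpu_to_gpu0_by_sm_under_numa0_uni 26.1736\n'
          'gpu0_to_cpu_by_dma_under_numa0_uni 21.8659')
parsed = parse_gpu_copy_output(sample)
```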