Unverified commit 63e9b2d1, authored by Yifan Xiong and committed by GitHub

Release - SuperBench v0.6.0 (#409)



**Description**

Cherry-pick bug fixes from v0.6.0 to main.

**Major Revisions**

* Enable latency test in ib traffic validation distributed benchmark (#396)
* Enhance parameter parsing to allow spaces in value (#397)
* Update apt packages in dockerfile (#398)
* Upgrade colorlog for NO_COLOR support (#404)
* Analyzer - Update error handling to support exit code of sb result diagnosis (#403)
* Analyzer - Make baseline file optional in data diagnosis and fix bugs (#399)
* Enhance timeout cleanup to avoid possible hanging (#405)
* Auto generate ibstat file by pssh (#402)
* Analyzer - Format int type and unify empty value to N/A in diagnosis output file (#406)
* Docs - Upgrade version and release note (#407)
* Docs - Fix issues in document (#408)
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>
parent 733860d7
@@ -54,7 +54,7 @@ def test_cudnn_functions():
context = BenchmarkRegistry.create_benchmark_context(
'cudnn-function',
platform=Platform.CUDA,
-parameters='--num_warmup 10 --num_steps 10 --num_in_step 100 --config_json_str ' + custom_config_str
+parameters=f"--num_warmup 10 --num_steps 10 --num_in_step 100 --config_json_str '{custom_config_str}'"
)
assert (BenchmarkRegistry.is_benchmark_context_valid(context))
......
@@ -178,14 +178,14 @@ def test_ib_traffic_performance(self, mock_gpu):
assert (ret is True)
# Generate config
-parameters = '--ib_dev mlx5_0 --iters 2000 --msg_size 33554432 --hostfile hostfile'
+parameters = '--ib_dev "$(echo mlx5_0)" --iters 2000 --msg_size 33554432 --hostfile hostfile'
benchmark = benchmark_class(benchmark_name, parameters=parameters)
os.environ['OMPI_COMM_WORLD_SIZE'] = '4'
ret = benchmark._preprocess()
Path('config.txt').unlink()
assert (ret)
expect_command = "ib_validation --cmd_prefix '" + benchmark._args.bin_dir + \
-"/ib_write_bw -F -n 2000 -d mlx5_0 -s 33554432 --report_gbits' " + \
+"/ib_write_bw -F -n 2000 -d $(echo mlx5_0) -s 33554432 --report_gbits' " + \
f'--timeout 120 --hostfile hostfile --input_config {os.getcwd()}/config.txt'
command = benchmark._bin_name + benchmark._commands[0].split(benchmark._bin_name)[1]
assert (command == expect_command)
@@ -206,6 +206,17 @@ def test_ib_traffic_performance(self, mock_gpu):
command = benchmark._bin_name + benchmark._commands[0].split(benchmark._bin_name)[1]
assert (command == expect_command)
+parameters = '--command ib_read_lat --ib_dev mlx5_0 --iters 2000 --msg_size 33554432 ' + \
+'--pattern one-to-one --hostfile hostfile --gpu_dev 0'
+mock_gpu.return_value = 'nvidia'
+benchmark = benchmark_class(benchmark_name, parameters=parameters)
+ret = benchmark._preprocess()
+expect_command = "ib_validation --cmd_prefix '" + benchmark._args.bin_dir + \
+"/ib_read_lat -F -n 2000 -d mlx5_0 -s 33554432 --report_gbits' " + \
+f'--timeout 120 --hostfile hostfile --input_config {os.getcwd()}/config.txt'
+command = benchmark._bin_name + benchmark._commands[0].split(benchmark._bin_name)[1]
+assert (command == expect_command)
# Custom config
config = ['0,1', '1,0;0,1', '0,1;1,0', '1,0;0,1']
with open('test_config.txt', 'w') as f:
......
@@ -118,6 +118,11 @@ def test_sb_result_diagnosis(self):
'sb result diagnosis -d {dir}/test_results.jsonl -r {dir}/test_rules.yaml -b {dir}/test_baseline.json'.
format(dir=test_analyzer_dir) + ' --output-dir outputs/test-diagnosis/ --output-all'
)
+self.cmd(
+'sb result diagnosis -d {dir}/test_results.jsonl -r {dir}/test_rules_without_baseline.yaml'.
+format(dir=test_analyzer_dir) +
+' --output-dir outputs/test-diagnosis/ --output-all --output-file-format json'
+)
# test invalid output format
self.cmd(
'sb result diagnosis -d {dir}/test_results.jsonl -r {dir}/test_rules.yaml -b {dir}/test_baseline.json'.
......
<table>
<thead>
<tr>
-<th>machine</th>
+<th>index</th>
<th>Category</th>
<th>Defective Details</th>
<th>kernel-launch/event_overhead:0</th>
@@ -53,7 +53,7 @@
<td>-1.17%</td>
<td>-4.03%</td>
<td>-1.01%</td>
-<td>0.0</td>
+<td>0</td>
<td>0.0%</td>
<td>0.0%</td>
<td>1.95%</td>
@@ -78,7 +78,7 @@
<td>0.78%</td>
<td>-1.17%</td>
<td>1.95%</td>
-<td>0.0</td>
+<td>0</td>
</tr>
<tr>
<td>sb-validation-03</td>
@@ -92,7 +92,7 @@
<td>-1.17%</td>
<td>-4.03%</td>
<td>-1.01%</td>
-<td>0.0</td>
+<td>0</td>
<td>0.0%</td>
<td>0.0%</td>
<td>1.95%</td>
......@@ -101,23 +101,23 @@
<td>-1.95%</td>
<td>1.85%</td>
<td>4.39%</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>nan</td>
<td>1.0</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>1</td>
</tr>
</tbody>
</table>
\ No newline at end of file
{"Category": "KernelLaunch", "Defective Details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)", "kernel-launch/event_overhead:0": 15.7785234899, "kernel-launch/event_overhead:1": -0.0016778523, "kernel-launch/event_overhead:2": -0.0654362416, "kernel-launch/event_overhead:3": -0.0771812081, "kernel-launch/event_overhead:4": -0.0067114094, "kernel-launch/event_overhead:5": -0.0117449664, "kernel-launch/event_overhead:6": -0.0402684564, "kernel-launch/event_overhead:7": -0.0100671141, "kernel-launch/return_code": 0.0, "kernel-launch/wall_overhead:0": 0.0, "kernel-launch/wall_overhead:1": 0.0, "kernel-launch/wall_overhead:2": 0.0194931774, "kernel-launch/wall_overhead:3": 0.022417154, "kernel-launch/wall_overhead:4": 0.0360623782, "kernel-launch/wall_overhead:5": -0.0194931774, "kernel-launch/wall_overhead:6": 0.0185185185, "kernel-launch/wall_overhead:7": 0.0438596491, "mem-bw/D2H_Mem_BW:0": 0.0, "mem-bw/D2H_Mem_BW:1": 0.012345679, "mem-bw/D2H_Mem_BW:2": 0.0082304527, "mem-bw/D2H_Mem_BW:3": 0.012345679, "mem-bw/D2H_Mem_BW:4": 0.0, "mem-bw/D2H_Mem_BW:5": 0.0, "mem-bw/D2H_Mem_BW:6": -0.0164609053, "mem-bw/D2H_Mem_BW:7": 0.012345679, "mem-bw/H2D_Mem_BW:0": 0.0, "mem-bw/H2D_Mem_BW:1": 0.0078125, "mem-bw/H2D_Mem_BW:2": 0.015625, "mem-bw/H2D_Mem_BW:3": 0.01953125, "mem-bw/H2D_Mem_BW:4": 0.0234375, "mem-bw/H2D_Mem_BW:5": 0.0078125, "mem-bw/H2D_Mem_BW:6": -0.01171875, "mem-bw/H2D_Mem_BW:7": 0.01953125, "mem-bw/return_code": 0.0, "Index": "sb-validation-01"}
{"Category": "FailedTest", "Defective Details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)", "kernel-launch/event_overhead:0": 0.0, "kernel-launch/event_overhead:1": -0.0016778523, "kernel-launch/event_overhead:2": -0.0654362416, "kernel-launch/event_overhead:3": -0.0771812081, "kernel-launch/event_overhead:4": -0.0067114094, "kernel-launch/event_overhead:5": -0.0117449664, "kernel-launch/event_overhead:6": -0.0402684564, "kernel-launch/event_overhead:7": -0.0100671141, "kernel-launch/return_code": 0.0, "kernel-launch/wall_overhead:0": 0.0, "kernel-launch/wall_overhead:1": 0.0, "kernel-launch/wall_overhead:2": 0.0194931774, "kernel-launch/wall_overhead:3": 0.022417154, "kernel-launch/wall_overhead:4": 0.0360623782, "kernel-launch/wall_overhead:5": -0.0194931774, "kernel-launch/wall_overhead:6": 0.0185185185, "kernel-launch/wall_overhead:7": 0.0438596491, "mem-bw/D2H_Mem_BW:0": null, "mem-bw/D2H_Mem_BW:1": null, "mem-bw/D2H_Mem_BW:2": null, "mem-bw/D2H_Mem_BW:3": null, "mem-bw/D2H_Mem_BW:4": null, "mem-bw/D2H_Mem_BW:5": null, "mem-bw/D2H_Mem_BW:6": null, "mem-bw/D2H_Mem_BW:7": null, "mem-bw/H2D_Mem_BW:0": null, "mem-bw/H2D_Mem_BW:1": null, "mem-bw/H2D_Mem_BW:2": null, "mem-bw/H2D_Mem_BW:3": null, "mem-bw/H2D_Mem_BW:4": null, "mem-bw/H2D_Mem_BW:5": null, "mem-bw/H2D_Mem_BW:6": null, "mem-bw/H2D_Mem_BW:7": null, "mem-bw/return_code": 1.0, "Index": "sb-validation-03"}
{"Category": "KernelLaunch", "Defective Details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)", "kernel-launch/event_overhead:0": 15.7785234899, "kernel-launch/event_overhead:1": -0.0016778523, "kernel-launch/event_overhead:2": -0.0654362416, "kernel-launch/event_overhead:3": -0.0771812081, "kernel-launch/event_overhead:4": -0.0067114094, "kernel-launch/event_overhead:5": -0.0117449664, "kernel-launch/event_overhead:6": -0.0402684564, "kernel-launch/event_overhead:7": -0.0100671141, "kernel-launch/return_code": 0, "kernel-launch/wall_overhead:0": 0, "kernel-launch/wall_overhead:1": 0, "kernel-launch/wall_overhead:2": 0.0194931774, "kernel-launch/wall_overhead:3": 0.022417154, "kernel-launch/wall_overhead:4": 0.0360623782, "kernel-launch/wall_overhead:5": -0.0194931774, "kernel-launch/wall_overhead:6": 0.0185185185, "kernel-launch/wall_overhead:7": 0.0438596491, "mem-bw/D2H_Mem_BW:0": 0, "mem-bw/D2H_Mem_BW:1": 0.012345679, "mem-bw/D2H_Mem_BW:2": 0.0082304527, "mem-bw/D2H_Mem_BW:3": 0.012345679, "mem-bw/D2H_Mem_BW:4": 0, "mem-bw/D2H_Mem_BW:5": 0, "mem-bw/D2H_Mem_BW:6": -0.0164609053, "mem-bw/D2H_Mem_BW:7": 0.012345679, "mem-bw/H2D_Mem_BW:0": 0, "mem-bw/H2D_Mem_BW:1": 0.0078125, "mem-bw/H2D_Mem_BW:2": 0.015625, "mem-bw/H2D_Mem_BW:3": 0.01953125, "mem-bw/H2D_Mem_BW:4": 0.0234375, "mem-bw/H2D_Mem_BW:5": 0.0078125, "mem-bw/H2D_Mem_BW:6": -0.01171875, "mem-bw/H2D_Mem_BW:7": 0.01953125, "mem-bw/return_code": 0, "index": "sb-validation-01"}
{"Category": "FailedTest", "Defective Details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)", "kernel-launch/event_overhead:0": 0.0, "kernel-launch/event_overhead:1": -0.0016778523, "kernel-launch/event_overhead:2": -0.0654362416, "kernel-launch/event_overhead:3": -0.0771812081, "kernel-launch/event_overhead:4": -0.0067114094, "kernel-launch/event_overhead:5": -0.0117449664, "kernel-launch/event_overhead:6": -0.0402684564, "kernel-launch/event_overhead:7": -0.0100671141, "kernel-launch/return_code": 0, "kernel-launch/wall_overhead:0": 0, "kernel-launch/wall_overhead:1": 0, "kernel-launch/wall_overhead:2": 0.0194931774, "kernel-launch/wall_overhead:3": 0.022417154, "kernel-launch/wall_overhead:4": 0.0360623782, "kernel-launch/wall_overhead:5": -0.0194931774, "kernel-launch/wall_overhead:6": 0.0185185185, "kernel-launch/wall_overhead:7": 0.0438596491, "mem-bw/D2H_Mem_BW:0": "N/A", "mem-bw/D2H_Mem_BW:1": "N/A", "mem-bw/D2H_Mem_BW:2": "N/A", "mem-bw/D2H_Mem_BW:3": "N/A", "mem-bw/D2H_Mem_BW:4": "N/A", "mem-bw/D2H_Mem_BW:5": "N/A", "mem-bw/D2H_Mem_BW:6": "N/A", "mem-bw/D2H_Mem_BW:7": "N/A", "mem-bw/H2D_Mem_BW:0": "N/A", "mem-bw/H2D_Mem_BW:1": "N/A", "mem-bw/H2D_Mem_BW:2": "N/A", "mem-bw/H2D_Mem_BW:3": "N/A", "mem-bw/H2D_Mem_BW:4": "N/A", "mem-bw/H2D_Mem_BW:5": "N/A", "mem-bw/H2D_Mem_BW:6": "N/A", "mem-bw/H2D_Mem_BW:7": "N/A", "mem-bw/return_code": 1, "index": "sb-validation-03"}
| machine | Category | Defective Details | kernel-launch/event_overhead:0 | kernel-launch/event_overhead:1 | kernel-launch/event_overhead:2 | kernel-launch/event_overhead:3 | kernel-launch/event_overhead:4 | kernel-launch/event_overhead:5 | kernel-launch/event_overhead:6 | kernel-launch/event_overhead:7 | kernel-launch/return_code | kernel-launch/wall_overhead:0 | kernel-launch/wall_overhead:1 | kernel-launch/wall_overhead:2 | kernel-launch/wall_overhead:3 | kernel-launch/wall_overhead:4 | kernel-launch/wall_overhead:5 | kernel-launch/wall_overhead:6 | kernel-launch/wall_overhead:7 | mem-bw/D2H_Mem_BW:0 | mem-bw/D2H_Mem_BW:1 | mem-bw/D2H_Mem_BW:2 | mem-bw/D2H_Mem_BW:3 | mem-bw/D2H_Mem_BW:4 | mem-bw/D2H_Mem_BW:5 | mem-bw/D2H_Mem_BW:6 | mem-bw/D2H_Mem_BW:7 | mem-bw/H2D_Mem_BW:0 | mem-bw/H2D_Mem_BW:1 | mem-bw/H2D_Mem_BW:2 | mem-bw/H2D_Mem_BW:3 | mem-bw/H2D_Mem_BW:4 | mem-bw/H2D_Mem_BW:5 | mem-bw/H2D_Mem_BW:6 | mem-bw/H2D_Mem_BW:7 | mem-bw/return_code |
| index | Category | Defective Details | kernel-launch/event_overhead:0 | kernel-launch/event_overhead:1 | kernel-launch/event_overhead:2 | kernel-launch/event_overhead:3 | kernel-launch/event_overhead:4 | kernel-launch/event_overhead:5 | kernel-launch/event_overhead:6 | kernel-launch/event_overhead:7 | kernel-launch/return_code | kernel-launch/wall_overhead:0 | kernel-launch/wall_overhead:1 | kernel-launch/wall_overhead:2 | kernel-launch/wall_overhead:3 | kernel-launch/wall_overhead:4 | kernel-launch/wall_overhead:5 | kernel-launch/wall_overhead:6 | kernel-launch/wall_overhead:7 | mem-bw/D2H_Mem_BW:0 | mem-bw/D2H_Mem_BW:1 | mem-bw/D2H_Mem_BW:2 | mem-bw/D2H_Mem_BW:3 | mem-bw/D2H_Mem_BW:4 | mem-bw/D2H_Mem_BW:5 | mem-bw/D2H_Mem_BW:6 | mem-bw/D2H_Mem_BW:7 | mem-bw/H2D_Mem_BW:0 | mem-bw/H2D_Mem_BW:1 | mem-bw/H2D_Mem_BW:2 | mem-bw/H2D_Mem_BW:3 | mem-bw/H2D_Mem_BW:4 | mem-bw/H2D_Mem_BW:5 | mem-bw/H2D_Mem_BW:6 | mem-bw/H2D_Mem_BW:7 | mem-bw/return_code |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sb-validation-01 | KernelLaunch | kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05) | 1577.85% | -0.17% | -6.54% | -7.72% | -0.67% | -1.17% | -4.03% | -1.01% | 0.0 | 0.0% | 0.0% | 1.95% | 2.24% | 3.61% | -1.95% | 1.85% | 4.39% | 0.0% | 1.23% | 0.82% | 1.23% | 0.0% | 0.0% | -1.65% | 1.23% | 0.0% | 0.78% | 1.56% | 1.95% | 2.34% | 0.78% | -1.17% | 1.95% | 0.0 |
| sb-validation-03 | FailedTest | mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0) | 0.0% | -0.17% | -6.54% | -7.72% | -0.67% | -1.17% | -4.03% | -1.01% | 0.0 | 0.0% | 0.0% | 1.95% | 2.24% | 3.61% | -1.95% | 1.85% | 4.39% | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | 1.0 |
| sb-validation-01 | KernelLaunch | kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05) | 1577.85% | -0.17% | -6.54% | -7.72% | -0.67% | -1.17% | -4.03% | -1.01% | 0 | 0.0% | 0.0% | 1.95% | 2.24% | 3.61% | -1.95% | 1.85% | 4.39% | 0.0% | 1.23% | 0.82% | 1.23% | 0.0% | 0.0% | -1.65% | 1.23% | 0.0% | 0.78% | 1.56% | 1.95% | 2.34% | 0.78% | -1.17% | 1.95% | 0 |
| sb-validation-03 | FailedTest | mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0) | 0.0% | -0.17% | -6.54% | -7.72% | -0.67% | -1.17% | -4.03% | -1.01% | 0 | 0.0% | 0.0% | 1.95% | 2.24% | 3.61% | -1.95% | 1.85% | 4.39% | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 1 |
[
{
+"index": "sb-validation-01",
"diagnosis/category": "KernelLaunch",
"diagnosis/issue_details": "kernel-launch/event_overhead:0(B/L: 0.0060 VAL: 0.1000 VAR: 1577.85% Rule:lambda x:x>0.05)",
"kernel-launch/event_overhead:0": 15.7785234899,
@@ -10,24 +11,24 @@
"kernel-launch/event_overhead:5": -0.0117449664,
"kernel-launch/event_overhead:6": -0.0402684564,
"kernel-launch/event_overhead:7": -0.0100671141,
-"kernel-launch/return_code": 0.0,
-"kernel-launch/wall_overhead:0": 0.0,
-"kernel-launch/wall_overhead:1": 0.0,
+"kernel-launch/return_code": 0,
+"kernel-launch/wall_overhead:0": 0,
+"kernel-launch/wall_overhead:1": 0,
"kernel-launch/wall_overhead:2": 0.0194931774,
"kernel-launch/wall_overhead:3": 0.022417154,
"kernel-launch/wall_overhead:4": 0.0360623782,
"kernel-launch/wall_overhead:5": -0.0194931774,
"kernel-launch/wall_overhead:6": 0.0185185185,
"kernel-launch/wall_overhead:7": 0.0438596491,
-"mem-bw/D2H_Mem_BW:0": 0.0,
+"mem-bw/D2H_Mem_BW:0": 0,
"mem-bw/D2H_Mem_BW:1": 0.012345679,
"mem-bw/D2H_Mem_BW:2": 0.0082304527,
"mem-bw/D2H_Mem_BW:3": 0.012345679,
-"mem-bw/D2H_Mem_BW:4": 0.0,
-"mem-bw/D2H_Mem_BW:5": 0.0,
+"mem-bw/D2H_Mem_BW:4": 0,
+"mem-bw/D2H_Mem_BW:5": 0,
"mem-bw/D2H_Mem_BW:6": -0.0164609053,
"mem-bw/D2H_Mem_BW:7": 0.012345679,
-"mem-bw/H2D_Mem_BW:0": 0.0,
+"mem-bw/H2D_Mem_BW:0": 0,
"mem-bw/H2D_Mem_BW:1": 0.0078125,
"mem-bw/H2D_Mem_BW:2": 0.015625,
"mem-bw/H2D_Mem_BW:3": 0.01953125,
@@ -35,10 +36,10 @@
"mem-bw/H2D_Mem_BW:5": 0.0078125,
"mem-bw/H2D_Mem_BW:6": -0.01171875,
"mem-bw/H2D_Mem_BW:7": 0.01953125,
-"mem-bw/return_code": 0.0,
-"Index": "sb-validation-01"
+"mem-bw/return_code": 0
},
{
+"index": "sb-validation-03",
"diagnosis/category": "FailedTest",
"diagnosis/issue_details": "mem-bw/D2H_Mem_BW:0_miss,mem-bw/D2H_Mem_BW:1_miss,mem-bw/D2H_Mem_BW:2_miss,mem-bw/D2H_Mem_BW:3_miss,mem-bw/D2H_Mem_BW:4_miss,mem-bw/D2H_Mem_BW:5_miss,mem-bw/D2H_Mem_BW:6_miss,mem-bw/D2H_Mem_BW:7_miss,mem-bw/H2D_Mem_BW:0_miss,mem-bw/H2D_Mem_BW:1_miss,mem-bw/H2D_Mem_BW:2_miss,mem-bw/H2D_Mem_BW:3_miss,mem-bw/H2D_Mem_BW:4_miss,mem-bw/H2D_Mem_BW:5_miss,mem-bw/H2D_Mem_BW:6_miss,mem-bw/H2D_Mem_BW:7_miss,mem-bw/return_code(VAL: 1.0000 Rule:lambda x:x>0)",
"kernel-launch/event_overhead:0": 0.0,
@@ -49,32 +50,31 @@
"kernel-launch/event_overhead:5": -0.0117449664,
"kernel-launch/event_overhead:6": -0.0402684564,
"kernel-launch/event_overhead:7": -0.0100671141,
-"kernel-launch/return_code": 0.0,
-"kernel-launch/wall_overhead:0": 0.0,
-"kernel-launch/wall_overhead:1": 0.0,
+"kernel-launch/return_code": 0,
+"kernel-launch/wall_overhead:0": 0,
+"kernel-launch/wall_overhead:1": 0,
"kernel-launch/wall_overhead:2": 0.0194931774,
"kernel-launch/wall_overhead:3": 0.022417154,
"kernel-launch/wall_overhead:4": 0.0360623782,
"kernel-launch/wall_overhead:5": -0.0194931774,
"kernel-launch/wall_overhead:6": 0.0185185185,
"kernel-launch/wall_overhead:7": 0.0438596491,
-"mem-bw/D2H_Mem_BW:0": null,
-"mem-bw/D2H_Mem_BW:1": null,
-"mem-bw/D2H_Mem_BW:2": null,
-"mem-bw/D2H_Mem_BW:3": null,
-"mem-bw/D2H_Mem_BW:4": null,
-"mem-bw/D2H_Mem_BW:5": null,
-"mem-bw/D2H_Mem_BW:6": null,
-"mem-bw/D2H_Mem_BW:7": null,
-"mem-bw/H2D_Mem_BW:0": null,
-"mem-bw/H2D_Mem_BW:1": null,
-"mem-bw/H2D_Mem_BW:2": null,
-"mem-bw/H2D_Mem_BW:3": null,
-"mem-bw/H2D_Mem_BW:4": null,
-"mem-bw/H2D_Mem_BW:5": null,
-"mem-bw/H2D_Mem_BW:6": null,
-"mem-bw/H2D_Mem_BW:7": null,
-"mem-bw/return_code": 1.0,
-"Index": "sb-validation-03"
+"mem-bw/D2H_Mem_BW:0": "N/A",
+"mem-bw/D2H_Mem_BW:1": "N/A",
+"mem-bw/D2H_Mem_BW:2": "N/A",
+"mem-bw/D2H_Mem_BW:3": "N/A",
+"mem-bw/D2H_Mem_BW:4": "N/A",
+"mem-bw/D2H_Mem_BW:5": "N/A",
+"mem-bw/D2H_Mem_BW:6": "N/A",
+"mem-bw/D2H_Mem_BW:7": "N/A",
+"mem-bw/H2D_Mem_BW:0": "N/A",
+"mem-bw/H2D_Mem_BW:1": "N/A",
+"mem-bw/H2D_Mem_BW:2": "N/A",
+"mem-bw/H2D_Mem_BW:3": "N/A",
+"mem-bw/H2D_Mem_BW:4": "N/A",
+"mem-bw/H2D_Mem_BW:5": "N/A",
+"mem-bw/H2D_Mem_BW:6": "N/A",
+"mem-bw/H2D_Mem_BW:7": "N/A",
+"mem-bw/return_code": 1
}
]
\ No newline at end of file
VM_hostname vma414bbc00005I
0x0ff08c4321664e96
0ff08c4321664e96
VM_hostname vma414bbc00005J
0x0ff08c43217299f2
0ff08c43217299f2
VM_hostname vma414bbc00005K
0x0ff08c4321729742
0ff08c4321729742
VM_hostname vma414bbc00005L
0x0ff08c4321729986
0ff08c4321729986
VM_hostname vma414bbc00005M
0x1c34da03005baca4
1c34da03005baca4
VM_hostname vma414bbc00005N
0x0ff08c432166275a
0ff08c432166275a
VM_hostname vma414bbc00005O
0x0ff08c4321664b66
0ff08c4321664b66
VM_hostname vma414bbc00005P
0x0ff08c432166274e
0ff08c432166274e
VM_hostname vma414bbc00005Q
0x0ff08c4321664f2a
0ff08c4321664f2a
VM_hostname vma414bbc00005R
0x043f720300e61112
043f720300e61112
---
slug: release-sb-v0.6
title: Releasing SuperBench v0.6
author: Peng Cheng
author_title: SuperBench Team
author_url: https://github.com/cp5555
author_image_url: https://github.com/cp5555.png
tags: [superbench, announcement, release]
---
We are very happy to announce that **SuperBench 0.6.0** is officially released today!
You can install and try SuperBench by following the [Getting Started Tutorial](https://microsoft.github.io/superbenchmark/docs/getting-started/installation).
## SuperBench 0.6.0 Release Notes
### SuperBench Improvements
- Support running on host directly without Docker.
- Support running `sb` command inside docker image.
- Support ROCm 5.1.1.
- Support ROCm 5.1.3.
- Fix bugs in data diagnosis.
- Fix cmake and build issues.
- Support automatic configuration yaml selection on Azure VM.
- Refine error message when GPU is not detected.
- Add return code for Timeout.
- Update Dockerfile for NCCL/RCCL version, tag name, and verbose output.
- Support node_num=1 in mpi mode.
- Update Python setup for required packages.
- Enhance parameter parsing to allow spaces in value.
- Support NO_COLOR for SuperBench output.
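
The "allow spaces in value" item can be illustrated with a minimal sketch. It assumes shell-style tokenization via the standard-library `shlex.split` and is not SuperBench's actual parser; `parse_parameters` and the two options it declares are hypothetical names chosen to mirror the test parameters in this commit.

```python
import argparse
import shlex


def parse_parameters(parameters: str) -> argparse.Namespace:
    """Parse a benchmark parameter string, keeping quoted values intact."""
    # Hypothetical parser: only two options from the tests are declared here.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('--num_warmup', type=int, default=0)
    parser.add_argument('--config_json_str', type=str, default=None)
    # shlex.split applies POSIX shell quoting rules, so a quoted value that
    # contains spaces stays one token instead of being split on whitespace.
    return parser.parse_args(shlex.split(parameters))


args = parse_parameters("--num_warmup 10 --config_json_str '{\"algo\": [0, 1]}'")
print(args.num_warmup)
print(args.config_json_str)
```

Splitting the same string with plain `str.split()` would break the JSON value into several tokens at its internal spaces, which is the failure mode that quoting avoids.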
### Micro-benchmark Improvements
- Fix issues in ib loopback benchmark.
- Fix stability issue in ib loopback benchmark.
### Distributed Benchmark Improvements
- Enhance pair-wise IB benchmark.
- Fix bugs in IB benchmark.
- Support topology-aware IB benchmark.
### Data Diagnosis and Analysis
- Add failure check function in data_diagnosis.py.
- Support JSON and JSONL in Diagnosis.
- Add support to store values of metrics in data diagnosis.
- Support exit code of sb result diagnosis.
- Format int type and unify empty value to N/A in diagnosis output files.
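
The last item (integer formatting and a unified `N/A` placeholder) can be sketched as follows. This is an illustrative assumption, not the analyzer's real code: `normalize_value` is a hypothetical helper, and the real analyzer may apply the conversion per metric column rather than per value.

```python
import math


def normalize_value(value):
    """Return 'N/A' for missing values and an int for whole-number floats."""
    if value is None:
        return 'N/A'
    if isinstance(value, float):
        if math.isnan(value):
            return 'N/A'
        if value.is_integer():
            # e.g. a return code stored as 1.0 becomes 1
            return int(value)
    return value


# Sample row using metric names that appear in the diagnosis output files.
row = {
    'mem-bw/return_code': 1.0,
    'mem-bw/D2H_Mem_BW:0': float('nan'),
    'mem-bw/H2D_Mem_BW:0': None,
    'kernel-launch/wall_overhead:2': 0.0194931774,
}
cleaned = {key: normalize_value(val) for key, val in row.items()}
print(cleaned)
```

Mapping both `None` and NaN to one placeholder keeps the JSON, JSONL, Markdown, and HTML outputs consistent, where previously they showed `null` or `nan` depending on the format.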
@@ -101,7 +101,7 @@ module.exports = {
announcementBar: {
id: 'supportus',
content:
-'📢 <a href="https://microsoft.github.io/superbenchmark/blog/release-sb-v0.5">v0.5.0</a> has been released! ' +
+'📢 <a href="https://microsoft.github.io/superbenchmark/blog/release-sb-v0.6">v0.6.0</a> has been released! ' +
'⭐️ If you like SuperBench, give it a star on <a target="_blank" rel="noopener noreferrer" href="https://github.com/microsoft/superbenchmark">GitHub</a>! ⭐️',
},
algolia: {
......
{
"name": "superbench-website",
-"version": "0.5.0",
+"version": "0.6.0",
"lockfileVersion": 1,
"requires": true,
"dependencies": {
......
{
"name": "superbench-website",
-"version": "0.5.0",
+"version": "0.6.0",
"private": true,
"scripts": {
"docusaurus": "docusaurus",
......