Docs - Update README file on main page (#79)

* Update Readme file on main page

Commit 1652524a, authored May 25, 2021 by TobeyQin; committed by GitHub May 25, 2021 (unverified)

Showing with 117 additions and 10 deletions

README.md +117 -10
imgs/bar.png +0 -0
imgs/superbench_structure.png +0 -0

--- a/README.md
+++ b/README.md
 # SuperBenchmark
+[![MIT licensed](https://img.shields.io/badge/license-MIT-brightgreen.svg)](LICENSE)
 [![Lint](https://github.com/microsoft/superbenchmark/workflows/Lint/badge.svg)](https://github.com/microsoft/superbenchmark/actions?query=workflow%3ALint)
 [![Codecov](https://codecov.io/gh/microsoft/superbenchmark/branch/main/graph/badge.svg?token=DDiDLW7pSd)](https://codecov.io/gh/microsoft/superbenchmark)
@@ -9,18 +10,17 @@
 | gpu-unit-test | [![Build Status](https://dev.azure.com/msrasrg/SuperBenchmark/_apis/build/status/cuda-unit-test?branchName=main)](https://dev.azure.com/msrasrg/SuperBenchmark/_build/latest?definitionId=80&branchName=main) |
-SuperBench is a benchmarking and diagnosis tool for AI infrastructure,
+**SuperBench** is a validation and profiling tool for AI infrastructure, which supports:
-which supports:
-* Comprehensive AI infrastructure validation
+* AI infrastructure validation and diagnosis
    * Distributed validation tools to validate hundreds or thousands of servers automatically
    * Consider both raw hardware and E2E model performance with ML workload patterns
-    * Provide a fast and accurate way to detect and locate hardware problems
+    * Build a contract to identify hardware issues
-    * Performance/Quality Gates for hardware and system release
+    * Provide infrastructure-oriented criteria as Performance/Quality Gates for hardware and system release
-* Benchmarking with typical AI workload patterns
+    * Provide detailed performance report and advanced analysis tool  
+* AI workload benchmarking and profiling
    * Provide comprehensive performance comparison between different existing hardware
-    * Give a better understanding for new DL software & hardware
+    * Provide insights for hardware and software co-design
-* Detailed performance analysis and diagnosis
-    * Provide detailed performance report and advanced analysis tool   
 It includes micro-benchmark for primitive computation and communication benchmarking,
 and model-benchmark to measure domain-aware end-to-end deep learning workloads.
@@ -29,6 +29,109 @@ and model-benchmark to measure domain-aware end-to-end deep learning workloads.
 SuperBench is in the early pre-alpha stage for open source, and not ready for general public yet.
 If you want to jump in early, you can try building latest code yourself.
+## SuperBench capabilities, workflow and benchmarking metrics
+The following graphic shows the capabilities provided by the SuperBench core framework and its extension.
+<img src="imgs/superbench_structure.png">
+The benchmarking metrics provided by SuperBench are listed below.
+<table>
+  <tbody>
+    <tr align="center" valign="bottom">
+      <td>
+      </td>
+      <td>
+        <b>Micro Benchmark</b>
+        <img src="imgs/bar.png"/>
+      </td>
+      <td>
+        <b>Model Benchmark</b>
+        <img src="imgs/bar.png"/>
+      </td>
+    </tr>
+    <tr valign="top">
+      <td align="center" valign="middle">
+        <b>Metrics</b>
+      </td>
+      <td>
+        <ul><li><b>Computation Benchmark</b></li>
+          <ul><li><b>Kernel Performance</b></li>
+            <ul>
+              <li>GFLOPS</li>
+              <li>TensorCore</li>
+              <li>cuBLAS</li>
+              <li>cuDNN</li>
+            </ul>
+          </ul>
+          <ul><li><b>Kernel Launch Time</b></li>
+            <ul>
+              <li>Kernel_Launch_Event_Time</li>
+              <li>Kernel_Launch_Wall_Time</li>
+            </ul>
+          </ul>
+          <ul><li><b>Operator Performance</b></li>
+            <ul><li>MatMul</li><li>Sharding_MatMul</li></ul>
+          </ul>
+          <ul><li><b>Memory</b></li>
+            <ul><li>H2D_Mem_BW_&lt;GPU ID&gt;</li>
+              <li>D2H_Mem_BW_&lt;GPU ID&gt;</li></ul>
+          </ul>
+        </ul>
+        <ul><li><b>Communication Benchmark</b></li>
+          <ul><li><b>Device P2P Bandwidth</b></li>
+            <ul><li>P2P_BW_Max</li><li>P2P_BW_Min</li><li>P2P_BW_Avg</li></ul>
+          </ul>
+          <ul><li><b>RDMA</b></li>
+            <ul><li>RDMA_Peak</li><li>RDMA_Avg</li></ul>
+          </ul>
+          <ul><li><b>NCCL</b></li>
+            <ul><li>NCCL_AllReduce</li></ul>
+            <ul><li>NCCL_AllGather</li></ul>
+            <ul><li>NCCL_broadcast</li></ul>
+            <ul><li>NCCL_reduce</li></ul>
+            <ul><li>NCCL_reduce_scatter</li></ul>
+          </ul>
+        </ul>
+        <ul><li><b>Computation-Communication Benchmark</b></li>
+          <ul><li><b>Mul_During_NCCL</b></li><li><b>MatMul_During_NCCL</b></li></ul>
+        </ul>
+        <ul><li><b>Storage Benchmark</b></li>
+          <ul><li><b>Disk</b></li>
+            <ul>
+              <li>Read/Write</li><li>Rand_Read/Rand_Write</li>
+              <li>R/W_Read</li><li>R/W_Write</li><li>Rand_R/W_Read</li><li>Rand_R/W_Write</li>
+            </ul>
+          </ul>
+        </ul>   
+      </td>
+      <td>
+        <ul><li><b>CNN models</b></li>
+          <ul>
+            <li><b>ResNet</b></li>
+              <ul><li>ResNet-50</li><li>ResNet-101</li><li>ResNet-152</li></ul>
+          </ul>
+          <ul>
+            <li><b>DenseNet</b></li>
+              <ul><li>DenseNet-169</li><li>DenseNet-201</li></ul>
+          </ul>
+          <ul>
+            <li><b>VGG</b></li>
+              <ul><li>VGG-11</li><li>VGG-13</li><li>VGG-16</li><li>VGG-19</li></ul>
+          </ul>
+          <ul><li><b>Other CNN models</b></li><ul><li>...</li></ul></ul>
+        </ul>  
+        <ul><li><b>BERT models</b></li>
+          <ul><li><b>BERT</b></li><li><b>BERT_LARGE</b></li></ul>
+        </ul>
+        <ul><li><b>LSTM</b></li></ul>
+        <ul><li><b>GPT-2</b></li></ul>
+      </td>
+    </tr>
+  </tbody>
+</table>
 ## Installation
@@ -127,7 +230,11 @@ Please find more benchmark examples [here](examples/benchmarks/).
 ## Developer Guide
-Follow [Installation using Python](#using-python).
+If you want to develop a new feature, please follow the steps below to set up a development environment.
+### Check Environment
+Follow __[System Requirements](#using-python)__.
 ### Set Up

--- a/imgs/bar.png
+++ b/imgs/bar.png
--- a/imgs/superbench_structure.png
+++ b/imgs/superbench_structure.png