Docs - Upgrade version and release note (#209)

__Description__

Upgrade version and release note. Closes #95 and #170.

__Major Revisions__

* Upgrade package versions
* Add release note for v0.3.0

Commit 15f22e2c authored Sep 24, 2021 by Yifan Xiong; committed by GitHub on Sep 24, 2021
15 changed files
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@
 __SuperBench__ is a validation and profiling tool for AI infrastructure.
-📢 [v0.2.1](https://github.com/microsoft/superbenchmark/releases/tag/v0.2.1) has been released!
+📢 [v0.3.0](https://github.com/microsoft/superbenchmark/releases/tag/v0.3.0) has been released!
 ## _Check [aka.ms/superbench](https://aka.ms/superbench) for more details._

--- a/docs/developer-guides/using-docker.mdx
+++ b/docs/developer-guides/using-docker.mdx
@@ -36,7 +36,10 @@ docker buildx build \
 <TabItem value='rocm'>
 ```bash
-# coming soon
+export DOCKER_BUILDKIT=1
+docker buildx build \
+  --platform linux/amd64 --cache-to type=inline,mode=max \
+  --tag superbench-dev --file dockerfile/rocm4.2-pytorch1.7.0.dockerfile .
 ```
 </TabItem>

--- a/docs/getting-started/installation.md
+++ b/docs/getting-started/installation.md
@@ -57,7 +57,7 @@ You can clone the source from GitHub and build it.
 :::note Note
 You should check out the corresponding tag to use a release version, for example,
-`git clone -b v0.2.1 https://github.com/microsoft/superbenchmark`
+`git clone -b v0.3.0 https://github.com/microsoft/superbenchmark`
 :::
 ```bash

--- a/docs/getting-started/run-superbench.md
+++ b/docs/getting-started/run-superbench.md
@@ -27,7 +27,7 @@ sb deploy -f remote.ini --host-password [password]
 :::note Note
 You should deploy the corresponding Docker image to use a release version, for example,
-`sb deploy -f local.ini -i superbench/superbench:v0.2.1-cuda11.1.1`
+`sb deploy -f local.ini -i superbench/superbench:v0.3.0-cuda11.1.1`
 :::
 ## Run

--- a/docs/superbench-config.mdx
+++ b/docs/superbench-config.mdx
@@ -66,7 +66,7 @@ superbench:
 <TabItem value='example'>
 ```yaml
-version: v0.2
+version: v0.3
 superbench:
  enable: benchmark_1
  var:

--- a/docs/tutorial/container-images.mdx
+++ b/docs/tutorial/container-images.mdx
@@ -29,13 +29,17 @@ available tags are listed below for all stable versions.
 | Tag               | Description                        |
 | ----------------- | ---------------------------------- |
+| v0.3.0-cuda11.1.1 | SuperBench v0.3.0 with CUDA 11.1.1 |
 | v0.2.1-cuda11.1.1 | SuperBench v0.2.1 with CUDA 11.1.1 |
 | v0.2.0-cuda11.1.1 | SuperBench v0.2.0 with CUDA 11.1.1 |
 </TabItem>
 <TabItem value='rocm'>
-  Coming soon.
+| Tag                         | Description                                    |
+| --------------------------- | ---------------------------------------------- |
+| v0.3.0-rocm4.2-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.2, PyTorch 1.7.0 |
+| v0.3.0-rocm4.0-pytorch1.7.0 | SuperBench v0.3.0 with ROCm 4.0, PyTorch 1.7.0 |
 </TabItem>
 </Tabs>
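
The tags listed above follow a `v<version>-<platform>` pattern. The sketch below assumes that pattern holds for later releases and that the ROCm tags live in the same `superbench/superbench` repository as the CUDA tag used with `sb deploy -i`. It composes a full image reference from a version and a platform suffix. The helper `image_ref` is hypothetical and not part of SuperBench.

```python
def image_ref(version: str, platform: str, repo: str = "superbench/superbench") -> str:
    """Build an image reference such as superbench/superbench:v0.3.0-cuda11.1.1.

    Assumption: tags are formed as v<version>-<platform>, as in the tables above.
    """
    version = version.lstrip("v")  # accept both "0.3.0" and "v0.3.0"
    return f"{repo}:v{version}-{platform}"


if __name__ == "__main__":
    for platform in ("cuda11.1.1", "rocm4.2-pytorch1.7.0", "rocm4.0-pytorch1.7.0"):
        print(image_ref("0.3.0", platform))
```

The result can be passed directly to `sb deploy -f local.ini -i <image>`.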
--- a/superbench/__init__.py
+++ b/superbench/__init__.py
@@ -6,5 +6,5 @@
 Provide hardware and software benchmarks for AI systems.
 """
-__version__ = '0.2.1'
+__version__ = '0.3.0'
 __author__ = 'Microsoft'
--- a/superbench/config/amd_mi100_hpe.yaml
+++ b/superbench/config/amd_mi100_hpe.yaml
@@ -3,7 +3,7 @@
 # Server:
 #   - Product: HPE Apollo 6500
-version: v0.2
+version: v0.3
 superbench:
  enable: null
  var:
@@ -52,8 +52,8 @@ superbench:
    gemm-flops:
      <<: *default_local_mode
      parameters:
-        m: 7680 
+        m: 7680
-        n: 8192 
+        n: 8192
        k: 8192
    ib-loopback:
      enable: true
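
The `gemm-flops` parameters above (`m: 7680`, `n: 8192`, `k: 8192`) set the matrix shape. A dense GEMM of an m×k matrix by a k×n matrix performs m·n·k multiply-adds, which are conventionally counted as 2·m·n·k floating-point operations. rocblas-bench's exact accounting may differ slightly, for example when a nonzero beta is used. The sketch below converts a kernel time into GFLOPS under the 2·m·n·k convention. The function names are illustrative, and the 0.1 s timing is a made-up input, not a measured result.

```python
def gemm_flop_count(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A (m x k) and B (k x n): one multiply plus one add per MAC."""
    return 2 * m * n * k


def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in GFLOPS (1 GFLOPS = 1e9 FLOP/s)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flop_count(m, n, k) / seconds / 1e9


if __name__ == "__main__":
    m, n, k = 7680, 8192, 8192  # shape from amd_mi100_hpe.yaml
    print(gemm_flop_count(m, n, k))  # 1030792151040
    print(gemm_gflops(m, n, k, 0.1))  # hypothetical 0.1 s kernel time
```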

--- a/superbench/config/amd_mi100_z53.yaml
+++ b/superbench/config/amd_mi100_z53.yaml
@@ -4,7 +4,7 @@
 #   - Product: G482-Z53
 #   - Link: https://www.gigabyte.cn/FileUpload/Global/MicroSite/553/G482-Z53.html
-version: v0.2
+version: v0.3
 superbench:
  enable: null
  var:

--- a/superbench/config/azure_ndv4.yaml
+++ b/superbench/config/azure_ndv4.yaml
 # SuperBench Config
-version: v0.2
+version: v0.3
 superbench:
  enable: null
  var:

--- a/superbench/config/default.yaml
+++ b/superbench/config/default.yaml
 # SuperBench Config
-version: v0.2
+version: v0.3
 superbench:
  enable: null
  var:
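
Every bundled config moves from `version: v0.2` to `version: v0.3` while `__version__` moves from `0.2.1` to `0.3.0`. This suggests that the config version keeps only the major and minor parts of the package version. The sketch below shows that mapping and assumes the convention holds. `config_version` is a hypothetical helper, not a SuperBench API.

```python
def config_version(package_version: str) -> str:
    """Map a package version like '0.3.0' to a config version like 'v0.3'.

    Assumption: the config version keeps only major.minor of the package version.
    """
    parts = package_version.lstrip("v").split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"unexpected version string: {package_version!r}")
    major, minor = parts[0], parts[1]
    return f"v{major}.{minor}"


if __name__ == "__main__":
    print(config_version("0.3.0"))  # v0.3
    print(config_version("0.2.1"))  # v0.2
```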

--- a/website/blog/2021-09-22-release-0-3.md
+++ b/website/blog/2021-09-22-release-0-3.md
+---
+slug: release-sb-v0.3
+title: Releasing SuperBench v0.3
+author: Peng Cheng
+author_title: SuperBench Team
+author_url: https://github.com/cp5555
+author_image_url: https://github.com/cp5555.png
+tags: [superbench, announcement, release]
+---
+We are very happy to announce that **SuperBench 0.3.0** is officially released today!
+You can install and try SuperBench by following the [Getting Started Tutorial](https://microsoft.github.io/superbenchmark/docs/getting-started/installation).
+## SuperBench 0.3.0 Release Notes
+### SuperBench Framework
+#### Runner
+- Implement MPI mode.
+#### Benchmarks
+- Support Docker benchmark.
+### Single-node Validation
+#### Micro Benchmarks
+1. Memory (Tool: NVIDIA/AMD Bandwidth Test Tool)
+   | Metrics        | Unit | Description                         |
+   |----------------|------|-------------------------------------|
+   | H2D_Mem_BW_GPU | GB/s | host-to-GPU bandwidth for each GPU  |
+   | D2H_Mem_BW_GPU | GB/s | GPU-to-host bandwidth for each GPU  |
+2. IBLoopback (Tool: PerfTest – Standard RDMA Test Tool)
+   | Metrics  | Unit | Description                                                   |
+   |----------|------|---------------------------------------------------------------|
+   | IB_Write | MB/s | The IB write loopback throughput with different message sizes |
+   | IB_Read  | MB/s | The IB read loopback throughput with different message sizes  |
+   | IB_Send  | MB/s | The IB send loopback throughput with different message sizes  |
+3. NCCL/RCCL (Tool: NCCL/RCCL Tests)
+   | Metrics             | Unit | Description                                                     |
+   |---------------------|------|-----------------------------------------------------------------|
+   | NCCL_AllReduce      | GB/s | The NCCL AllReduce performance with different message sizes     |
+   | NCCL_AllGather      | GB/s | The NCCL AllGather performance with different message sizes     |
+   | NCCL_broadcast      | GB/s | The NCCL Broadcast performance with different message sizes     |
+   | NCCL_reduce         | GB/s | The NCCL Reduce performance with different message sizes        |
+   | NCCL_reduce_scatter | GB/s | The NCCL ReduceScatter performance with different message sizes |
+4. Disk (Tool: FIO – Standard Disk Performance Tool)
+   | Metrics        | Unit | Description                                                                     |
+   |----------------|------|---------------------------------------------------------------------------------|
+   | Seq_Read       | MB/s | Sequential read performance                                                     |
+   | Seq_Write      | MB/s | Sequential write performance                                                    |
+   | Rand_Read      | MB/s | Random read performance                                                         |
+   | Rand_Write     | MB/s | Random write performance                                                        |
+   | Seq_R/W_Read   | MB/s | Read performance in sequential read/write (read:write = 4:1)                    |
+   | Seq_R/W_Write  | MB/s | Write performance in sequential read/write (read:write = 4:1)                   |
+   | Rand_R/W_Read  | MB/s | Read performance in random read/write (read:write = 4:1)                        |
+   | Rand_R/W_Write | MB/s | Write performance in random read/write (read:write = 4:1)                       |
+5. H2D/D2H SM Transmission Bandwidth (Tool: MSR-A build)
+   | Metrics       | Unit | Description                                         |
+   |---------------|------|-----------------------------------------------------|
+   | H2D_SM_BW_GPU | GB/s | host-to-GPU bandwidth using GPU kernel for each GPU |
+   | D2H_SM_BW_GPU | GB/s | GPU-to-host bandwidth using GPU kernel for each GPU |
+### AMD GPU Support
+#### Docker Image Support
+- ROCm 4.2 PyTorch 1.7.0
+- ROCm 4.0 PyTorch 1.7.0
+#### Micro Benchmarks
+1. Kernel Launch (Tool: MSR-A build)
+   | Metrics                  | Unit      | Description                                                  |
+   |--------------------------|-----------|--------------------------------------------------------------|
+   | Kernel_Launch_Event_Time | Time (ms) | Dispatch latency measured in GPU time using hipEventRecord() |
+   | Kernel_Launch_Wall_Time  | Time (ms) | Dispatch latency measured in CPU time                        |
+2. GEMM FLOPS (Tool: AMD rocblas-bench Tool)
+   | Metrics  | Unit   | Description                   |
+   |----------|--------|-------------------------------|
+   | FP64     | GFLOPS | FP64 FLOPS without MatrixCore |
+   | FP32(MC) | GFLOPS | FP32 FLOPS with MatrixCore    |
+   | FP16(MC) | GFLOPS | FP16 FLOPS with MatrixCore    |
+   | BF16(MC) | GFLOPS | BF16 FLOPS with MatrixCore    |
+   | INT8(MC) | GOPS   | INT8 OPS with MatrixCore      |
+#### E2E Benchmarks
+1. CNN models (PyTorch torchvision models)
+   - ResNet: ResNet-50, ResNet-101, ResNet-152
+   - DenseNet: DenseNet-169, DenseNet-201
+   - VGG: VGG-11, VGG-13, VGG-16, VGG-19
+2. BERT (Hugging Face Transformers)
+   - BERT
+   - BERT Large
+3. LSTM (PyTorch)
+4. GPT-2 (Hugging Face Transformers)
+### Bug Fix
+- Fix VGG models failing on A100 GPUs with batch_size=128
+### Other Improvements
+1. Contribution-related
+   - Contribution rules
+   - System information collection
+2. Documentation
+   - Add release process doc
+   - Add design documents
+   - Add developer guide doc for coding style
+   - Add contribution rules
+   - Add docker image list
+   - Add initial validation results
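
The NCCL/RCCL metrics in the release note above are reported in GB/s. nccl-tests prints two figures: an algorithm bandwidth (`algbw`, data size divided by time) and a bus bandwidth (`busbw`), which scales `algbw` by a per-collective factor so that results can be compared with hardware link speed. The release note does not say which of the two the metrics report. The sketch below therefore only illustrates the nccl-tests conversion factors for n ranks. The function and dictionary names are illustrative, and the 100 GB/s input is made up.

```python
from fractions import Fraction

# Bus-bandwidth correction factors documented in the nccl-tests performance notes,
# expressed as a function of the number of ranks n.
BUSBW_FACTOR = {
    "allreduce": lambda n: Fraction(2 * (n - 1), n),
    "reducescatter": lambda n: Fraction(n - 1, n),
    "allgather": lambda n: Fraction(n - 1, n),
    "broadcast": lambda n: Fraction(1),
    "reduce": lambda n: Fraction(1),
}


def bus_bandwidth(op: str, algbw_gbps: float, nranks: int) -> float:
    """Convert algorithm bandwidth (GB/s) to bus bandwidth (GB/s) for nranks ranks."""
    if nranks < 1:
        raise ValueError("nranks must be >= 1")
    factor = BUSBW_FACTOR[op.lower()](nranks)
    return algbw_gbps * float(factor)


if __name__ == "__main__":
    # Hypothetical algbw of 100 GB/s on 8 GPUs, not a measured SuperBench result.
    for op in BUSBW_FACTOR:
        print(op, bus_bandwidth(op, 100.0, 8))
```

Keeping the factors as exact fractions avoids rounding error until the final conversion to `float`.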
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -101,7 +101,7 @@ module.exports = {
    announcementBar: {
      id: 'supportus',
      content:
-        '📢 <a href="https://microsoft.github.io/superbenchmark/blog/release-sb-v0.2">v0.2.1</a> has been released! ' +
+        '📢 <a href="https://microsoft.github.io/superbenchmark/blog/release-sb-v0.3">v0.3.0</a> has been released! ' +
        '⭐️ If you like SuperBench, give it a star on <a target="_blank" rel="noopener noreferrer" href="https://github.com/microsoft/superbenchmark">GitHub</a>! ⭐️',
    },
    algolia: {

--- a/website/package-lock.json
+++ b/website/package-lock.json
 {
  "name": "superbench-website",
-  "version": "0.2.1",
+  "version": "0.3.0",
  "lockfileVersion": 1,
  "requires": true,
  "dependencies": {

--- a/website/package.json
+++ b/website/package.json
 {
  "name": "superbench-website",
-  "version": "0.2.1",
+  "version": "0.3.0",
  "private": true,
  "scripts": {
    "docusaurus": "docusaurus",
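
This commit bumps the same release number in `superbench/__init__.py`, `website/package.json`, and `website/package-lock.json`. The sketch below checks that the three values agree. It assumes the caller has already read the files into strings. The function name and regex are illustrative and are not part of the repository's tooling.

```python
import json
import re

# Matches a line such as: __version__ = '0.3.0'
_INIT_VERSION = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def check_versions(init_py: str, package_json: str, package_lock_json: str) -> str:
    """Return the shared version, or raise ValueError listing the mismatched sources."""
    match = _INIT_VERSION.search(init_py)
    if match is None:
        raise ValueError("__version__ not found in __init__.py")
    versions = {
        "superbench/__init__.py": match.group(1),
        "website/package.json": json.loads(package_json)["version"],
        "website/package-lock.json": json.loads(package_lock_json)["version"],
    }
    if len(set(versions.values())) != 1:
        raise ValueError(f"version mismatch: {versions}")
    return match.group(1)


if __name__ == "__main__":
    init_py = "__version__ = '0.3.0'\n__author__ = 'Microsoft'\n"
    pkg = '{"name": "superbench-website", "version": "0.3.0", "private": true}'
    lock = '{"name": "superbench-website", "version": "0.3.0", "lockfileVersion": 1}'
    print(check_versions(init_py, pkg, lock))  # 0.3.0
```

Running such a check in CI would catch a release that bumps only some of these files.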