Release SuperBench v0.4.0

SuperBench v0.4.0 Release Notes
===============================

SuperBench Framework
--------------------

__Monitor__

- Add monitor framework for NVIDIA GPU, CPU, memory and disk.

__Data Diagnosis and Analysis__

- Support baseline-based data diagnosis.
- Support basic analysis features (boxplot figures, outlier detection,
  etc.).
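
As a rough illustration of what baseline-based diagnosis and outlier detection mean here, the sketch below flags nodes whose metric falls below a baseline and values outside the boxplot whiskers. It is a minimal sketch and not SuperBench's implementation: the function names, the 5% tolerance, the 1.5 IQR factor, and the sample bandwidth numbers are all assumptions made for the example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (the boxplot whisker rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def below_baseline(results, baseline, tolerance=0.05):
    """Return the names of nodes whose metric is more than `tolerance` below `baseline`."""
    threshold = baseline * (1 - tolerance)
    return sorted(node for node, value in results.items() if value < threshold)


# Hypothetical per-node bandwidth results in GB/s (illustrative numbers only).
results = {"node0": 24.9, "node1": 25.1, "node2": 25.0, "node3": 18.2, "node4": 25.2}

print(iqr_outliers(list(results.values())))   # values outside the whiskers
print(below_baseline(results, baseline=25.0))  # nodes failing the baseline check
```

With these sample numbers, both checks single out the same slow node; in practice the two can disagree, which is why a baseline check and a distribution-based check are useful together.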

Single-node Validation
----------------------

__Micro Benchmarks__

- CPU Memory Validation (tool: Intel Memory Latency Checker).
- GPU Copy Bandwidth (tool: built by MSRA).
- Add ORT Model on AMD GPU platform.
- Add inference backend TensorRT.
- Add inference backend ORT.

Multi-node Validation
---------------------

__Micro Benchmarks__

- IB networking validation.
- TCP validation (tool: TCPing).
- GPCNet Validation (tool: GPCNet).

Other Improvements
------------------

1. Enhancements
   - Add pipeline for AMD docker.
   - Integrate system config info script with SuperBench.
   - Support FP32 mode without TF32.
   - Refine unit tests for micro benchmarks.
   - Unify metric names for all benchmarks.

2. Documentation
   - Add benchmark list.
   - Add monitor documentation.
   - Add data diagnosis documentation.