Release SuperBench v0.5.0

SuperBench v0.5.0 Release Notes
===============================

Micro-benchmark Improvements
----------------------------

- Support NIC only NCCL bandwidth benchmark on single node in NCCL/RCCL
  bandwidth test.
- Support bi-directional bandwidth benchmark in GPU copy bandwidth test.
- Support data checking in GPU copy bandwidth test.
- Update rccl-tests submodule to fix divide by zero error.
- Add GPU-Burn micro-benchmark.

Model-benchmark Improvements
----------------------------

- Sync results on root rank for e2e model benchmarks in distributed mode.
- Support customized `env` in local and torch.distributed mode.
- Add support for pytorch>=1.9.0.
- Keep BatchNorm as fp32 for pytorch cnn models cast to fp16.
- Remove FP16 samples type converting time.
- Support FAMBench.

Inference Benchmark Improvements
--------------------------------

- Revise the default setting for inference benchmark.
- Add percentile metrics for inference benchmarks.
- Support T4 and A10 in GEMM benchmark.
- Add configuration with inference benchmark.

Other Improvements
------------------

- Add command to support listing all optional parameters for benchmarks.
- Unify benchmark naming convention and support multiple tests with same
  benchmark and different parameters/options in one configuration file.
- Support timeout to detect the benchmark failure and stop the process
  automatically.
- Add rocm5.0 dockerfile.
- Improve output interface.

Data Diagnosis and Analysis
---------------------------

- Support multi-benchmark check.
- Support result summary in md, html and excel formats.
- Support data diagnosis in md and html formats.
- Support result output for all nodes in data diagnosis.
This tag has no release notes.