Release SuperBench v0.7.0

SuperBench v0.7.0 Release Notes
===============================

SuperBench Improvements
-----------------------

- Support non-zero return code when "sb deploy" or "sb run" fails in
  Ansible.
- Support log flushing to the result file during runtime.
- Update version to include revision hash and date.
- Support the "pattern" option in MPI mode to run tasks in parallel.
- Support topo-aware, all-pair, and K-batch patterns in MPI mode.
- Pin the Transformers version to avoid TensorRT failures.
- Add a CUDA 11.8 Docker image for NVIDIA arch90 GPUs.
- Support "sb deploy" without pulling image.

Micro-benchmark Improvements
----------------------------

- Support a list of custom config strings in cudnn-functions and
  cublas-functions.
- Support correctness check in cublas-functions.
- Support GEMM-FLOPS for NVIDIA arch90 GPUs.
- Support cuBLASLt FP16 and FP8 GEMM.
- Add a wait-time option to fix unstable mem-bw results.
- Fix incorrect datatype detection in the cublas-functions source
  code.

Model Benchmark Improvements
----------------------------

- Support FP8 in BERT model training.

Distributed Benchmark Improvements
----------------------------------

- Support the pair-wise pattern in the IB validation benchmark.
- Support topo-aware, pair-wise, and K-batch patterns in the nccl-bw
  benchmark.
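
The K-batch and pair-wise patterns group hosts so that independent tasks can
run in parallel. The Python sketch below shows one plausible interpretation of
those two groupings. It is illustrative only: the function names ``k_batch``
and ``pair_wise``, the round-robin (circle-method) schedule, and the choice to
drop leftover hosts are assumptions, not SuperBench's implementation or
configuration API.

.. code-block:: python

   """Hypothetical sketch of host-grouping patterns (not SuperBench's code)."""
   from typing import List, Optional, Tuple


   def k_batch(hosts: List[str], k: int) -> List[List[str]]:
       """Split hosts into consecutive groups of size k.

       Assumption: a trailing remainder smaller than k is dropped.
       """
       if k <= 0:
           raise ValueError("k must be positive")
       return [hosts[i:i + k] for i in range(0, len(hosts) - k + 1, k)]


   def pair_wise(hosts: List[str]) -> List[List[Tuple[str, str]]]:
       """Round-robin schedule: every host meets every other host exactly once.

       Pairs within one round share no host, so they can run in parallel.
       """
       nodes: List[Optional[str]] = list(hosts)
       if len(nodes) % 2:
           nodes.append(None)  # placeholder: a host paired with None sits out
       n = len(nodes)
       rounds = []
       for _ in range(n - 1):
           pairs = [(nodes[i], nodes[n - 1 - i]) for i in range(n // 2)]
           rounds.append([p for p in pairs if None not in p])
           # Keep nodes[0] fixed and rotate the remaining hosts by one position.
           nodes = [nodes[0], nodes[-1]] + nodes[1:-1]
       return rounds


   if __name__ == "__main__":
       print(k_batch(["n0", "n1", "n2", "n3", "n4"], 2))
       for rnd in pair_wise(["n0", "n1", "n2", "n3"]):
           print(rnd)

With four hosts, ``pair_wise`` yields three rounds of two disjoint pairs, which
together cover all six host pairs. The circle method is used here only because
it is a simple, well-known way to build such a schedule.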