Release SuperBench v0.8.0

SuperBench v0.8.0 Release Notes
===============================

SuperBench Improvements
-----------------------

- Support SuperBench Executor running on Windows.
- Remove fixed rccl version in rocm5.1.x docker file.
- Upgrade networkx version to fix installation compatibility issue.
- Pin setuptools version to v65.7.0.
- Limit ansible_runner version for Python 3.6.
- Support cgroup V2 when reading system metrics in monitor.
- Fix analyzer bug in Python 3.8 due to pandas API change.
- Collect real-time GPU power in monitor.
- Remove unreachable condition when writing host list in mpi mode.
- Upgrade Docker image with cuda12.1, nccl 2.17.1-1, hpcx v2.14, and mlc 3.10.
- Fix wrong unit of cpu-memory-bw-latency in the documentation.

Micro-benchmark Improvements
----------------------------

- Add STREAM benchmark for sustainable memory bandwidth and the corresponding computation rate.
- Add HPL Benchmark for HPC Linpack Benchmark.
- Support flexible warmup and non-random data initialization in cublas-benchmark.
- Support error tolerance in micro-benchmark for CuDNN function.
- Add distributed inference benchmark.
- Support tensor core precisions (e.g., FP8) and batch/shape range in cublaslt gemm.

Model Benchmark Improvements
----------------------------

- Fix torch.dist init issue with multiple models.
- Support TE FP8 in BERT/GPT2 model.
- Add num_workers configurations in model benchmark.