v0.10.0 · Tags · tsoc / superbenchmark

v0.10.0
9eb2bdfb · Docs - Upgrade version and release note (#606) · Dec 28, 2023
Release SuperBench v0.10.0

SuperBench 0.10.0 Release Notes
===============================

SuperBench Improvements
-----------------------

- Support monitoring for AMD GPUs.
- Support ROCm 5.7 and ROCm 6.0 dockerfile.
- Add MSCCL support for Nvidia GPU.
- Fix NUMA domains swap issue in NDv4 topology file.
- Add NDv5 topo file.
- Fix NCCL and NCCL-test to 2.18.3 for hang issue in CUDA 12.2.

Micro-benchmark Improvements
----------------------------

- Add HPL random generator to gemm-flops with ROCm.
- Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames.
- Add HWDecoderFPS benchmark to measure the FPS of hardware decoder performance.
- Update Docker image for H100 support.
- Update MLC version into 3.10 for CUDA/ROCm dockerfile.
- Bug fix for GPU Burn test.
- Support INT8 in cublaslt function.
- Add hipBLASLt function benchmark.
- Support cpu-gpu and gpu-cpu in ib-validation.
- Support graph mode in NCCL/RCCL benchmarks for latency metrics.
- Support cpp implementation in distributed inference benchmark.
- Add O2 option for gpu copy ROCm build.
- Support different hipblasLt data types in dist inference.
- Support in-place in NCCL/RCCL benchmark.
- Support data type option in NCCL/RCCL benchmark.
- Improve P2P performance with fine-grained GPU memory in GPU-copy test for AMD GPUs.
- Update hipblaslt GEMM metric unit to tflops.
- Support FP8 for hipblaslt benchmark.

Model Benchmark Improvements
----------------------------

- Change torch.distributed.launch to torchrun.
- Support Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark.

Result Analysis
---------------

- Support baseline generation from multiple nodes.
This tag has no release notes.
Download source code