Release SuperBench v0.10.0

SuperBench 0.10.0 Release Notes
===============================

SuperBench Improvements
-----------------------

- Support monitoring for AMD GPUs.
- Support ROCm 5.7 and ROCm 6.0 dockerfiles.
- Add MSCCL support for NVIDIA GPUs.
- Fix NUMA domains swap issue in NDv4 topology file.
- Add NDv5 topology file.
- Pin NCCL and NCCL-test to 2.18.3 to work around a hang issue in CUDA 12.2.

Micro-benchmark Improvements
----------------------------

- Add HPL random generator to gemm-flops with ROCm.
- Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames.
- Add HWDecoderFPS benchmark to measure the FPS of the hardware decoder.
- Update Docker image for H100 support.
- Update MLC version to 3.10 for CUDA/ROCm dockerfiles.
- Bug fix for GPU Burn test.
- Support INT8 in cublaslt function.
- Add hipBLASLt function benchmark.
- Support cpu-gpu and gpu-cpu in ib-validation.
- Support graph mode in NCCL/RCCL benchmarks for latency metrics.
- Support C++ implementation in distributed inference benchmark.
- Add O2 option for gpu-copy ROCm build.
- Support different hipBLASLt data types in distributed inference benchmark.
- Support in-place operation in NCCL/RCCL benchmarks.
- Support data type option in NCCL/RCCL benchmarks.
- Improve P2P performance with fine-grained GPU memory in gpu-copy test for AMD GPUs.
- Update hipBLASLt GEMM metric unit to TFLOPS.
- Support FP8 in hipBLASLt benchmark.

Model Benchmark Improvements
----------------------------

- Change torch.distributed.launch to torchrun.
- Support Megatron-LM/Megatron-DeepSpeed GPT pretrain benchmark.

Result Analysis
---------------

- Support baseline generation from multiple nodes.