Release SuperBench v0.12.0

SuperBench 0.12.0 Release Notes
===============================

SuperBench Improvements
-----------------------

- Optimized cutlass build process for faster builds and smaller binaries.
- Improve image build pipeline.
- Add support for arm64 builds.
- Upgrade pipeline dependencies.
- Fix SuperBench installation and code lint issues.
- Update Flake8 repository.
- Add support for the latest Python versions.
- Enhance error handling for `pkg_resources` imports.
- Update ROCm image build labels.
- Add CUDA 12.8 and CUDA 12.9 support.
- Consolidate multi-architecture Docker images.
- Upgrade runner OS to latest version.
- Fix typos in documentation and code.

Micro-benchmark Improvements
----------------------------

- Add general CPU bandwidth and latency benchmarks.
- Add nvbandwidth build process and benchmarks.
- Add architecture support for 10.0 in gemm-flops.
- Add GPU Stream micro benchmark.
- Add FP4 GEMM FLOPS support in `cublaslt_gemm` benchmark.
- Add Grace CPU support for CPU Stream benchmark.
- Revise CPU Stream benchmark.
- Fix NUMA error on Grace CPU in gpu-copy benchmark.
- Bump onnxruntime-gpu dependency from 1.10.0 to 1.12.0.
- Fix stderr message in gpu-copy benchmark.
- Fix TensorRT inference parsing.
- Handle N/A values in nvbandwidth benchmark.
- Avoid unintended nvbandwidth function calls in all benchmarks.
- Support CUDA arch flag and autotuning in `cublaslt` GEMM.

Model-benchmark Improvements
----------------------------

- Add LLaMA-2 model benchmarks.
- Add Mixture of Experts model benchmarks.
- Add DeepSeek inference benchmark (AMD GPU).

Result Analysis
---------------

- Enhance logging for diagnosis rule baseline errors.

Documentation Updates
---------------------

- Update CODEOWNERS file.