Commit 0b4311cd authored by Hongtao Zhang, committed by GitHub

Release - SuperBench v0.12.0 (#729)



**Description**

Cherry-pick bug fixes from v0.12.0 to main.

**Major Revisions**

* #725
* #727
* #728
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
Co-authored-by: Yifan Xiong <yixio@microsoft.com>
Co-authored-by: Guoshuai Zhao <guzhao@microsoft.com>

---------
Co-authored-by: Hongtao Zhang <hongtaozhang@microsoft.com>
parent 44e35cda
---
slug: release-sb-v0.12
title: Releasing SuperBench v0.12
author: Guoshuai Zhao
author_title: SuperBench Team
author_url: https://github.com/guoshzhao
author_image_url: https://github.com/guoshzhao.png
tags: [superbench, announcement, release]
---
We are very happy to announce that **SuperBench v0.12.0** is officially released today!
You can install and try SuperBench by following the [Getting Started Tutorial](https://microsoft.github.io/superbenchmark/docs/getting-started/installation).
## SuperBench 0.12.0 Release Notes
### SuperBench Improvements
- Optimize cutlass build process for faster builds and smaller binaries.
- Improve image build pipeline.
- Add support for arm64 builds.
- Upgrade pipeline dependencies.
- Fix SuperBench installation and code lint issues.
- Update Flake8 repository.
- Add support for the latest Python versions.
- Enhance error handling for `pkg_resources` imports.
- Update ROCm image build labels.
- Add CUDA 12.8 and CUDA 12.9 support.
- Consolidate multi-architecture Docker images.
- Upgrade runner OS to latest version.
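
As an illustration of the `pkg_resources` item above, here is a minimal sketch of a guarded import with a standard-library fallback. It is not SuperBench's actual code: the helper name `get_package_version` is hypothetical, and the fallback to `importlib.metadata` is an assumption about one reasonable way to handle environments where `pkg_resources` is unavailable.

```python
def get_package_version(name):
    """Return the installed version of `name`, or None if it cannot be determined.

    Hypothetical helper for illustration only; not part of SuperBench.
    """
    try:
        # pkg_resources ships with setuptools and may be absent in some environments.
        import pkg_resources
    except ImportError:
        pkg_resources = None

    if pkg_resources is not None:
        try:
            return pkg_resources.get_distribution(name).version
        except Exception:
            # Covers DistributionNotFound and other resolution errors.
            return None

    # Standard-library fallback, available since Python 3.8.
    from importlib import metadata
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None
```

Returning `None` instead of raising lets a caller log the problem and continue, which is one way to keep a missing optional dependency from aborting a run.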
### Micro-benchmark Improvements
- Add general CPU bandwidth and latency benchmarks.
- Add nvbandwidth build process and benchmarks.
- Add architecture support for 10.0 in gemm-flops.
- Add GPU Stream micro benchmark.
- Add FP4 GEMM FLOPS support in `cublaslt_gemm` benchmark.
- Add Grace CPU support for CPU Stream benchmark.
- Revise CPU Stream benchmark.
- Fix NUMA error on Grace CPU in gpu-copy benchmark.
- Bump onnxruntime-gpu dependency from 1.10.0 to 1.12.0.
- Fix stderr message in gpu-copy benchmark.
- Fix TensorRT inference parsing.
- Handle N/A values in nvbandwidth benchmark.
- Avoid unintended nvbandwidth function calls in all benchmarks.
- Support CUDA arch flag and autotuning in `cublaslt` GEMM.
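
To make the "N/A values" item above concrete, the sketch below parses one row of a whitespace-separated bandwidth matrix and maps `N/A` cells to `None` instead of failing on `float()`. The row layout (row index followed by values) and the function name `parse_matrix_row` are assumptions for illustration; this is not the parser used by the SuperBench nvbandwidth benchmark.

```python
def parse_matrix_row(line):
    """Parse a row such as '0  26.03  N/A' into (row_index, values).

    Hypothetical helper: cells reported as N/A become None so that
    downstream code can skip them rather than crash on float('N/A').
    """
    fields = line.split()
    row_index = int(fields[0])
    values = []
    for field in fields[1:]:
        if field.upper() == 'N/A':
            values.append(None)
        else:
            values.append(float(field))
    return row_index, values


print(parse_matrix_row(' 0     26.03       N/A'))  # (0, [26.03, None])
```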
### Model-benchmark Improvements
- Add LLaMA-2 model benchmarks.
- Add Mixture of Experts model benchmarks.
- Add DeepSeek inference benchmark (AMD GPU).
- Fix typos in documentation and code.
### Result Analysis
- Enhance logging for diagnosis rule baseline errors.
### Documentation Updates
- Update CODEOWNERS file.
......
@@ -101,7 +101,7 @@ module.exports = {
announcementBar: {
id: 'supportus',
content:
- '📢 <a href="https://microsoft.github.io/superbenchmark/blog/release-sb-v0.11">v0.11.0</a> has been released! ' +
+ '📢 <a href="https://microsoft.github.io/superbenchmark/blog/release-sb-v0.12">v0.12.0</a> has been released! ' +
'⭐️ If you like SuperBench, give it a star on <a target="_blank" rel="noopener noreferrer" href="https://github.com/microsoft/superbenchmark">GitHub</a>! ⭐️',
},
algolia: {
......
{
"name": "superbench-website",
- "version": "0.11.0",
+ "version": "0.12.0",
"lockfileVersion": 1,
"requires": true,
"dependencies": {
......
{
"name": "superbench-website",
- "version": "0.11.0",
+ "version": "0.12.0",
"private": true,
"scripts": {
"docusaurus": "docusaurus",
......