Hongtao Zhang authored
Description

Add release note for v0.12.0

# Main Features

## SuperBench Improvement

1. - [x] Update Image Build Pipeline (#659)
2. - [x] Add support for arm64 build (#660)
3. - [x] Upgrade dependency versions in pipeline (#671)
4. - [x] Fix installation and lint issues (#684)
5. - [x] Update Flake8 repo (#683)
6. - [x] Init latest python support. (#687)
7. - [x] Add image build on arm64 arch (#690)
8. - [x] Enhancement of ignoring errors for import pkg_resources (#692)
9. - [x] Update label in the ROCm image build (#693)
10. - [x] Support cuda12.8 for Blackwell arch (#682)
11. - [x] Merge multi-arch image (#696)
12. - [x] Update OS of runner to the latest. (#702)
13. - [x] cuda arch flag for cublaslt (#701)

## Micro-benchmark Improvement

1. - [x] Bug Fix - Fix numa error on grace cpu in gpu-copy (#658)
2. - [x] Dependency - Bump onnxruntime-gpu version from 1.10.0 to 1.12.0 (#663)
3. - [x] Benchmarks: micro benchmarks - add general CPU bandwidth and latency benchmark (#662)
4. - [x] Benchmarks: micro benchmarks - add nvbandwidth build and benchmark (#665 and #669)
5. - [x] Fix stderr message in gpu-copy benchmark (#673)
6. - [x] Add arch support for 10.0 in gemm-flops (#680)
7. - [x] Fix tensorrt-inference parsing (#674)
8. - [x] nvbandwidth benchmark need to handle N/A value (#675)
9. - [x] Avoid Unintended nvbandwidth Function Calls in All Benchmarks (#685)
10. - [x] Add GPU Stream Micro Benchmark (#697)
11. - [x] Cuda arch flag for cublaslt (#701)
12. - [x] Support autotuning in cublaslt gemm (#706)
14. - [x] Add FP4 GEMM FLOPS support for cublaslt_gemm benchmark (#711)
15. - [x] CPU Stream Benchmark Revise (#712)
16. - [x] Add cuda12.9 docker image (#716)
17. - [x] Add Grace CPU support for CPU Stream (#719)

## Model Benchmark Improvement

1. - [x] Add LLaMA-2 Models (#668)
2. - [x] Fix typos in documentation and code files (#686)
3. - [x] Add Mixture of Experts Model (#679)
4. - [ ] Add DeepSeek Training Benchmark
5. - [x] Add DeepSeek Inference Benchmark (AMD GPU) (#713)

## Documentation

1. - [x] Update CODEOWNERS (#670)
2. - [x] Update CODEOWNERS (#718)

## Result Analysis

1. - [x] Enhance logging information for diagnosis rule op baseline errors. (#689)
adbf0357