[feat] support gemm_sp for ampere and ada arch (#691)
* [feat] add an example mma atom
* [fix] fix typo naming
* [feat] add a template to enable compilation
* [feat] add print util
* [WIP] pass on single block tile
* [feat] add sm80 metadata layout
* [chore] clean codebase
* [CI] format.sh
* [feat] add sm80 compress utils
* [bugfix] fix C fragment layout
* [refactor] use nvcc version instead of str
* [test] add test cases
* [chore] add a param check
* [chore] format a bit
* [chore] rename func to satisfy PEP 8 and appease gemini
* [chore] add check
* [feat] support sm75 layout && add assertion && chore
* [bug] fix illegal memory access when using two warps over N=32
This may be a missing check related to the cutlass 2.x implementation.
The cutlass example can't trigger it because the issue is bypassed by
padding the input.
For now I think it is safe to increase the atom size and investigate
in the future.
* [chore] add example
* [chore] format
* [example] update benchmark
* [bugfix] fix namespace and format
* [bugfix] fix incorrect param passing
* [refactor] update variable declaration for clarity in gemm_layouts and gemm_sp
* [Cleanup] Remove unnecessary blank lines in metadata layout functions in gemm_sp.py
* [CI] fix arch
* [example] add torch sparse benchmark
* [misc] polish && add reference && apply review suggestions && format
* [CI] format with clang-tidy
* [Cleanup] Format and align template struct definitions in half.hpp, common.h, and gemm_sp_sm80.h
* [Update] Modify CUDA version requirements in test_gemm_sp_sm80 and mark cutlass subproject as dirty
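The sm80 metadata layout and compress utils above revolve around 2:4 structured sparsity, the scheme Ampere sparse tensor cores require: in every group of 4 values along K, at most 2 are nonzero, and the kernel stores only those 2 values plus small metadata indices selecting their positions. The sketch below is purely illustrative (plain Python, not the repo's actual compression utilities or layouts):

```python
# Illustrative sketch of 2:4 structured-sparsity compression (hypothetical
# helper names, not this repo's API): each group of 4 elements keeps its
# 2 nonzero values plus the index pair recording where they came from.

def compress_2to4(row):
    """Split a 2:4-sparse row into (values, metadata index pairs)."""
    assert len(row) % 4 == 0
    values, meta = [], []
    for g in range(0, len(row), 4):
        group = list(row[g:g + 4])
        nz = [i for i, v in enumerate(group) if v != 0]
        assert len(nz) <= 2, "row is not 2:4 sparse"
        while len(nz) < 2:  # pad with an unused position so pairs stay fixed-size
            nz.append(next(i for i in range(4) if i not in nz))
        nz = sorted(nz)
        values.extend(group[i] for i in nz)
        meta.append(tuple(nz))
    return values, meta

def decompress_2to4(values, meta, n):
    """Reconstruct the dense row from compressed values + metadata."""
    row = [0] * n
    for g, (i0, i1) in enumerate(meta):
        row[4 * g + i0] = values[2 * g]
        row[4 * g + i1] = values[2 * g + 1]
    return row

dense = [0, 5, 0, 7, 3, 0, 0, 9]
vals, meta = compress_2to4(dense)
assert vals == [5, 7, 3, 9]
assert meta == [(1, 3), (0, 3)]
assert decompress_2to4(vals, meta, len(dense)) == dense
```

In the real kernels the index pairs are packed as 2-bit fields into the metadata tensor whose layout the sm80/sm75 commits above define; the round trip here only shows the invariant that layout must preserve.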
---------
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>