"vscode:/vscode.git/clone" did not exist on "1d3120fbaaa014fa4899d560e6a0829bd4f3c83a"
-
botbw authored
* [feat] add an example mma atom * [fix] fix typo naming * [feat] add a template to enable compilation * [feat] add print util * [WIP] pass on single block tile * [feat] add sm80 metadata layout * [chore] clean codebase * [CI] format.sh * [feat] add sm80 compress utils * [bugfix] fix C fragment layout * [refactor] use nvcc version instead of str * [test] add test cases * [chore] add a param check * [chore] format a bit * [chore] rename func to satisfy PEP 8 and appease gemini * [chore] add check * [feat] support sm75 layout && add assertion && chore * [bug] fix illegal memory access when using two warps over N=32 This could be a missing check related to cutlass 2.x implementation. Using the cutlass example can't trigger this cause it's bypassed by padding the input. For now I think it might be safe to increase the atom size and inve- sgate in the future. * [chore] add example * [chore] format * [example] update benchmark * [bugfix] fix namespace and format * [bugfix] fix incorrect param passing * [refactor] update variable declaration for clarity in gemm_layouts and gemm_sp * [Cleanup] Remove unnecessary blank lines in metadata layout functions in gemm_sp.py * [CI] fix arch * [example] add torch sparse benchmark * [misc] polish && add reference && apply review suggestionsi && format * [CI] format with clang-tidy * [Cleanup] Format and align template struct definitions in half.hpp, common.h, and gemm_sp_sm80.h * [Update] Modify CUDA version requirements in test_gemm_sp_sm80 and mark cutlass subproject as dirty --------- Co-authored-by:LeiWang1999 <leiwang1999@outlook.com>
0b3683bf