    [feat] support gemm_sp for ampere and ada arch (#691) · 0b3683bf
    botbw authored
    
    
    * [feat] add an example mma atom
    
    * [fix] fix typo naming
    
    * [feat] add a template to enable compilation
    
    * [feat] add print util
    
    * [WIP] pass on single block tile
    
    * [feat] add sm80 metadata layout
    
    * [chore] clean codebase
    
    * [CI] format.sh
    
    * [feat] add sm80 compress utils
    
    * [bugfix] fix C fragment layout
    
    * [refactor] use nvcc version instead of str
    
    * [test] add test cases
    
    * [chore] add a param check
    
    * [chore] format a bit
    
    * [chore] rename func to satisfy PEP 8 and appease gemini
    
    * [chore] add check
    
    * [feat] support sm75 layout && add assertion && chore
    
    * [bug] fix illegal memory access when using two warps over N=32
    
    This could be a missing check related to the CUTLASS 2.x implementation.
    The CUTLASS example can't trigger this because it is bypassed by
    padding the input.
    
    For now I think it might be safe to increase the atom size and
    investigate in the future.
    
    * [chore] add example
    
    * [chore] format
    
    * [example] update benchmark
    
    * [bugfix] fix namespace and format
    
    * [bugfix] fix incorrect param passing
    
    * [refactor] update variable declaration for clarity in gemm_layouts and gemm_sp
    
    * [Cleanup] Remove unnecessary blank lines in metadata layout functions in gemm_sp.py
    
    * [CI] fix arch
    
    * [example] add torch sparse benchmark
    
    * [misc] polish && add reference && apply review suggestions && format
    
    * [CI] format with clang-tidy
    
    * [Cleanup] Format and align template struct definitions in half.hpp, common.h, and gemm_sp_sm80.h
    
    * [Update] Modify CUDA version requirements in test_gemm_sp_sm80 and mark cutlass subproject as dirty
    
    ---------
    Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
example_gemm_sp.py 5.6 KB