    add split-k GEMM (#59) · 4be7f019
    ltqin authored
    
    
    * add DeviceGemmSplitKXdl
    
    * add file device_gemm_splitk_xdl.hpp
    
    * set c matrix zero
    
    * using atomic
    
    * add all tuning parameter to f32 mkkn
    
    * grid size change to 720
    
    * add tuning parameter for NT
    
    * add tuning parameter for TN
    
    * add tuning parameter for TT
    
    * add m=96 tuning parameter
    
    * add missing config
    
    * add element wise operation
    
    * fixed MPerBlock=96
    
    * remove macro for split-k switch
    
    * add test
    
    * add new line at the end of device_gemm_xdl_instance.hpp
    
    * remove step hack
    
    * separate split-k instance files
    
    * add tuning parameters
    
    * change desired grid size to a parameter
    
    * remove slice length
    
    * add desiredgridsize parameter to ckProfiler
    
    * add missing file device_gemm_xdl_splitk_instance.hpp
    
    * change desired grid size to kbatch
    
    * format
    
    * format
    
    * clean up
    
    * add selection of device_instances
    
    * clean code
    
    * fix build issue
    Co-authored-by: ltqin <letaoqin@amd.com>
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
    Co-authored-by: Jing Zhang <jizhan@amd.com>
profile_gemm.cpp 8.24 KB