    Add small tile size for fp16/fp32 and NN layout (#80) · 20a672d0
    zjing14 authored
    
    
    * add DeviceGemmSplitKXdl
    
    * add file device_gemm_splitk_xdl.hpp
    
    * set c matrix zero
    
    * using atomic
    
    * add all tuning parameter to f32 mkkn
    
    * grid size change to 720
    
    * add tuning parameter for NT
    
    * add tuning parameter for TN
    
    * add tuning parameter for TT
    
    * add m=96 tuning parameter
    
    * add lost config
    
    * debug
    
    * fix sweep
    
    * add failed tuning params
    
    * fixed sweep logic
    
    * clean
    
    * add padding to M/N for irr tile size
    
    * clean code
    
    * add element wise operation
    
    * fixed MPerBlock=96
    
    * remove macro for splitk switch
    
    * add test
    
    * add new line at the end of device_gemm_xdl_instance.hpp
    
    * remove step hack
    
    * separate split-k instance files
    
    * add tuning parameters
    
    * change desired grid size to parameters
    
    * remove slice length
    
    * add desiredgridsize parameter to ckProfiler
    
    * add missing file device_gemm_xdl_splitk_instance.hpp
    
    * change desired grid size to kbatch
    
    * format
    
    * format
    
    * clean up
    
    * add selection of device_instances
    
    * clean code
    
    * clean code
    
    * add small tile size in fp16 nn
    
    * test for rocm 4.5
    
    * merge develop
    
    * clean
    
    * clean
    
    * clean
    
    * remove unused code
    
    * add padding switch to device_gemm_xdl
    
    * add padding switch for ksplit fp32
    
    * clean
    
    * clean
    
    * add files
    
    * rename
    
    * Update profiler.cpp
    
    * format
    Co-authored-by: ltqin <letaoqin@amd.com>
    Co-authored-by: ltqin <letao.qin@amd.com>
    Co-authored-by: Chao Liu <chao.liu2@amd.com>
gemm_specialization.hpp 260 Bytes