1. 11 Feb, 2022 1 commit
    • Add small tile size for fp16/fp32 and NN layout (#80) · 20a672d0
      zjing14 authored
      
      
      * add DeviceGemmSplitKXdl
      
      * add file device_gemm_splitk_xdl.hpp
      
      * set c matrix zero
      
      * using atomic
      
      * add all tuning parameter to f32 mkkn
      
      * grid size change to 720
      
      * add tuning parameter for NT
      
      * add tuning parameter for TN
      
      * add tuning parameter for TT
      
      * add m=96 tuning parameter
      
      * add missing config
      
      * debug
      
      * fix sweep
      
      * add failed tuning params
      
      * fixed sweep logic
      
      * clean
      
      * add padding to M/N for irregular tile size
      
      * clean code
      
      * add element wise operation
      
      * fixed MPerBlock=96
      
      * remove macro for splitk switch
      
      * add test
      
      * add new line at the end of device_gemm_xdl_instance.hpp
      
      * remove step hack
      
      * separate split-k instance files
      
      * add tuning parameters
      
      * change desired grid size to parameters
      
      * remove slice length
      
      * add desiredgridsize parameter to ckProfiler
      
      * add missing file device_gemm_xdl_splitk_instance.hpp
      
      * change desired grid size to kbatch
      
      * format
      
      * format
      
      * clean up
      
      * add selection of device_instances
      
      * clean code
      
      * clean code
      
      * add small tile size in fp16 nn
      
      * test for rocm 4.5
      
      * merge develop
      
      * clean
      
      * clean
      
      * clean
      
      * remove unused code
      
      * add padding switch to device_gemm_xdl
      
      * add padding switch for ksplit fp32
      
      * clean
      
      * clean
      
      * add files
      
      * rename
      
      * Update profiler.cpp
      
      * format
      Co-authored-by: ltqin <letaoqin@amd.com>
      Co-authored-by: ltqin <letao.qin@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
  2. 03 Feb, 2022 1 commit
    • add split-k GEMM (#59) · 4be7f019
      ltqin authored
      
      
      * add DeviceGemmSplitKXdl
      
      * add file device_gemm_splitk_xdl.hpp
      
      * set c matrix zero
      
      * using atomic
      
      * add all tuning parameter to f32 mkkn
      
      * grid size change to 720
      
      * add tuning parameter for NT
      
      * add tuning parameter for TN
      
      * add tuning parameter for TT
      
      * add m=96 tuning parameter
      
      * add missing config
      
      * add element wise operation
      
      * fixed MPerBlock=96
      
      * remove macro for splitk switch
      
      * add test
      
      * add new line at the end of device_gemm_xdl_instance.hpp
      
      * remove step hack
      
      * separate split-k instance files
      
      * add tuning parameters
      
      * change desired grid size to parameters
      
      * remove slice length
      
      * add desiredgridsize parameter to ckProfiler
      
      * add missing file device_gemm_xdl_splitk_instance.hpp
      
      * change desired grid size to kbatch
      
      * format
      
      * format
      
      * clean up
      
      * add selection of device_instances
      
      * clean code
      
      * fix build issue
      Co-authored-by: ltqin <letaoqin@amd.com>
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: Jing Zhang <jizhan@amd.com>