    Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0
    zjing14 authored
    
    
    * move all arguments into device
    
    * add b2c_tile_map
    
    * add examples
    
    * add SetDeviceKernelArgs
    
    * dedicated fixed_nk solution
    
    * init client api
    
    * add grouped_gemm_bias example
    
    * add an instance
    
    * add instances
    
    * formatting
    
    * fixed cmake
    
    * Update EnableCompilerWarnings.cmake
    
    * Update cmake-ck-dev.sh
    
    * clean; fixed comments
    
    * fixed comment
    
    * add instances for fp32 output
    
    * add instances for fp32 output
    
    * add fp32 out client example
    
    * fixed CI
    
    * init commit for kbatch
    
    * add splitk gridwise
    
    * format
    
    * fixed
    
    * clean deviceop
    
    * clean code
    
    * finish splitk
    
    * fixed instances
    
    * change m_loops to tile_loops
    
    * add setkbatch
    
    * clean code
    
    * add splitK+bias
    
    * add instances
    
    * opt mk_nk instances
    
    * clean examples
    
    * fixed CI
    
    * remove zero
    
    * finished non-zero
    
    * clean
    
    * clean code
    
    * optimized global_barrier
    
    * fixed ci
    
    * fixed CI
    
    * removed AddBias
    
    * format
    
    * fixed CI
    
    * fixed CI
    
    * move 20_grouped_gemm to 21_grouped_gemm
    
    ---------
    Co-authored-by: Jing Zhang <jizha@amd.com>
device_memory.cpp 1.79 KB