    [CK_TILE] More fmha splitkv optimizations (#1588) · 54f0e6f4
    Po Yen Chen authored
    * Use pre-defined constants for readability
    
    * Use vector write for o_acc tensor
    
    * Remove no-longer used policy method
    
    * Deprecate no-longer used policy/pipeline
    
    * Specify gemm0/gemm1 block warps separately in codegen
    
    * Fix wrong ps_idx creation logic
    
    * Add single-warp block gemm
    
    * Support single-warp gemm0
    
    * Make MakeCBlockTile() a static method
    
    * Use MakeCBlockTile() to get underlying tile distribution
    
    * Use kNumGemm1Warps to compute the number of threads for gemm1
    
    * Put normal case in the if clause
    
    * Refine fmha splitkv block mapping
    
    * Refine & fix the lse_acc/o_acc layout
    
    * Fix wrong LDS size for K tile
    
    * Use kK0=64 for hdim=128,256 fmha splitkv kernels
    
    * Use kK1=64 for hdim=32,64,128 fmha splitkv kernels
    
    * Undo kK0/kK1 changes
    
    * Use more reasonable GetAlignmentV() computation
    
    * Use store_tile() in fmha splitkv kernel epilogue