    [CK_TILE] FA bwd kernels optimization (#1397) · 79a5d9c1
    Dan Yao authored
    
    
    * tmp save
    
    * fix batch deterministic bugs
    
    * fix group deterministic bugs
    
    * codegen update
    
    * reorder files
    
    * bias support
    
    * hd256 bias support
    
    * bwd smoke test update
    
    * simplify convert dq
    
    * fix hd256 dropout scratch
    
    * do{}while() -> while(){}
    
    * comments
    
    * remove FmhaBwdTilePartitioner
    
    * save clear_tile
    
    * refactor dropout
    
    * code cleanup
    
    * code cleanup
    
    * comments
    
    * fix epilogue problem
    
    * fix fwd dropout
    
    * group convert_dq opt
    
    * fix dq alignment
    
    * Do not store storerandval in bwd for flash attention integration
    
    * fix hd32 error and boost performance
    
    * revert
    
    * Remove duplicated WarpGemm definitions in the policy file
    
    * dropout patch for mrepeat 16*16
    
    * code sync up
    
    * dq_acc stride
    
    * dq_acc stride stuff
    
    * codegen update
    
    * fwd dropout revert
    
    * fix hd128 scratches and boost performance
    
    * receipt 3 for simplified smoke test
    
    * more strides for fa integration
    
    * fix hd64 scratches and boost performance
    
    * non-iglp pipeline for headdim padding cases
    
    * dpad same as dvpad for flash attention integration
    
    * unpadded lse&d for group mode
    
    * Support unpad layout for group lse
    
    * Support unpad lse layout for splitkv
    
    * Fix stride for splitkv kernel
    
    * fix unpadded lse issue in fwd splitkv
    
    * comment
    
    * solve lds read&write conflicts
    
    * rename
    
    * bias rename
    
    * tile index revert
    
    ---------
    
    Co-authored-by: danyao12 <danyao12>
    Co-authored-by: rocking <ChunYu.Lai@amd.com>
    Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
gemm.hpp 2.26 KB