    CK Tile FA Training kernels (#1286) · 2cab8d39
    Dan Yao authored
    
    
    * FA fwd dropout
    
    * FA bwd
    
    * epilogue reuse
    
    * CMakeLists update
    
    * [CK_TILE] support alibi (#1269)
    
    * add alibi support
    
    * fix code
    
    * update code based on comment
    
    * Support more hdim
    
    * fix fp8 bias
    
    * support seqlen_k=0 case
    
    * remove unused printf
    
    * fix format
    
    ---------
    Co-authored-by: rocking <ChunYu.Lai@amd.com>
    
    * now fwd/bwd can build
    
    * bwd alibi
    
    * add bwd validation stream_config
    
    * update generated filenames
    
    * update bwd kernel launch
    
    * CK_TILE_HOST_DEVICE in philox
    
    * Transpose -> transpose
    
    * format
    
    * format
    
    * format
    
    * Generate the instance for FA required
    
    * format
    
    * fix error in WarpGemm
    
    ---------
    
    Co-authored-by: danyao12 <danyao12>
    Co-authored-by: carlushuang <carlus.huang@amd.com>
    Co-authored-by: rocking <ChunYu.Lai@amd.com>
    Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
    Co-authored-by: Jing Zhang <jizhan@amd.com>
host.hpp 1.03 KB