    [Dev] Implement FlashAttention3 Backward (#244) · c264f37f
    Yu Cheng authored
    * [BugFix] Fix bug of missing MBarrierExpectTX
    
    * [Dev] Implement FlashAttention3 Backward
    
    - Added a new example for Flash Attention using pipelined WGMMA, including forward and backward pass implementations.
    - Introduced functions for forward and backward processing, leveraging tilelang for optimized tensor operations.
    - Enhanced the attention mechanism with support for both causal and non-causal configurations.
    - Included command-line arguments for batch size, number of heads, context size, and head dimension for flexibility in testing.
    - Updated GEMM operations to support a new `wg_wait` parameter for improved synchronization in kernel execution.
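The backward pass this commit adds follows the standard attention gradient derivation (as in FlashAttention): given dO, compute dV = Pᵀ·dO, push dO back through the softmax, then through the scaled QKᵀ product. A minimal numpy sketch of that math (not the tilelang/WGMMA kernel itself; function names and the causal-mask handling here are illustrative):

```python
import numpy as np

def attention_forward(Q, K, V, causal=False):
    # S = Q K^T / sqrt(d), P = softmax(S), O = P V
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    if causal:
        # mask out future positions before the softmax
        mask = np.triu(np.ones(S.shape, dtype=bool), k=1)
        S = np.where(mask, -np.inf, S)
    S = S - S.max(axis=-1, keepdims=True)  # numerical stability
    P = np.exp(S)
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V, P

def attention_backward(Q, K, V, P, dO):
    d = Q.shape[-1]
    dV = P.T @ dO
    dP = dO @ V.T
    # softmax Jacobian: dS = P * (dP - rowsum(dP * P))
    dS = P * (dP - (dP * P).sum(axis=-1, keepdims=True))
    dQ = dS @ K / np.sqrt(d)
    dK = dS.T @ Q / np.sqrt(d)
    return dQ, dK, dV
```

The kernel version tiles these matmuls over shared memory and overlaps them with WGMMA pipelining; the `wg_wait` parameter mentioned above controls when a warpgroup waits on outstanding MMA operations so independent GEMMs can stay in flight.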