"...git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "9a38fab5aed49b4edd77d7bb8e4705a88269d4b9"
[Dev] Implement FlashAttention3 Backward (#244)
* [BugFix] Fix bug of missing MBarrierExpectTX

* [Dev] Implement FlashAttention3 Backward

- Added a new example for Flash Attention using pipelined WGMMA, including forward and backward pass implementations.
- Introduced functions for forward and backward processing, leveraging tilelang for optimized tensor operations.
- Enhanced the attention mechanism with support for both causal and non-causal configurations.
- Included command-line arguments for batch size, number of heads, context size, and head dimension for flexibility in testing.
- Updated GEMM operations to support a new `wg_wait` parameter for improved synchronization in kernel execution.
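For readers unfamiliar with the algorithm the new example implements, the core of a flash-attention forward pass is an online-softmax recurrence over key/value tiles. The sketch below is an illustrative NumPy reference, not the tilelang kernel from this PR: the function names, the single-head layout, and the `block` parameter are assumptions made for the example, and it omits the WGMMA pipelining and `wg_wait` synchronization that the actual kernel adds on hardware.

```python
import numpy as np

def attention_reference(Q, K, V, causal=False):
    # Naive attention for comparison: O = softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)
    if causal:
        S = np.where(np.tril(np.ones_like(S, dtype=bool)), S, -np.inf)
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_attention_forward(Q, K, V, block=64, causal=False):
    # Tiled online-softmax forward pass (single head, float64).
    # Streams over K/V tiles, keeping a running row max `m` and a
    # running softmax denominator `l` so the full score matrix is
    # never materialized.
    n, d = Q.shape
    O = np.zeros((n, d))
    m = np.full(n, -np.inf)   # running row-wise max of scores
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, n, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T / np.sqrt(d)
        if causal:
            rows = np.arange(n)[:, None]
            cols = np.arange(j, j + Kj.shape[0])[None, :]
            S = np.where(rows >= cols, S, -np.inf)
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)          # rescale previous partial sums
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        O = O * scale[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

The backward pass in the PR follows the same tiling idea, recomputing score tiles from saved row statistics instead of storing the full attention matrix.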