[Dev] Add new example for FlashAttention with pipelined execution (#200)
- Introduce `example_gqa_fwd_bshd_wgmma_pipelined.py`, demonstrating a pipelined implementation of FlashAttention.
- Update the sequence length in the existing example to 8192 and adjust the number of pipeline stages for improved performance.
- Enhance argument parsing to accommodate new configurations for batch size, heads, and groups.
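For context, a minimal NumPy reference of the grouped-query attention (GQA) forward pass in BSHD layout that the new example targets. This is an illustrative sketch only, not the pipelined WGMMA kernel from the PR; all names and shapes here are assumptions for the sake of the example.

```python
import numpy as np

def gqa_fwd_bshd(q, k, v, groups):
    # Hypothetical reference, not the kernel from this PR.
    # q: [batch, seq, heads_q, dim]; k, v: [batch, seq, heads_kv, dim]
    # In GQA, each group of `groups` query heads shares one KV head.
    b, s, hq, d = q.shape
    hkv = k.shape[2]
    assert hq == hkv * groups
    # Repeat KV heads so every query head has a matching KV head.
    k_rep = np.repeat(k, groups, axis=2)
    v_rep = np.repeat(v, groups, axis=2)
    scale = 1.0 / np.sqrt(d)
    # scores: [batch, heads_q, seq_q, seq_k]
    scores = np.einsum("bqhd,bkhd->bhqk", q, k_rep) * scale
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)
    # Weighted sum over keys, back to BSHD layout.
    return np.einsum("bhqk,bkhd->bqhd", p, v_rep)
```

The real example fuses these steps into a single tiled kernel and overlaps memory loads with WGMMA compute across the configured number of pipeline stages; the sketch above only fixes the math being computed.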