    [Dev] Add new example for FlashAttention with pipelined execution (#200) · c2b9b59d
    Yu Cheng authored
    - Introduce `example_gqa_fwd_bshd_wgmma_pipelined.py` demonstrating a pipelined implementation of FlashAttention.
    - Update sequence length parameter in existing example to 8192 and adjust number of stages for improved performance.
    - Enhance argument parsing to accommodate new configurations for batch size, heads, and groups.
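The argument-parsing change above might look roughly like the sketch below. This is not the code from the commit: the flag names (`--batch`, `--heads`, `--groups`, `--seq_len`), the helper names, and every default except the 8192 sequence length are assumptions made for illustration. The divisibility check reflects the usual grouped-query attention requirement that query heads split evenly across groups.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags and defaults; only seq_len=8192 is taken from the commit message.
    parser = argparse.ArgumentParser(description="GQA forward example (sketch)")
    parser.add_argument("--batch", type=int, default=1, help="batch size")
    parser.add_argument("--heads", type=int, default=32, help="number of query heads")
    parser.add_argument("--groups", type=int, default=8, help="GQA groups")
    parser.add_argument("--seq_len", type=int, default=8192, help="sequence length")
    return parser


def parse_config(argv=None) -> argparse.Namespace:
    # Parse the flags and reject head/group combinations that do not divide evenly.
    args = build_parser().parse_args(argv)
    if args.heads % args.groups != 0:
        raise ValueError("heads must be divisible by groups")
    return args
```

With these assumed flags, a run such as `python example_gqa_fwd_bshd.py --batch 2 --heads 16 --groups 4` would override the defaults while leaving the sequence length at 8192.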
example_gqa_fwd_bshd.py 10.1 KB