[Example] Optimize sink attention forward via swizzled layout and report benchmark results (#885)
* Enhance attention sink examples with a swizzled layout and performance metrics
  - Added `make_swizzled_layout` annotations for the shared tensors in the `flashattn` function across the MHA and GQA examples to optimize memory access patterns (see the layout sketch below).
  - Updated the benchmark output to include speedup calculations comparing the Triton and TileLang implementations (see the reporting sketch below).
* Add README for the Attention Sink example with algorithm details and benchmark results
  - Introduced a new README.md for the Attention Sink example, outlining the forward and backward algorithms, including the computation of `dsinks` (see the gradient sketch below).
  - Provided benchmark results comparing the optimized implementation against Triton, highlighting the speedup across various configurations.
* Update README.md for the Attention Sink example to include a link to the Triton implementation
* Update examples/attention_sink/README.md
* Update examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py
* Fix typo
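For context, a minimal sketch of what the swizzled-layout annotation looks like in TileLang. The tensor names, tile sizes, and kernel body here are illustrative placeholders, not copied from the PR's examples:

```python
import tilelang.language as T
from tilelang.layout import make_swizzled_layout

# Illustrative sizes; the real examples derive these from their configs.
batch, heads, seq_len, dim = 1, 32, 4096, 128
block_M, dtype = 64, "float16"

@T.prim_func
def load_q_swizzled(Q: T.Tensor((batch, heads, seq_len, dim), dtype)):
    with T.Kernel(T.ceildiv(seq_len, block_M), heads, batch,
                  threads=128) as (bx, by, bz):
        Q_shared = T.alloc_shared([block_M, dim], dtype)
        # Swizzling the shared-memory layout reduces bank conflicts when
        # the tile is read back for MMA, which is where the memory-access
        # improvement described above comes from.
        T.annotate_layout({Q_shared: make_swizzled_layout(Q_shared)})
        T.copy(Q[bz, by, bx * block_M:(bx + 1) * block_M, :], Q_shared)
```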
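For the `dsinks` term in the backward pass, a sketch of the usual derivation, assuming (as in the attention-sink formulation) that the sink is a per-head logit that enters the softmax normalization but attends to no value. With $\Delta_i = \mathrm{rowsum}(dO_i \circ O_i)$ and $\ell_i$ the log-sum-exp of query row $i$ (sink included), the sink column receives only the softmax-normalization gradient:

$$d\mathrm{sink}_h = -\sum_i \exp(\mathrm{sink}_h - \ell_i)\,\Delta_i$$

The README added by this PR is the authoritative statement of the algorithm; this is just the standard flash-attention-backward bookkeeping applied to a valueless column.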
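The speedup figure in the benchmark output reduces to a ratio of measured latencies. A hypothetical helper in that spirit (the function name and numbers are illustrative, not from the PR):

```python
def report(triton_ms: float, tilelang_ms: float) -> None:
    """Print both latencies and the TileLang-over-Triton speedup.

    Latencies would come from each framework's profiler;
    speedup > 1 means TileLang is faster.
    """
    print(f"Triton: {triton_ms:.3f} ms | TileLang: {tilelang_ms:.3f} ms "
          f"| speedup: {triton_ms / tilelang_ms:.2f}x")

report(0.512, 0.401)  # example numbers, not measured results
```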
---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>