Commits · bf67fb19fc9c22e48ce18b0a43bec49fd5cb3d7d · OpenDAS / tilelang

26 Sep, 2025 1 commit

[Example] Optimize sink attention forward via swizzled layout and report benchmark results (#885) · bf67fb19

Tong WU authored Sep 27, 2025



* Enhance attention sink examples with swizzled layout and performance metrics

- Added `make_swizzled_layout` annotations for shared tensors in the `flashattn` function across MHA and GQA examples to optimize memory access patterns.
- Updated benchmark outputs to include speedup calculations comparing Triton and TileLang implementations.
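
For readers unfamiliar with why a swizzled shared-memory layout helps, here is a minimal sketch in plain Python. It is not TileLang's `make_swizzled_layout` and does not reproduce its actual mapping. It assumes a toy model with 32 banks, each one 4-byte word wide, and a simple XOR swizzle. All function names are hypothetical. It counts how many of 32 threads land on the same bank when they read one column of a 32x32 tile.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, each one 4-byte word wide


def linear_index(row, col, width=32):
    """Row-major placement of element (row, col)."""
    return row * width + col


def xor_swizzled_index(row, col, width=32):
    """Row-major placement with the column XOR-ed by the row (illustrative swizzle)."""
    return row * width + (col ^ (row % width))


def worst_bank_conflict(index_fn, col, rows=32):
    """Largest number of threads that hit one bank when 32 threads read one column."""
    banks = Counter(index_fn(row, col) % NUM_BANKS for row in range(rows))
    return max(banks.values())


if __name__ == "__main__":
    print("linear  :", worst_bank_conflict(linear_index, col=0))
    print("swizzled:", worst_bank_conflict(xor_swizzled_index, col=0))
```

In this toy model the row-major layout sends all 32 threads to the same bank, while the XOR layout gives each thread its own bank. Real layouts depend on element width, tile shape, and the hardware's access granularity.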

* Add README for Attention Sink example with algorithm details and benchmark results

- Introduced a new README.md file for the Attention Sink example, outlining the forward and backward algorithms, including the computation of `dsinks`.
- Provided benchmark results comparing performance metrics of the optimized implementation against Triton, highlighting speedup across various configurations.
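
As a rough illustration of the forward pass and the `dsinks` computation, here is a single-head sketch in plain Python. It is not taken from the README. It assumes the common formulation in which the sink is one extra learnable logit added to the softmax denominator with no value vector, and it omits scaling, masking, and batching. Function names are hypothetical. Under that assumption the sink gradient is the negative sum over queries of the sink probability times the dot product of the output and its incoming gradient.

```python
import math


def sink_attention_forward(q, k, v, sink):
    """Single-head attention whose softmax denominator includes one sink logit.

    q, k, v are lists of vectors; sink is a scalar logit (assumed formulation).
    Returns (outputs, sink_probs) with one entry per query.
    """
    outputs, sink_probs = [], []
    dim = len(v[0])
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores + [sink])              # subtract the max for stability
        exps = [math.exp(s - m) for s in scores]
        sink_exp = math.exp(sink - m)
        denom = sum(exps) + sink_exp          # the sink absorbs mass, has no value
        probs = [e / denom for e in exps]
        outputs.append(
            [sum(p * vj[d] for p, vj in zip(probs, v)) for d in range(dim)]
        )
        sink_probs.append(sink_exp / denom)
    return outputs, sink_probs


def sink_gradient(outputs, sink_probs, grad_outputs):
    """Gradient w.r.t. the sink logit: -sum_i p_sink[i] * dot(o[i], do[i])."""
    total = 0.0
    for o, p_sink, do in zip(outputs, sink_probs, grad_outputs):
        delta = sum(a * b for a, b in zip(o, do))  # per-query dot(o, do)
        total -= p_sink * delta
    return total
```

The per-query dot product is the same quantity a flash-attention style backward pass typically precomputes, so under this formulation the sink gradient needs only that value and the sink probability for each query.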

* Update README.md for Attention Sink example to include link to Triton implementation

* Update examples/attention_sink/README.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Fix typo

---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
