Commits · 7cd0da996364fab3da1a1c6766ba6612860f5fc5 · OpenDAS / tilelang

10 Oct, 2025 1 commit

[Example] Add support for `bfloat16` and user-defined `sm_scale` in attention sink examples (#924) · 7cd0da99

Tong WU authored Oct 10, 2025



* revert split+sum template for MHA backward

* lint

* Update example_mha_bwd.py

* Update example_mha_bwd_wgmma_pipelined.py

* Refactor attention sink examples to support bf16 and user-defined softmax scale

* fix typos

* Add compile flags for fast-math optimizations and enable BF16 support in both the GQA and MHA backward implementations.

* Update backward configuration for GQA and MHA examples to align with flash attention

* Refactor GQA backward implementation to improve atomic add performance

* Allow a slightly larger numerical error tolerance for bf16

* Update README to show bf16 benchmark results

* lint

* fix ci and lint

* fix comments and lint

* refactor atomic add

---------
Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
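The commit title mentions a user-defined `sm_scale` for the attention sink examples. As a reading aid, here is a minimal NumPy reference of the forward pass under an assumed formulation: each head has one learned sink logit that joins the softmax denominator (unscaled) but has no value vector attached, as in OpenAI's gpt-oss models. The function name `attention_with_sinks` and the `1/sqrt(head_dim)` fallback when `sm_scale` is omitted are illustrative assumptions, not the signature used in the example scripts; causal and sliding-window masking are left out.

```python
import numpy as np

def attention_with_sinks(q, k, v, sinks, sm_scale=None):
    """Reference attention with one learned sink logit per head (sketch, no masking).

    q: [batch, heads, seq_q, dim]; k, v: [batch, heads, seq_k, dim];
    sinks: [heads]. The sink joins the softmax denominator but has no value.
    """
    if sm_scale is None:
        # Assumed fallback: the conventional 1/sqrt(head_dim) scale.
        sm_scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum("bhqd,bhkd->bhqk", q, k) * sm_scale
    # One sink logit per head, broadcast to every (batch, query) row; not scaled.
    sink = np.broadcast_to(sinks[None, :, None, None], scores.shape[:-1] + (1,))
    # Subtract the row max over both the scores and the sink for stability.
    m = np.maximum(scores.max(axis=-1, keepdims=True), sink)
    p = np.exp(scores - m)
    denom = p.sum(axis=-1, keepdims=True) + np.exp(sink - m)
    return np.einsum("bhqk,bhkd->bhqd", p / denom, v)
```

Setting every sink to a very negative value (e.g. `-1e30`) makes the sink's weight vanish and recovers ordinary softmax attention, which is a convenient sanity check for a kernel under test.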
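Why bf16 needs a looser tolerance: bfloat16 keeps float32's 8-bit exponent but only 8 significand bits (7 stored), versus 11 (10 stored) for float16, so its unit roundoff is 2^-8 rather than 2^-11. The sketch below emulates float32-to-bfloat16 rounding (round-to-nearest-even) with NumPy bit manipulation and measures the worst relative rounding error of each format. The helper names are hypothetical, and the commit does not state which tolerances the examples actually use.

```python
import numpy as np

def round_to_bf16(x):
    """Emulate float32 -> bfloat16 -> float32 using round-to-nearest-even.

    bfloat16 keeps the sign, the 8-bit exponent and the top 7 mantissa bits
    of a float32, so the low 16 bits are rounded away. NaN handling is omitted.
    """
    bits = np.array(x, dtype=np.float32, ndmin=1).view(np.uint32)
    lsb = (bits >> np.uint32(16)) & np.uint32(1)  # lowest bit that survives
    # Adding 0x7FFF (plus 1 when the kept part is odd) implements ties-to-even.
    bits = (bits + np.uint32(0x7FFF) + lsb) & np.uint32(0xFFFF0000)
    return bits.view(np.float32)

def max_rel_error(x, rounded):
    return float(np.max(np.abs(rounded - x) / np.abs(x)))

# Positive values in [e^-4, e^4]: normal numbers in both float16 and bfloat16.
x = np.exp(np.random.default_rng(0).uniform(-4.0, 4.0, 10_000)).astype(np.float32)
bf16_err = max_rel_error(x, round_to_bf16(x))
fp16_err = max_rel_error(x, x.astype(np.float16).astype(np.float32))
print(f"bf16 max rel. error {bf16_err:.2e} (bound 2^-8), fp16 {fp16_err:.2e} (bound 2^-11)")
```

Each bf16 rounding step can introduce roughly 8x more relative error than fp16, and a fused attention kernel performs many such roundings (inputs, intermediate `P` tiles, outputs), which is why a bf16 run compared against a float32 reference generally needs a wider `atol`/`rtol` than an fp16 run.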
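Several bullets above concern how backward gradients are accumulated (a split+sum template for MHA, atomic adds for GQA). In GQA, several query heads share one KV head, so their dK/dV contributions have to be summed into that head. Two common strategies are materializing per-head partials and reducing them afterwards ("split + sum"), or accumulating directly into the shared buffer, which is what atomic adds compute on a GPU. The NumPy sketch below shows that both give the same result under an assumed head mapping (query head `h` reads KV head `h // groups`); it does not reproduce the commit's kernels, and the function names are hypothetical.

```python
import numpy as np

def reduce_gqa_grad(partials, groups):
    """Split + sum: per-query-head dK (or dV) partials [b, hq, s, d] are
    materialized first, then reduced over each group of query heads."""
    b, hq, s, d = partials.shape
    return partials.reshape(b, hq // groups, groups, s, d).sum(axis=2)

def accumulate_gqa_grad(partials, groups):
    """In-place accumulation into the shared KV head, mirroring what a kernel
    doing atomic adds computes (here in a fixed, sequential order)."""
    b, hq, s, d = partials.shape
    out = np.zeros((b, hq // groups, s, d), dtype=partials.dtype)
    for h in range(hq):
        out[:, h // groups] += partials[:, h]
    return out

# 8 query heads sharing 2 KV heads (groups = 4).
partials = np.random.default_rng(0).standard_normal((2, 8, 16, 4))
print(np.allclose(reduce_gqa_grad(partials, 4), accumulate_gqa_grad(partials, 4)))
```

The trade-off is memory versus contention: split + sum needs a buffer `groups` times larger plus an extra reduction pass, while atomic adds avoid the buffer but serialize conflicting updates and make the floating-point summation order nondeterministic, so results can differ in the last bits between runs.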


26 Sep, 2025 1 commit

[Example] Optimize sink attention forward via swizzled layout and report benchmark results (#885) · bf67fb19

Tong WU authored Sep 27, 2025



* Enhance attention sink examples with swizzled layout and performance metrics

- Added `make_swizzled_layout` annotations for shared tensors in the `flashattn` function across MHA and GQA examples to optimize memory access patterns.
- Updated benchmark outputs to include speedup calculations comparing Triton and TileLang implementations.

* Add README for Attention Sink example with algorithm details and benchmark results

- Introduced a new README.md file for the Attention Sink example, outlining the forward and backward algorithms, including the computation of `dsinks`.
- Provided benchmark results comparing performance metrics of the optimized implementation against Triton, highlighting speedup across various configurations.

* Update README.md for Attention Sink example to include link to Triton implementation

* Update examples/attention_sink/README.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* typo

---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
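The forward optimization annotates shared tensors with `make_swizzled_layout`. The idea behind such layouts can be shown with a simplified bank model: NVIDIA shared memory has 32 banks of 4 bytes, so in a tile with 128-byte rows, reading the same 16-byte column chunk from 8 consecutive rows hits the same 4 banks 8 times. XOR-ing the chunk index with the row index spreads those reads across all 32 banks. This is a generic illustration under stated assumptions (128-byte rows, 16-byte loads, 8 rows per access), not TileLang's actual layout function, which may pick a different pattern depending on dtype and tile shape.

```python
ROW_BYTES, CHUNK_BYTES, BANK_BYTES, NUM_BANKS = 128, 16, 4, 32
CHUNKS_PER_ROW = ROW_BYTES // CHUNK_BYTES  # 8 chunks of 16 bytes per row

def physical_chunk(row, col_chunk, swizzle):
    """Where logical chunk `col_chunk` of `row` is stored (XOR swizzle on the row's low bits)."""
    return col_chunk ^ (row % CHUNKS_PER_ROW) if swizzle else col_chunk

def max_bank_conflict(col_chunk, swizzle, rows=8):
    """Largest number of requests hitting one bank when `rows` threads each load
    the 16-byte chunk `col_chunk` from consecutive rows (a column-wise read)."""
    hits = [0] * NUM_BANKS
    for r in range(rows):
        start = r * ROW_BYTES + physical_chunk(r, col_chunk, swizzle) * CHUNK_BYTES
        for word in range(CHUNK_BYTES // BANK_BYTES):
            hits[(start // BANK_BYTES + word) % NUM_BANKS] += 1
    return max(hits)

for c in range(CHUNKS_PER_ROW):
    print(c, max_bank_conflict(c, swizzle=False), max_bank_conflict(c, swizzle=True))
```

Every line prints 8 for the plain layout (an 8-way conflict) and 1 for the swizzled one. Because XOR with a fixed row value is a permutation of the chunk indices, row-wise reads stay conflict-free as well, so the swizzle costs nothing for that access pattern.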
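The README describes how `dsinks` is computed in the backward pass. One way to derive it, not necessarily the README's exact formulation: for a query row with scaled scores `s_j`, sink logit `σ`, and `Z = Σ_j exp(s_j) + exp(σ)`, the probabilities are `p_j = exp(s_j) / Z` and `p_σ = exp(σ) / Z`. Since `∂p_j/∂σ = -p_j · p_σ`, the loss gradient is `∂L/∂σ = -p_σ · Σ_j p_j (dO · v_j) = -p_σ · (dO · O)`; the per-head `dsinks` sums this over batch and query positions. The sketch below (single head, hypothetical function names) checks the formula against a finite difference.

```python
import numpy as np

def sink_attention_1head(scores, sink, v):
    """One head, one batch: scores [q, k] (already scaled), scalar sink logit, v [k, d].

    Returns the output [q, d] and the probability mass assigned to the sink [q].
    """
    m = np.maximum(scores.max(axis=-1, keepdims=True), sink)  # stable max incl. sink
    e = np.exp(scores - m)
    e_sink = np.exp(sink - m)
    z = e.sum(axis=-1, keepdims=True) + e_sink
    return (e / z) @ v, (e_sink / z)[:, 0]

def dsink_1head(out, d_out, p_sink):
    """dL/dsink = -sum_q p_sink[q] * rowsum(dO * O)[q]."""
    delta = (d_out * out).sum(axis=-1)  # the same rowsum FlashAttention's backward computes
    return -(p_sink * delta).sum()

rng = np.random.default_rng(0)
scores, v, d_out = rng.standard_normal((4, 6)), rng.standard_normal((6, 3)), rng.standard_normal((4, 3))
sink = 0.3
out, p_sink = sink_attention_1head(scores, sink, v)

def loss(s):
    # Scalar loss L = sum(dO * O) whose gradient w.r.t. O is exactly d_out.
    return float((sink_attention_1head(scores, s, v)[0] * d_out).sum())

eps = 1e-6
numeric = (loss(sink + eps) - loss(sink - eps)) / (2 * eps)
analytic = dsink_1head(out, d_out, p_sink)
print(analytic, numeric)
```

Because `rowsum(dO * O)` is already needed for dQ/dK in a FlashAttention-style backward, `dsinks` can reuse it together with the row statistics saved in the forward pass, adding only a cheap per-row reduction.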
