[Feature] Add GQA backward kernel with varlen input (#1082)

* [Feature] Add GQA backward kernel with varlen input
* [Lint]
* [BugFix] Freeze the memory order of all atomic_add operations
* [Lint]
* [Lint]
* [BugFix] Use release order to boost performance
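
For readers unfamiliar with the terms in the title, the following is a minimal pure-Python sketch of the two ideas, not code from this commit. It assumes the common convention of packing variable-length sequences into one flat buffer indexed by cumulative sequence lengths, and of mapping query heads to shared KV heads by integer division. The names `pack_varlen`, `kv_head_for`, and `cu_seqlens` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def pack_varlen(seqs):
    """Concatenate variable-length sequences into one flat list.

    Returns (packed, cu_seqlens), where sequence i occupies
    packed[cu_seqlens[i]:cu_seqlens[i + 1]].
    Assumption: this mirrors the usual "cumulative sequence lengths"
    layout; the kernels in this commit may differ in detail.
    """
    lengths = [len(s) for s in seqs]
    cu_seqlens = [0] + list(accumulate(lengths))
    packed = [tok for s in seqs for tok in s]
    return packed, cu_seqlens


def kv_head_for(q_head, num_q_heads, num_kv_heads):
    """Map a query head index to the KV head it shares under GQA.

    Assumption: query heads are grouped contiguously, so each group of
    num_q_heads // num_kv_heads query heads reads the same KV head.
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


# Three sequences of lengths 3, 1 and 2 packed without padding.
packed, cu_seqlens = pack_varlen([[1, 2, 3], [4], [5, 6]])
second = packed[cu_seqlens[1]:cu_seqlens[2]]

# With 8 query heads and 2 KV heads, query heads 4..7 share KV head 1.
shared = kv_head_for(5, num_q_heads=8, num_kv_heads=2)
```

Because several query heads read the same KV head, the gradients for that KV head receive contributions from every query head in its group, so a backward pass has to sum them in some way.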

Commit 792e5d5b authored Oct 21, 2025 by Zhengju Tang, committed by GitHub Oct 21, 2025
2 changed files
--- a/examples/flash_attention/example_gqa_bwd_tma_reduce_varlen.py
+++ b/examples/flash_attention/example_gqa_bwd_tma_reduce_varlen.py
--- a/examples/flash_attention/example_gqa_fwd_varlen.py
+++ b/examples/flash_attention/example_gqa_fwd_varlen.py
@@ -8,8 +8,6 @@ from einops import rearrange, repeat
 from tilelang.profiler import do_bench
 from varlen_utils import generate_random_padding_mask, generate_qkv

-tilelang.disable_cache()
-

 def attention_ref(
        q,