[CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests (#26663)

Signed-off-by: Huamin Li <3ericli@gmail.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
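Here is a minimal sketch of the layout difference this commit addresses. It builds a random cache in the [2, num_blocks, block_size, num_kv_heads, head_size] layout used for FlashAttention and FlexAttention, then converts it with `transpose(0, 1)`. That is the same call the test applies to FlashInfer and, after this change, to Triton. The tensor sizes are arbitrary and only for illustration. The checks show that the converted tensor is a non-contiguous view over the same storage, not a copy.

```python
import torch

# Arbitrary illustrative sizes (not taken from the test).
num_blocks, block_size, num_kv_heads, head_size = 4, 16, 2, 8

# FlashAttention / FlexAttention layout: K and V stacked on dim 0.
kv_cache = torch.randn(2, num_blocks, block_size, num_kv_heads, head_size)

# FlashInfer / Triton layout: K and V paired inside each block on dim 1.
kv_cache_nb2 = kv_cache.transpose(0, 1)

print(kv_cache_nb2.shape)  # torch.Size([4, 2, 16, 2, 8])

# transpose() returns a view: same underlying storage, no data copied.
assert kv_cache_nb2.data_ptr() == kv_cache.data_ptr()
# The key tensor of block 1 is the same data in either layout.
assert torch.equal(kv_cache_nb2[1, 0], kv_cache[0, 1])
# Only the strides changed, so the view is not contiguous.
assert not kv_cache_nb2.is_contiguous()
```

In the diff, Triton is added only to the `transpose(0, 1)` branch and not to the FlashInfer-specific HND step. Triton therefore receives this view as-is.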

Unverified commit c3123207, authored Oct 17, 2025 by Huamin Li; committed by GitHub Oct 17, 2025.

Showing with 3 additions and 2 deletions

tests/v1/attention/test_attention_backends.py +3 -2

--- a/tests/v1/attention/test_attention_backends.py
+++ b/tests/v1/attention/test_attention_backends.py
@@ -423,13 +423,14 @@ def _test_backend_correctness(
    for backend_name in backend_to_test:
        # FlashAttentionm + FlexAttention:
        #   [2, num_blocks, block_size, num_kv_heads, head_size]
-        # FlashInfer:
+        # FlashInfer + Triton:
        #   [num_blocks, 2, block_size, num_kv_heads, head_size]
        # Select the appropriate KV cache format for each backend
        kv_cache_for_backend = kv_cache
-        if backend_name == _Backend.FLASHINFER:
+        if backend_name in (_Backend.FLASHINFER, _Backend.TRITON_ATTN):
            kv_cache_for_backend = kv_cache.transpose(0, 1)
+        if backend_name == _Backend.FLASHINFER:
            # For FlashInfer default to HND layout and
            kv_cache_for_backend = (
                kv_cache_for_backend.transpose(2, 3).contiguous().transpose(2, 3)
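The FlashInfer-only branch applies `transpose(2, 3).contiguous().transpose(2, 3)`. Below is a minimal sketch of what that chain does. It assumes the goal is an HND physical layout, which is what the truncated comment ("default to HND layout") suggests. The logical shape stays [num_blocks, 2, block_size, num_kv_heads, head_size]. In memory, however, each K or V slab is rewritten so that the head index varies more slowly than the token index. The sizes are arbitrary. The strides in the comments follow from those sizes.

```python
import torch

# Arbitrary illustrative sizes (not taken from the test).
num_blocks, block_size, num_kv_heads, head_size = 4, 16, 2, 8
kv_cache = torch.randn(2, num_blocks, block_size, num_kv_heads, head_size)

# Step shared by FlashInfer and Triton: [num_blocks, 2, block_size, num_kv_heads, head_size]
nb2 = kv_cache.transpose(0, 1)

# FlashInfer-only step:
# 1) swap block_size and num_kv_heads -> [num_blocks, 2, num_kv_heads, block_size, head_size]
# 2) .contiguous() copies the data so memory follows that heads-before-tokens order
# 3) swap back so the logical shape matches the other backends again
hnd = nb2.transpose(2, 3).contiguous().transpose(2, 3)

print(hnd.shape)     # torch.Size([4, 2, 16, 2, 8]): logical shape unchanged
print(hnd.stride())  # (512, 256, 8, 128, 1): heads (128) now stride wider than tokens (8)

# Values are identical; only the physical memory order differs.
assert torch.equal(hnd, nb2)
```

Because only the strides change, code that indexes the tensor logically sees the same values in both layouts. Only a kernel that depends on the physical memory order sees the difference.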