[CI/Build][Kernel][AMD] Move extra dim to after load in _fwd_kv_parallel in lighting_attn.py (#29132)

Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>

Commit e99e4673, authored Nov 21, 2025 by rasmith; committed by GitHub Nov 21, 2025
Showing with 7 additions and 1 deletion

vllm/model_executor/layers/lightning_attn.py +7 -1

--- a/vllm/model_executor/layers/lightning_attn.py
+++ b/vllm/model_executor/layers/lightning_attn.py
@@ -198,7 +198,7 @@ def _fwd_kv_parallel(
     )
     # Load the decay factors for the current head and block
-    k_decay_ptr = K_decay + off_h * BLOCK + tl.arange(0, CBLOCK)[None, :]
+    k_decay_ptr = K_decay + off_h * BLOCK + tl.arange(0, CBLOCK)
     kv_index = tl.arange(0, CBLOCK)
@@ -228,6 +228,12 @@ def _fwd_kv_parallel(
         # Load decay factor and compute weighted key-value outer product
         k_decay = tl.load(k_decay_ptr)
+        # NOTE: Need to add the extra dim here due to AMD MLIR lowering error.
+        # Please don't move it back until issue is resolved.
+        # Issue: https://github.com/ROCm/triton/issues/907
+        k_decay = k_decay[None, :]
         kv += tl.dot(k_trans * k_decay, v)
         # Move to the next sub-block
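
The change is intended to be numerically neutral: the decay vector is loaded as a 1-D block and only then reshaped to `(1, CBLOCK)`, instead of loading through a pointer block that already carries the extra dimension. The sketch below mimics both orderings with NumPy rather than Triton, so it runs without a GPU. The sizes `CBLOCK` and `D`, the random inputs, and the array names other than `k_decay`, `k_trans`, `v`, and `kv` are assumptions made for illustration; it does not reproduce the AMD MLIR lowering error itself.

```python
import numpy as np

# Hypothetical sizes; the real kernel takes these as compile-time constants.
CBLOCK = 4  # sub-block length
D = 8       # feature dimension of the k_trans / v tiles (assumed)

rng = np.random.default_rng(0)
k_decay_buf = rng.random(CBLOCK).astype(np.float32)    # stands in for the K_decay memory
k_trans = rng.random((D, CBLOCK)).astype(np.float32)   # assumed shape (D, CBLOCK)
v = rng.random((CBLOCK, D)).astype(np.float32)         # assumed shape (CBLOCK, D)

# Before the change: the offsets already have shape (1, CBLOCK),
# so the "load" returns a (1, CBLOCK) block directly.
offsets_2d = np.arange(CBLOCK)[None, :]
k_decay_before = k_decay_buf[offsets_2d]

# After the change: load with 1-D offsets, then add the leading dimension.
offsets_1d = np.arange(CBLOCK)
k_decay_after = k_decay_buf[offsets_1d]
k_decay_after = k_decay_after[None, :]

# Both variants broadcast the decay across the rows of k_trans.
kv_before = (k_trans * k_decay_before) @ v
kv_after = (k_trans * k_decay_after) @ v

print(k_decay_before.shape, k_decay_after.shape)
print(np.array_equal(kv_before, kv_after))
```

Under these assumptions both paths produce a `(1, CBLOCK)` decay block and the same `kv` tile, which is consistent with the commit moving only where the dimension is added, not what is computed.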