[BugFix] Add fallback path in `apply_rotary_pos_emb_flashattn` for non-cuda platforms (#28447)

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>

Commit b9ce9a30, authored Nov 12, 2025 by Fanli Lin; committed by GitHub Nov 12, 2025

Showing with 7 additions and 0 deletions

vllm/model_executor/models/keye.py +7 -0

--- a/vllm/model_executor/models/keye.py
+++ b/vllm/model_executor/models/keye.py
@@ -346,6 +346,13 @@ def apply_rotary_pos_emb_flashatt(
        from vllm.vllm_flash_attn.layers.rotary import apply_rotary_emb
    elif current_platform.is_rocm():
        from flash_attn.ops.triton.rotary import apply_rotary as apply_rotary_emb
+    else:
+        # For other platforms, use PyTorch fallback
+        from vllm.model_executor.layers.rotary_embedding.common import (
+            apply_rotary_emb_torch,
+        )
+
+        apply_rotary_emb = partial(apply_rotary_emb_torch, is_neox_style=True)

    q_embed = apply_rotary_emb(q.float(), cos.float(), sin.float()).type_as(q)
    k_embed = apply_rotary_emb(k.float(), cos.float(), sin.float()).type_as(k)
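
For readers unfamiliar with what the new `else` branch computes, here is a minimal pure-Python sketch of a NeoX-style rotary rotation on a single vector. It is an illustration only, not the code in `vllm.model_executor.layers.rotary_embedding.common`: the function name `rotary_reference` is hypothetical, it works on plain lists rather than tensors, and it assumes `cos` and `sin` hold one value per rotated pair (half the vector length). The `partial(..., is_neox_style=True)` binding mirrors how the patch adapts the fallback to the three-argument call sites.

```python
import math
from functools import partial


def rotary_reference(x, cos, sin, is_neox_style=True):
    """Hypothetical stand-in for a rotary fallback, applied to one vector.

    x   : list of floats with even length d
    cos : list of d // 2 cosines, one per rotated pair (assumed layout)
    sin : list of d // 2 sines, one per rotated pair (assumed layout)
    """
    half = len(x) // 2
    if is_neox_style:
        # NeoX style pairs element i with element i + d/2.
        x1, x2 = x[:half], x[half:]
    else:
        # Interleaved style pairs element 2i with element 2i + 1.
        x1, x2 = x[0::2], x[1::2]

    # Standard 2D rotation applied to each (x1, x2) pair.
    o1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    o2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]

    if is_neox_style:
        return o1 + o2
    # Re-interleave the rotated pairs.
    out = []
    for a, b in zip(o1, o2):
        out.extend((a, b))
    return out


# Bind the style once so callers can use the same (x, cos, sin) signature
# as the platform-specific kernels.
apply_rotary_emb = partial(rotary_reference, is_neox_style=True)

angles = [math.pi / 2, 0.0]
cos = [math.cos(t) for t in angles]
sin = [math.sin(t) for t in angles]
rotated = apply_rotary_emb([1.0, 2.0, 3.0, 4.0], cos, sin)
```

In this example the first pair `(1.0, 3.0)` is rotated by 90 degrees and the second pair `(2.0, 4.0)` is left unchanged, so `rotated` is approximately `[-3.0, 2.0, 1.0, 4.0]`. Because each pair undergoes a pure rotation, the vector's norm is preserved.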