[XPU] Whisper model support on XPU Platform (#25123)

Signed-off-by: chzhang <chaojun.zhang@intel.com>

Commit 3bc18127, authored Sep 18, 2025 by Chaojun Zhang; committed by GitHub on Sep 18, 2025

Showing 2 changed files with 3 additions and 3 deletions

vllm/attention/layer.py +2 -2

vllm/v1/worker/utils.py +1 -1

--- a/vllm/attention/layer.py
+++ b/vllm/attention/layer.py
@@ -391,8 +391,8 @@ class MultiHeadAttention(nn.Module):
            backend = _Backend.FLASH_ATTN
            use_upstream_fa = True
-        if current_platform.is_rocm():
+        if current_platform.is_rocm() or current_platform.is_xpu():
-            # currently, only torch_sdpa is supported on rocm
+            # currently, only torch_sdpa is supported on rocm/xpu
            self.attn_backend = _Backend.TORCH_SDPA
        else:

--- a/vllm/v1/worker/utils.py
+++ b/vllm/v1/worker/utils.py
@@ -282,7 +282,7 @@ def bind_kv_cache(
            # TODO - analyze where runner_kv_caches is used and the right
            # way to ensure it properly reflects multiple attention layers
            # in the same decoder block.
-            if current_platform.is_cuda():
+            if current_platform.is_cuda() or current_platform.is_xpu():
                # We know that the GPU runner is not impacted by this
                # case. Some test code depends on runner_kv_caches, but
                # not in a way that's impacted by ignoring this.
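
Both hunks widen an existing platform check so that XPU takes a branch previously reserved for ROCm (attention backend) or CUDA (KV-cache binding). The following is a minimal, self-contained sketch of that gating pattern, not vLLM's actual code: `Platform`, `Backend`, `select_mha_backend`, and `tolerates_shared_layer_index` are hypothetical stand-ins for `current_platform`, `_Backend`, and the surrounding logic. The fallback for other platforms in `select_mha_backend` is an assumption, since the `else` branch is cut off in the hunk.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Backend(Enum):
    """Hypothetical stand-in for vLLM's _Backend enum."""
    FLASH_ATTN = auto()
    TORCH_SDPA = auto()


@dataclass(frozen=True)
class Platform:
    """Hypothetical stand-in for vllm's current_platform object."""
    name: str

    def is_cuda(self) -> bool:
        return self.name == "cuda"

    def is_rocm(self) -> bool:
        return self.name == "rocm"

    def is_xpu(self) -> bool:
        return self.name == "xpu"


def select_mha_backend(platform: Platform, requested: Backend) -> Backend:
    # Mirrors the first hunk: ROCm and XPU are pinned to torch SDPA.
    if platform.is_rocm() or platform.is_xpu():
        return Backend.TORCH_SDPA
    # Assumption: other platforms keep the requested backend. The real
    # else branch is not visible in the diff.
    return requested


def tolerates_shared_layer_index(platform: Platform) -> bool:
    # Mirrors the second hunk's condition: CUDA and XPU take the branch
    # for several attention layers sharing one decoder-block index.
    # What other platforms do is not shown in the hunk.
    return platform.is_cuda() or platform.is_xpu()


if __name__ == "__main__":
    xpu = Platform("xpu")
    print(select_mha_backend(xpu, Backend.FLASH_ATTN))
    print(tolerates_shared_layer_index(xpu))
```

Keeping each check as a single boolean expression on the platform object matches the diff's approach: supporting a new platform is a one-line change at each call site.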