[XPU] Fall back to TRITON_ATTN on XPU when using float32 dtype (#31762)

Signed-off-by: sihao.li <sihao.li@intel.com>

Commit 59fe6f29 (unverified), authored Jan 07, 2026 by sihao_li, committed by GitHub Jan 07, 2026

Showing 1 changed file with 7 additions and 0 deletions

vllm/platforms/xpu.py +7 -0

--- a/vllm/platforms/xpu.py
+++ b/vllm/platforms/xpu.py
@@ -52,11 +52,18 @@ class XPUPlatform(Platform):
            "only NHD layout is supported by XPU attention kernels."
        )

+        dtype = attn_selector_config.dtype
        if attn_selector_config.use_sparse:
            raise NotImplementedError("Sparse Attention is not supported on XPU.")
        if selected_backend == AttentionBackendEnum.TRITON_ATTN:
            logger.info_once("Using Triton backend.")
            return AttentionBackendEnum.TRITON_ATTN.get_path()
+        elif dtype == torch.float32:
+            logger.warning_once(
+                "Flash Attention on XPU does not support float32 dtype. "
+                "Falling back to Triton Attention backend."
+            )
+            return AttentionBackendEnum.TRITON_ATTN.get_path()
        elif selected_backend == AttentionBackendEnum.FLASH_ATTN:
            logger.info_once("Using Flash Attention backend.")
            return AttentionBackendEnum.FLASH_ATTN.get_path()
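
The branch order in this hunk can be illustrated with a minimal, self-contained sketch. It is not the vLLM implementation: `Backend` and `select_backend` are hypothetical stand-ins for `AttentionBackendEnum` and the platform method, the dtype is passed as a plain string instead of a `torch.dtype`, and the final `raise` is a placeholder because the rest of the method is not shown in the hunk.

```python
from enum import Enum
from typing import Optional


class Backend(Enum):
    # Hypothetical stand-in for vLLM's AttentionBackendEnum.
    TRITON_ATTN = "triton_attn"
    FLASH_ATTN = "flash_attn"


def select_backend(
    selected: Optional[Backend], dtype: str, use_sparse: bool = False
) -> Backend:
    """Mirror the branch order of the hunk above (illustrative only)."""
    if use_sparse:
        raise NotImplementedError("Sparse Attention is not supported on XPU.")
    if selected is Backend.TRITON_ATTN:
        # An explicit Triton request is honored for any dtype.
        return Backend.TRITON_ATTN
    elif dtype == "float32":
        # New branch: checked before FLASH_ATTN, so float32 always
        # ends up on Triton, even if Flash Attention was requested.
        return Backend.TRITON_ATTN
    elif selected is Backend.FLASH_ATTN:
        return Backend.FLASH_ATTN
    # Placeholder: the remaining branches are outside the hunk.
    raise ValueError(f"Unhandled backend selection: {selected!r}")


if __name__ == "__main__":
    print(select_backend(Backend.FLASH_ATTN, "float32").name)
    print(select_backend(Backend.FLASH_ATTN, "bfloat16").name)
```

Because the float32 check sits ahead of the `FLASH_ATTN` check, an explicit Flash Attention request combined with float32 is redirected to Triton (with a one-time warning in the real code) rather than rejected.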