Commit 39cd09dc, authored by Yongye Zhu, committed by GitHub

[Bugfix] use flash attn on sm90 (#22933)


Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
parent 919234fe
@@ -316,7 +316,7 @@ class CudaPlatformBase(Platform):
 # FlashAttention is the default for SM 8.0+ GPUs
 if cls.has_device_capability(80):
-    if has_sink:
+    if has_sink and not cls.is_device_capability(90):
         logger.info_once("Using Triton backend on V1 engine.")
         return TRITON_ATTN_VLLM_V1
     if is_default_backend_supported := is_attn_backend_supported(
...
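
The effect of the changed condition can be illustrated with a small standalone sketch. This is not the project's code: `pick_path` and `DEFAULT_PATH` are hypothetical names introduced here, the capability is modeled as a plain integer (major * 10 + minor), and the sketch assumes that `has_device_capability(80)` means "at least SM 8.0" while `is_device_capability(90)` means "exactly SM 9.0". What happens after the default-backend check is truncated in the hunk, so it is left as a placeholder.

```python
from typing import Optional

# Name taken from the diff; modeled here as a plain string.
TRITON_ATTN_VLLM_V1 = "TRITON_ATTN_VLLM_V1"
# Hypothetical placeholder meaning "fall through to the default
# (FlashAttention) backend support check", which the hunk truncates.
DEFAULT_PATH = "DEFAULT_BACKEND_CHECK"


def pick_path(capability: int, has_sink: bool) -> Optional[str]:
    """Toy model of the patched branch (hypothetical helper)."""
    if capability >= 80:  # assumed meaning of has_device_capability(80)
        # Assumed meaning of is_device_capability(90): exact match on SM 9.0.
        if has_sink and capability != 90:
            return TRITON_ATTN_VLLM_V1
        return DEFAULT_PATH
    return None  # handling below SM 8.0 is outside the shown hunk


for cap in (80, 89, 90):
    print(cap, pick_path(cap, has_sink=True))
```

Under these assumptions, a model with attention sinks still takes the Triton path on SM 8.x, while on SM 9.0 it now falls through to the default backend check, which matches the commit title.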