Commit 38658ec6, authored by Isotr0py, committed by GitHub

[Bugfix][MM encoder] Fix ViT attention backend resolving for Turing GPU (#29614)


Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
parent a24ea541
......
@@ -264,6 +264,7 @@ class CudaPlatformBase(Platform):
        cls, head_size: int, dtype: torch.dtype
    ) -> "AttentionBackendEnum":
        # Try FlashAttention first
        if (cc := cls.get_device_capability()) and cc.major >= 8:
            try:
                backend_class = AttentionBackendEnum.FLASH_ATTN.get_class()
                if backend_class.supports_head_size(
......
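
The hunk above only tries FlashAttention when the device reports compute capability 8.x or newer; Turing GPUs report 7.5 and so skip it. A minimal standalone sketch of that gating idea follows. It is not the vLLM implementation: `DeviceCapability`, `pick_vit_backend`, the `flash_attn_usable` flag, and the `"TORCH_SDPA"` fallback name are all hypothetical stand-ins chosen for illustration, since the diff does not show the fallback path.

```python
from typing import NamedTuple, Optional


class DeviceCapability(NamedTuple):
    """Hypothetical stand-in for a (major, minor) compute capability."""
    major: int
    minor: int


def pick_vit_backend(
    cc: Optional[DeviceCapability], flash_attn_usable: bool
) -> str:
    """Return a backend name, gating FlashAttention on capability >= 8.x.

    `flash_attn_usable` stands in for the head-size and dtype checks that
    the real code performs on the backend class (assumed, not shown here).
    """
    # Only consider FlashAttention on Ampere (8.x) or newer devices.
    if cc is not None and cc.major >= 8 and flash_attn_usable:
        return "FLASH_ATTN"
    # Assumed fallback name; the diff does not show what is used instead.
    return "TORCH_SDPA"


if __name__ == "__main__":
    # Turing (7.5) skips FlashAttention even if it would otherwise pass.
    print(pick_vit_backend(DeviceCapability(7, 5), flash_attn_usable=True))
    # Ampere (8.0) may select it.
    print(pick_vit_backend(DeviceCapability(8, 0), flash_attn_usable=True))
```

Checking the capability before importing or probing the backend class means an unsupported device never reaches the `try` block at all, which appears to be the intent of the added condition.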