Unverified commit 7568a282 authored by JartX, committed by GitHub

[FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA (#27744)


Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
parent 1da3309a
```diff
@@ -428,6 +428,14 @@ class Qwen2_5_VisionAttention(nn.Module):
             )
         elif self.attn_backend == _Backend.TORCH_SDPA:
             # Execute attention entry by entry for speed & less VRAM.
+            from vllm.platforms import current_platform
+            # Never remove the next contiguous logic
+            # Without it, hallucinations occur with the backend
+            if current_platform.is_rocm():
+                q = q.contiguous()
+                k = k.contiguous()
+                v = v.contiguous()
             outputs = []
             for i in range(1, len(cu_seqlens)):
                 start_idx = cu_seqlens[i - 1]
......
```
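The TORCH_SDPA branch runs attention "entry by entry": each packed image/video sequence, delimited by consecutive entries of `cu_seqlens`, is attended to separately. The commit only adds a guard that makes `q`, `k` and `v` contiguous first, and only on ROCm. The following is a minimal sketch of that pattern in plain PyTorch, not vLLM's actual code. It assumes a `(batch, seq, heads, head_dim)` layout, and `varlen_sdpa` and `force_contiguous` are hypothetical names. On CPU, the contiguous and non-contiguous paths give the same result; the ROCm hallucinations the commit addresses cannot be reproduced by this sketch.

```python
import torch
import torch.nn.functional as F


def varlen_sdpa(q, k, v, cu_seqlens, force_contiguous=True):
    """Hypothetical helper: attend within each packed sequence separately.

    Assumes q, k, v have shape (batch, total_seq, heads, head_dim) and that
    cu_seqlens holds cumulative boundaries, e.g. [0, 3, 7] for lengths 3 and 4.
    """
    if force_contiguous:
        # Same idea as the commit's guard: materialize dense copies
        # before slicing and handing the tensors to SDPA.
        q, k, v = q.contiguous(), k.contiguous(), v.contiguous()
    outputs = []
    for i in range(1, len(cu_seqlens)):
        start, end = int(cu_seqlens[i - 1]), int(cu_seqlens[i])
        # Slice one sequence, then move heads before seq: (b, h, s, d).
        q_i, k_i, v_i = (x[:, start:end].transpose(1, 2) for x in (q, k, v))
        out_i = F.scaled_dot_product_attention(q_i, k_i, v_i, dropout_p=0.0)
        outputs.append(out_i.transpose(1, 2))  # back to (b, s, h, d)
    # Re-pack the per-sequence outputs along the sequence dimension.
    return torch.cat(outputs, dim=1)


# Usage: transposed tensors are strided views, i.e. not contiguous.
torch.manual_seed(0)
b, s, h, d = 1, 7, 2, 8
q, k, v = (torch.randn(b, h, s, d).transpose(1, 2) for _ in range(3))
print(q.is_contiguous())  # False
cu_seqlens = torch.tensor([0, 3, 7])
out = varlen_sdpa(q, k, v, cu_seqlens)
print(out.shape)  # torch.Size([1, 7, 2, 8])
```

Calling `.contiguous()` on a tensor that is already contiguous returns it unchanged, so the guard costs a copy only when the inputs are strided views. That is one reason restricting it to the affected platform, as the commit does, is cheap on others.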