[Models][Qwen3 ViT] Keep `max_seqlen` on CPU to prevent D2H sync (#37139)

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
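
For context, a minimal sketch of why dropping `device=` matters. It is not the vLLM code path: the `cu_seqlens` values are made up, and the max-length computation is an assumed stand-in for `MMEncoderAttention.compute_max_seqlen`. A scalar tensor created without `device=` lives on the CPU (unless the default device has been changed), so reading it is a plain host read, whereas reading a GPU-resident scalar forces a device-to-host copy and a stream synchronization.

```python
import torch

# Hypothetical cumulative sequence lengths for three images (assumed values).
cu_seqlens = torch.tensor([0, 4, 10, 13], dtype=torch.int32)

# Longest individual sequence, computed on the host.
# Assumed stand-in for MMEncoderAttention.compute_max_seqlen.
max_len = int((cu_seqlens[1:] - cu_seqlens[:-1]).max())

# No `device=` argument: the scalar is allocated on the CPU by default,
# so reading it back later never waits on the GPU stream.
max_seqlen_cpu = torch.tensor(max_len, dtype=torch.int32)
host_value = max_seqlen_cpu.item()  # plain host read

if torch.cuda.is_available():
    # With device="cuda" the same scalar lives on the GPU; .item() then
    # copies device-to-host and blocks until queued GPU work finishes.
    max_seqlen_gpu = torch.tensor(max_len, dtype=torch.int32, device="cuda")
    assert max_seqlen_gpu.item() == host_value
```

Both variants hold the same value; only the CPU-resident one can be consumed as a Python integer without stalling the GPU pipeline.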

Commit f9e6db30 (unverified), authored Mar 16, 2026 by Lukas Geiger, committed by GitHub Mar 16, 2026

Showing 1 changed file with 0 additions and 1 deletion

vllm/model_executor/models/qwen3_vl.py +0 -1

--- a/vllm/model_executor/models/qwen3_vl.py
+++ b/vllm/model_executor/models/qwen3_vl.py
@@ -557,7 +557,6 @@ class Qwen3_VisionTransformer(nn.Module):
        max_seqlen = torch.tensor(
            MMEncoderAttention.compute_max_seqlen(self.attn_backend, cu_seqlens),
            dtype=torch.int32,
-            device=self.device,
        )
        cu_seqlens = MMEncoderAttention.maybe_recompute_cu_seqlens(
            self.attn_backend,