[Perf] Optimize glm4.xv VIT (#37779)

Signed-off-by: Yang <lymailforjob@gmail.com>

Commit b0507004, authored Mar 22, 2026 by Yang Liu; committed by GitHub Mar 22, 2026 (unverified signature)

Showing with 1 addition and 2 deletions

vllm/model_executor/models/glm4_1v.py +1 -2

--- a/vllm/model_executor/models/glm4_1v.py
+++ b/vllm/model_executor/models/glm4_1v.py
@@ -758,11 +758,10 @@ class Glm4vVisionTransformer(nn.Module):
            grid_thw[:, 1] * grid_thw[:, 2], grid_thw[:, 0]
        ).cumsum(dim=0, dtype=torch.int32)
        cu_seqlens = torch.cat([cu_seqlens.new_zeros(1), cu_seqlens])
-        cu_seqlens = cu_seqlens.to(self.device, non_blocking=True)
-
        # pre-compute max_seqlen for attn mask to reduce cuMemcpy operations
        max_seqlen = self.compute_attn_mask_seqlen(cu_seqlens)
        seqlens = (cu_seqlens[1:] - cu_seqlens[:-1]).tolist()
+        cu_seqlens = cu_seqlens.to(self.device, non_blocking=True)
        x = self.embeddings(
            x, seqlens, grid_thw, image_type_ids[:, 0], image_type_ids[:, 1]
        )
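
Note on the change: the hunk moves the `cu_seqlens.to(self.device, non_blocking=True)` call below the two lines that read values out of `cu_seqlens`. Calling `.tolist()` on a tensor that already lives on an accelerator forces a device-to-host copy and a synchronization, so deferring the transfer presumably lets `max_seqlen` and `seqlens` be computed on the host first. This reading assumes `grid_thw` is a host-side tensor at this point, which the diff does not show. The plain-Python sketch below illustrates what the three values represent; it is not the vLLM code. The helper name `build_seqlens` is hypothetical, the "repeat `h * w` once per frame" step is inferred from the truncated call at the top of the hunk, and treating `compute_attn_mask_seqlen` as a maximum over per-sequence lengths is likewise an assumption.

```python
from itertools import accumulate


def build_seqlens(grid_thw):
    """Host-side analogue of the bookkeeping in the hunk (hypothetical helper).

    grid_thw: list of (t, h, w) tuples, one per image or video.
    Returns (cu_seqlens, seqlens, max_seqlen) as plain Python values.
    """
    # Assumed behaviour of the truncated call: one entry of h * w per frame.
    per_frame = []
    for t, h, w in grid_thw:
        per_frame.extend([h * w] * t)

    # Cumulative sum with a leading zero, mirroring the torch.cat([... zeros(1), ...]).
    cu_seqlens = [0] + list(accumulate(per_frame))

    # Differences of adjacent offsets recover the per-sequence lengths.
    seqlens = [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]

    # Assumed meaning of compute_attn_mask_seqlen: the longest sequence.
    max_seqlen = max(seqlens) if seqlens else 0
    return cu_seqlens, seqlens, max_seqlen


# One single-frame 2x3 image and one two-frame 4x4 video.
cu_seqlens, seqlens, max_seqlen = build_seqlens([(1, 2, 3), (2, 4, 4)])
```

Everything above runs on the host; in the patched code only the final `cu_seqlens` tensor is then sent to the device, after the host-side reads are done.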