[Model] Switch to Fused RMSNorm in GLM-4.1V model (#24733)

Signed-off-by: SamitHuang <285365963@qq.com>

Commit f17c0758, authored Sep 13, 2025 by Samit, committed by GitHub Sep 12, 2025

Showing 1 changed file with 3 additions and 2 deletions

vllm/model_executor/models/glm4_1v.py +3 -2

--- a/vllm/model_executor/models/glm4_1v.py
+++ b/vllm/model_executor/models/glm4_1v.py
@@ -419,15 +419,16 @@ class Glm4vVisionBlock(nn.Module):
            max_seqlen: Optional[int] = None,  # Only used for Flash Attention
            seqlens: Optional[list[int]] = None,  # Only used for xFormers
    ) -> torch.Tensor:
-        x = x + self.attn(
+        x_attn = self.attn(
            self.norm1(x),
            cu_seqlens=cu_seqlens,
            rotary_pos_emb=rotary_pos_emb,
            max_seqlen=max_seqlen,
            seqlens=seqlens,
        )
+        x_fused_norm, residual = self.norm2(x, residual=x_attn)
+        x = residual + self.mlp(x_fused_norm)
-        x = x + self.mlp(self.norm2(x))
        return x
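
The rewritten block should be numerically equivalent to the old one, provided `self.norm2(x, residual=x_attn)` returns the pair (RMSNorm(x + x_attn), x + x_attn). That return contract is inferred from how the diff uses the result, not quoted from the layer's documentation. The sketch below checks the equivalence in plain Python on a single vector; `rms_norm`, `fused_add_rms_norm`, `toy_attn`, and `toy_mlp` are hypothetical stand-ins, not vLLM functions.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Plain RMSNorm over one vector, given as a list of floats."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def fused_add_rms_norm(x, residual, weight, eps=1e-6):
    """Add, then normalize. Returns (normalized sum, sum).

    This is the contract assumed for norm2(x, residual=...) in the diff.
    """
    summed = [a + b for a, b in zip(x, residual)]
    return rms_norm(summed, weight, eps), summed


def toy_attn(h):
    # Hypothetical stand-in for self.attn: any deterministic map works here.
    return [0.5 * v + 0.1 for v in h]


def toy_mlp(h):
    # Hypothetical stand-in for self.mlp.
    return [2.0 * v - 0.3 for v in h]


def block_unfused(x, w1, w2):
    # Old code path: x = x + attn(norm1(x)); x = x + mlp(norm2(x))
    x = [a + b for a, b in zip(x, toy_attn(rms_norm(x, w1)))]
    return [a + b for a, b in zip(x, toy_mlp(rms_norm(x, w2)))]


def block_fused(x, w1, w2):
    # New code path: the residual add is folded into the second norm call.
    x_attn = toy_attn(rms_norm(x, w1))
    x_fused_norm, residual = fused_add_rms_norm(x, x_attn, w2)
    return [a + b for a, b in zip(residual, toy_mlp(x_fused_norm))]


x = [1.0, -2.0, 3.0, 0.5]
w1 = [1.0, 0.9, 1.1, 1.0]
w2 = [0.8, 1.0, 1.2, 1.0]

out_old = block_unfused(x, w1, w2)
out_new = block_fused(x, w1, w2)
assert all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for a, b in zip(out_old, out_new)
)
```

The sketch only demonstrates that the two formulations compute the same values. Any speedup in the real model would come from the fused kernel performing the add and the normalization in one pass, which this pure-Python version does not model.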