[Perf] Reduce peak memory usage of llama (#10339)

Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>

Commit b2e0ad3b (unverified), authored Nov 14, 2024 by Murali Andoorveedu, committed by GitHub Nov 15, 2024

Showing with 2 additions and 2 deletions

vllm/model_executor/models/llama.py +2 -2

--- a/vllm/model_executor/models/llama.py
+++ b/vllm/model_executor/models/llama.py
@@ -90,8 +90,8 @@ class LlamaMLP(nn.Module):
        self.act_fn = SiluAndMul()

    def forward(self, x):
-        gate_up, _ = self.gate_up_proj(x)
-        x = self.act_fn(gate_up)
+        x, _ = self.gate_up_proj(x)
+        x = self.act_fn(x)
        x, _ = self.down_proj(x)
        return x
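
Note on the diff above: the commit message does not spell out the mechanism, so the following is only a plausible reading. In the old `forward`, the local name `gate_up` keeps the output of `gate_up_proj` referenced until the function returns, so that buffer is still alive while `down_proj` runs. Rebinding `x` at each step drops the previous reference as soon as the next result exists. The sketch below illustrates this with plain Python objects instead of tensors; `Buf`, `forward_old`, and `forward_new` are hypothetical stand-ins, and the behaviour shown relies on CPython's reference counting.

```python
import weakref


class Buf:
    """Hypothetical stand-in for a tensor; it only exists to track liveness."""

    def __init__(self, name):
        self.name = name


def forward_old(x):
    # Mirrors the removed lines: the intermediate gets its own local name.
    gate_up = Buf("gate_up")        # stands in for self.gate_up_proj(x)
    probe = weakref.ref(gate_up)    # weak reference, does not keep it alive
    x = Buf("act")                  # stands in for self.act_fn(gate_up)
    # This is the point where down_proj would run.
    return probe() is not None      # is the gate_up buffer still alive?


def forward_new(x):
    # Mirrors the added lines: every step rebinds the same name.
    x = Buf("gate_up")              # stands in for self.gate_up_proj(x)
    probe = weakref.ref(x)
    x = Buf("act")                  # rebinding drops the last strong reference
    # This is the point where down_proj would run.
    return probe() is not None


old_alive = forward_old(Buf("input"))
new_alive = forward_new(Buf("input"))
print("old: gate_up alive during down_proj:", old_alive)
print("new: gate_up alive during down_proj:", new_alive)
```

With real tensors the freed buffer would typically go back to the framework's allocator rather than to the operating system, so the effect would show up as a lower peak allocation, not necessarily as lower reserved memory. The caller may also still hold its own reference to the input, which this change does not affect.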