"vscode:/vscode.git/clone" did not exist on "cd1f58f4d2dfdac89165d2b2256476d572743554"
Unverified Commit bd78f63a authored by cmdr2's avatar cmdr2 Committed by GitHub

Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) (#3463)

Release large tensors in attention as soon as they're no longer required. This reduces peak VRAM by nearly 2 GB for a 1024x1024 image (even with attention slicing), and the savings scale up with image size.
parent 3ebd2d1f
@@ -344,11 +344,14 @@ class Attention(nn.Module):
             beta=beta,
             alpha=self.scale,
         )
+        del baddbmm_input
 
         if self.upcast_softmax:
             attention_scores = attention_scores.float()
 
         attention_probs = attention_scores.softmax(dim=-1)
+        del attention_scores
+
         attention_probs = attention_probs.to(dtype)
 
         return attention_probs
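The principle behind this change can be sketched in plain Python. The toy `Tensor` class below is an illustrative stand-in for a large attention tensor, not the diffusers API: dropping the last reference to an intermediate before (or while) the next buffer is live means only one full-size allocation exists at a time, which is exactly what the `del` statements in the diff achieve under CPython's reference counting.

```python
import weakref

class Tensor(list):
    """Toy stand-in for a large attention tensor; plain lists can't be
    weak-referenced, so we subclass list to observe when one is freed."""

def get_attention_probs(scores: Tensor) -> Tensor:
    # stand-in for attention_scores.softmax(dim=-1): allocates a brand-new
    # object, leaving `scores` alive as a second full-size buffer
    total = sum(scores)
    return Tensor(s / total for s in scores)

scores = Tensor([1.0, 2.0, 3.0])
watcher = weakref.ref(scores)   # lets us check when `scores` is freed

probs = get_attention_probs(scores)

# Without `del`, `scores` would stay alive until the enclosing scope exits,
# holding two full-size buffers at once; `del` drops the last reference so
# CPython frees it immediately (PyTorch tensors are released the same way,
# returning their memory to the CUDA caching allocator for reuse).
del scores

assert watcher() is None        # the intermediate is already gone
print(probs)                    # [0.16666666666666666, 0.3333333333333333, 0.5]
```

The same reasoning explains why the savings scale with image size: the attention score tensor grows quadratically with the number of latent pixels, so freeing it one statement earlier removes the largest allocation from the peak.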