Unverified Commit bd78f63a authored by cmdr2, committed by GitHub

Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) (#3463)

Release large tensors in attention as soon as they're no longer required. This reduces peak VRAM by nearly 2 GB for a 1024x1024 image (even with attention slicing enabled), and the savings scale up with image size.
parent 3ebd2d1f
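
For background (not part of this commit): deleting the last reference to a large intermediate lets PyTorch's caching allocator reuse that memory before the next big allocation, which is where the peak savings come from. A minimal sketch of the effect, assuming a CUDA device; the function name and tensor shapes here are illustrative, not from the diff:

```python
import torch

def peak_mib(release_early: bool) -> float:
    """Peak VRAM of a softmax-then-cast sequence, with or without
    releasing the scores tensor the moment it is no longer needed."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    # Illustrative size: one head's attention score matrix at high resolution.
    scores = torch.randn(1, 4096, 4096, device="cuda")  # fp32, 64 MiB
    probs = scores.softmax(dim=-1)                      # allocates another 64 MiB
    if release_early:
        del scores  # last reference gone -> block returns to the caching allocator
    probs = probs.to(torch.float16)                     # the cast allocates 32 MiB more
    return torch.cuda.max_memory_allocated() / 2**20

print(f"peak without early release: {peak_mib(False):.0f} MiB")  # ~160 MiB
print(f"peak with early release:    {peak_mib(True):.0f} MiB")   # ~128 MiB
```

The two `del` statements in the diff below apply the same pattern to tensors that are far larger at high resolutions.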
@@ -344,11 +344,14 @@ class Attention(nn.Module):
             beta=beta,
             alpha=self.scale,
         )
+        del baddbmm_input

         if self.upcast_softmax:
             attention_scores = attention_scores.float()

         attention_probs = attention_scores.softmax(dim=-1)
+        del attention_scores

         attention_probs = attention_probs.to(dtype)

         return attention_probs
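
For readers without the full file: the hunk above lands in diffusers' `Attention.get_attention_scores`. The sketch below is an approximate reconstruction of the enclosing method from the diffusers source of this era (not part of the diff), showing where `baddbmm_input` comes from; when no attention mask is passed it is a scratch tensor the same shape as the score matrix, so releasing it promptly frees a full score-sized allocation:

```python
import torch
from torch import nn

class Attention(nn.Module):
    # ... __init__ and other methods elided ...

    def get_attention_scores(self, query, key, attention_mask=None):
        dtype = query.dtype
        if self.upcast_attention:
            query = query.float()
            key = key.float()

        if attention_mask is None:
            # Scratch input for baddbmm; with beta=0 its values are ignored,
            # but it is score-sized: [batch, q_len, k_len].
            baddbmm_input = torch.empty(
                query.shape[0], query.shape[1], key.shape[1],
                dtype=query.dtype, device=query.device,
            )
            beta = 0
        else:
            baddbmm_input = attention_mask
            beta = 1

        attention_scores = torch.baddbmm(
            baddbmm_input, query, key.transpose(-1, -2),
            beta=beta, alpha=self.scale,
        )
        del baddbmm_input  # added by this commit

        if self.upcast_softmax:
            attention_scores = attention_scores.float()

        attention_probs = attention_scores.softmax(dim=-1)
        del attention_scores  # added by this commit

        attention_probs = attention_probs.to(dtype)
        return attention_probs
```

Note that when a mask is passed, `del baddbmm_input` only drops the local alias; the caller's `attention_mask` still references the tensor, so the big win is on the no-mask path.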