Unverified Commit b785ddb6 authored by Junyu Chen, committed by GitHub

[DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 (#10595)



* autoencoder_dc tiling

* add tiling and slicing support in SANA pipelines (usage sketch below)

* create variables for the padding lengths because the line was getting too long

* add tiling and slicing support in pag SANA pipelines

* revert changes to tile size

* make style

* add vae tiling test

* fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16

---------
Co-authored-by: Aryan <aryan@huggingface.co>
parent e8114bd0
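For reference, a minimal usage sketch of the tiling and slicing support mentioned in the commit message, assuming the `SanaPipeline` and `AutoencoderDC` APIs in diffusers; the checkpoint id is illustrative, not taken from this commit:

```python
import torch
from diffusers import SanaPipeline

# Load a SANA pipeline in bfloat16, the dtype this fix targets.
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
).to("cuda")

# Tiling decodes the latent in tiles; slicing decodes a batch one sample
# at a time. Both trade some speed for lower peak VRAM in the DC-AE.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

image = pipe(prompt="a photo of an astronaut riding a horse").images[0]
```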
@@ -899,7 +899,7 @@ class SanaMultiscaleLinearAttention(nn.Module):
         scores = torch.matmul(key.transpose(-1, -2), query)
         scores = scores.to(dtype=torch.float32)
         scores = scores / (torch.sum(scores, dim=2, keepdim=True) + self.eps)
-        hidden_states = torch.matmul(value, scores)
+        hidden_states = torch.matmul(value, scores.to(value.dtype))
         return hidden_states
 
     def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
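The one-line change above is needed because `torch.matmul` requires both operands to share a dtype: the scores are upcast to float32 for a numerically stable normalization, but `value` stays in bfloat16, so the product fails without casting the scores back. A minimal repro sketch (the shapes are made up for illustration):

```python
import torch

# `value` is in the model's compute dtype; `scores` was upcast to float32
# for the stable sum-normalization in apply_quadratic_attention.
value = torch.randn(1, 8, 64, 256, dtype=torch.bfloat16)
scores = torch.randn(1, 8, 256, 256, dtype=torch.float32)

try:
    torch.matmul(value, scores)  # mixed-dtype matmul raises a RuntimeError
except RuntimeError as err:
    print("fails:", err)

# The fix: cast the normalized scores back to value.dtype before multiplying.
out = torch.matmul(value, scores.to(value.dtype))
print(out.dtype)  # torch.bfloat16
```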