gaoqiong / flash-attention / Commits

Commit 04fb1985 (unverified)
Authored Sep 06, 2022 by Tri Dao; committed by GitHub on Sep 06, 2022

Merge pull request #43 from eric-tc-wong/patch-1

Update flash_attention.py

Parents: 19d12610, b410d14f
Showing 1 changed file with 1 addition and 1 deletion:

flash_attn/flash_attention.py (+1, -1)
--- a/flash_attn/flash_attention.py
+++ b/flash_attn/flash_attention.py
@@ -107,7 +107,7 @@ class FlashMHA(nn.Module):
             query, key, value = rearrange(qkv, 'b s (three h d) -> b s three h d', three=3,
                                           h=self.num_heads).unbind(dim=2)
             query, key = self.rotary_emb(query, key, seq_dimension=-3)
-            qkv = torch.stack([query, key, value], dim=2)
+            qkv = torch.stack([query.type(x.dtype), key.type(x.dtype), value], dim=2)
         else:
             qkv = rearrange(qkv, 'b s (three h d) -> b s three h d', three=3, h=self.num_heads)
         context, attn_weights = self.inner_attn(qkv, key_padding_mask=key_padding_mask,
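The one-line change casts query and key back to x.dtype before re-stacking them into qkv. A plausible reading (not stated in the commit message) is that self.rotary_emb can return its outputs in a higher precision than the input x, so the re-stacked qkv would otherwise leave the half precision the FlashAttention kernel expects. The sketch below is illustrative only: the tensor shapes, the explicit .float() standing in for an upcasting rotary embedding, and the printed dtypes are assumptions, not code from the repository.

import torch
from einops import rearrange

# Illustrative shapes (not from the commit): batch 2, seqlen 4,
# 2 heads of dim 8, packed as (three h d) on the last axis.
x = torch.randn(2, 4, 3 * 2 * 8, dtype=torch.float16)
qkv = x  # in FlashMHA this would come from the Wqkv projection of x

# Same split as the diff's context lines: reshape to (b, s, three, h, d),
# then unbind along the "three" axis into query / key / value.
query, key, value = rearrange(qkv, 'b s (three h d) -> b s three h d',
                              three=3, h=2).unbind(dim=2)

# Stand-in for a rotary embedding that returns float32 outputs (assumption).
query, key = query.float(), key.float()

# The commit's fix: cast query/key back to x.dtype before stacking, so qkv
# stays in half precision instead of being promoted (or erroring, depending
# on the PyTorch version) now that query/key no longer match value's dtype.
qkv = torch.stack([query.type(x.dtype), key.type(x.dtype), value], dim=2)
print(qkv.shape, qkv.dtype)  # torch.Size([2, 4, 3, 2, 8]) torch.float16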