"tests/git@developer.sourcefind.cn:OpenDAS/dgl.git" did not exist on "8079d98617694b6fc896dedf4c83e7b25dc716cb"
Commit b18a3126 authored by Myle Ott, committed by Facebook Github Bot

Faster masking in MultiheadAttention

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/612

Differential Revision: D15541377

Pulled By: myleott

fbshipit-source-id: 4762516a3b545d03bc81d3660f47827e15466dce
parent c97978a2
```diff
@@ -201,10 +201,10 @@ class MultiheadAttention(nn.Module):
                     attn_weights.float()
                 ).type_as(attn_weights)
             else:
-                attn_weights = attn_weights.float().masked_fill(
+                attn_weights = attn_weights.masked_fill(
                     key_padding_mask.unsqueeze(1).unsqueeze(2),
                     float('-inf'),
-                ).type_as(attn_weights)  # FP16 support: cast to float and back
+                )
             attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)
             attn_weights = utils.softmax(
```
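The change removes the fp32 round-trip when applying the key padding mask: masked_fill now writes float('-inf') directly in the weights' own dtype, which is safe in half precision because -inf is representable in fp16, and it saves two full-tensor casts on the attention hot path. Below is a minimal standalone sketch of the before/after behavior; the sizes are hypothetical illustration values, and only the masked_fill pattern itself comes from the diff:

```python
import torch

# Hypothetical sizes, for illustration only.
bsz, num_heads, tgt_len, src_len = 2, 4, 3, 5

# Attention weights in half precision, as in mixed-precision training.
attn_weights = torch.randn(bsz, num_heads, tgt_len, src_len).half()

# True marks padded source positions; here the second sequence pads its tail.
key_padding_mask = torch.zeros(bsz, src_len, dtype=torch.bool)
key_padding_mask[1, 3:] = True

# (bsz, 1, 1, src_len) broadcasts across heads and target positions.
mask = key_padding_mask.unsqueeze(1).unsqueeze(2)

# Old pattern: cast to fp32, fill, cast back -- two extra full-tensor copies.
old = attn_weights.float().masked_fill(mask, float('-inf')).type_as(attn_weights)

# New pattern: fill directly in the tensor's own dtype, no extra copies.
new = attn_weights.masked_fill(mask, float('-inf'))

# Same result: -inf is exactly representable in fp16, and the fp16 -> fp32
# -> fp16 round-trip is lossless for values that started out in fp16.
assert torch.equal(old, new)
```

In the surrounding code the masked weights are then reshaped to (bsz * self.num_heads, tgt_len, src_len) and handed to utils.softmax, which is where the normalization happens.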