chenpangpang / transformers · Commits

Unverified commit 6060b2f8, authored Aug 30, 2019 by ziliwang, committed via GitHub on Aug 30, 2019.
fix: hard coding for max number

The float16 maximum is 65504, so the original hard-coded 1e30 overflows to inf and produces NaN in fp16.
Parent: caf1d116

Showing 1 changed file with 4 additions and 1 deletion:

pytorch_transformers/modeling_xlnet.py (+4, −1)
pytorch_transformers/modeling_xlnet.py @ 6060b2f8:

```diff
@@ -418,6 +418,9 @@ class XLNetRelativeAttention(nn.Module):
         attn_score = (ac + bd + ef) * self.scale
         if attn_mask is not None:
             # attn_score = attn_score * (1 - attn_mask) - 1e30 * attn_mask
-            attn_score = attn_score - 1e30 * attn_mask
+            if attn_mask.dtype == torch.float16:
+                attn_score = attn_score - 65500 * attn_mask
+            else:
+                attn_score = attn_score - 1e30 * attn_mask

         # attention probability
```
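As a minimal standalone sketch of why the change matters (illustration only, not part of the patch): 1e30 is not representable in float16, whose largest finite value is 65504, so casting it overflows to inf; masked attention scores then become -inf, and a fully masked row degenerates to NaN in the softmax. The new 65500 constant stays finite.

```python
import torch

# 1e30 overflows float16, whose largest finite value is 65504.
print(torch.finfo(torch.float16).max)           # 65504.0
print(torch.tensor(1e30, dtype=torch.float16))  # tensor(inf, dtype=torch.float16)

# With the old constant, masked positions become -inf ...
mask = torch.tensor([0.0, 0.0, 1.0], dtype=torch.float16)  # 1 = masked
score = torch.zeros(3, dtype=torch.float16)
print(score - 1e30 * mask)        # tensor([0., 0., -inf], dtype=torch.float16)

# ... and a fully masked row degenerates to NaN in the softmax:
row = torch.full((3,), float("-inf"), dtype=torch.float16)
print(torch.softmax(row, dim=0))  # tensor([nan, nan, nan], dtype=torch.float16)

# The new constant stays finite in float16, so the scores remain well defined:
print(score - 65500 * mask)       # tensor([0., 0., -65504.], dtype=torch.float16)
```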