Commit af6b361c authored by Taylan Bilal, committed by Facebook Github Bot

Initializing mask as a tensor of ints (not long) (#875)

Summary:
Since the mask really is a tensor of ints, this change should be mathematically
equivalent to the original implementation.

On the other hand, it has performance implications for XLA, hence the
pull request.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/875

Differential Revision: D16232877

Pulled By: myleott

fbshipit-source-id: e63175ee0016dcf0dfe10e2fd22570b8bbfbde84
parent 30123e2c
@@ -172,8 +172,14 @@ def make_positions(tensor, padding_idx, onnx_trace=False):
     Position numbers begin at padding_idx+1. Padding symbols are ignored.
     """
-    mask = tensor.ne(padding_idx).long()
-    return torch.cumsum(mask, dim=1) * mask + padding_idx
+    # The series of casts and type-conversions here are carefully
+    # balanced to both work with ONNX export and XLA. In particular XLA
+    # prefers ints, cumsum defaults to output longs, and ONNX doesn't know
+    # how to handle the dtype kwarg in cumsum.
+    mask = tensor.ne(padding_idx).int()
+    return (
+        torch.cumsum(mask, dim=1).type_as(mask) * mask
+    ).long() + padding_idx
 
 
 def strip_pad(tensor, pad):
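
For reference, a minimal standalone sketch (not part of the commit) checking that the new int-mask formulation matches the old long-mask one; the padding index and input tensor below are illustrative values, not taken from fairseq:

import torch

# Illustrative values: padding index 1, a batch of two padded sequences.
pad = 1
tokens = torch.tensor([[5, 6, 7, 1, 1],
                       [5, 1, 1, 1, 1]])

# Old formulation: mask held as longs throughout.
mask_long = tokens.ne(pad).long()
old = torch.cumsum(mask_long, dim=1) * mask_long + pad

# New formulation: mask held as ints. cumsum still promotes to long by
# default, so type_as(mask) casts back to int before the final .long()
# used for the add.
mask_int = tokens.ne(pad).int()
new = (torch.cumsum(mask_int, dim=1).type_as(mask_int) * mask_int).long() + pad

assert torch.equal(old, new)
print(new)
# tensor([[2, 3, 4, 1, 1],
#         [2, 1, 1, 1, 1]])

Both variants number non-padding positions starting at padding_idx + 1 and leave padding symbols at padding_idx; only the intermediate dtype differs, which is what matters for XLA.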