Forcing static shapes in loss computation (LSCE) (#876)
Summary: Applying non_pad_mask results in dynamic tensor shapes, which is bad for TPUs. This change is an equivalent loss computation (tested), but the tensor shapes are constant (in the case of reduce=True).

Pull Request resolved: https://github.com/pytorch/fairseq/pull/876

Differential Revision: D16241621

Pulled By: myleott

fbshipit-source-id: 973254b7e0842f2b55817afd66b2a110a566f149
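
For illustration only, here is a minimal sketch of the idea described in the summary. It is not the diff from the pull request. The function names `lsce_dynamic` and `lsce_static` are hypothetical, and the sketch assumes `lprobs` holds log-probabilities of shape (N, V), `target` has shape (N,), and the summed (reduce=True) loss is wanted. The first version selects non-pad positions by boolean indexing, so the intermediate tensor length depends on the data. The second zeroes the pad positions with `masked_fill`, so every intermediate keeps shape (N, 1).

```python
import torch
import torch.nn.functional as F


def lsce_dynamic(lprobs, target, epsilon, ignore_index):
    """Hypothetical baseline: boolean indexing gives data-dependent shapes."""
    target = target.unsqueeze(-1)                      # (N, 1)
    non_pad_mask = target.ne(ignore_index)             # (N, 1) bool
    # Indexing with a boolean mask returns a 1-D tensor whose length
    # equals the number of non-pad tokens, which varies per batch.
    nll_loss = -lprobs.gather(dim=-1, index=target)[non_pad_mask]
    smooth_loss = -lprobs.sum(dim=-1, keepdim=True)[non_pad_mask]
    eps_i = epsilon / lprobs.size(-1)
    return (1.0 - epsilon) * nll_loss.sum() + eps_i * smooth_loss.sum()


def lsce_static(lprobs, target, epsilon, ignore_index):
    """Hypothetical static-shape variant: zero out pads instead of dropping them."""
    target = target.unsqueeze(-1)                      # (N, 1)
    pad_mask = target.eq(ignore_index)                 # (N, 1) bool
    # masked_fill keeps the (N, 1) shape; pad positions contribute 0 to the sum.
    nll_loss = -lprobs.gather(dim=-1, index=target)
    smooth_loss = -lprobs.sum(dim=-1, keepdim=True)
    nll_loss = nll_loss.masked_fill(pad_mask, 0.0)
    smooth_loss = smooth_loss.masked_fill(pad_mask, 0.0)
    eps_i = epsilon / lprobs.size(-1)
    return (1.0 - epsilon) * nll_loss.sum() + eps_i * smooth_loss.sum()
```

Both functions return the same scalar when the loss is summed, because dropping an element and adding zero for it are equivalent under a sum. The equivalence does not carry over directly to the unreduced case, where the two versions return tensors of different lengths.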