Commit 0c75c760 authored by Chunting Zhou's avatar Chunting Zhou Committed by Facebook Github Bot


Fix bug (the returned value has a dimension mismatch) in label-smoothed-cross-entropy for MoE (#1037)

Summary:
MoE will encounter a dimension mismatch when using label-smoothed cross entropy as the criterion, which occurs at https://github.com/pytorch/fairseq/blob/master/fairseq/tasks/translation_moe.py#L125. This is a fix for that bug.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1037

Differential Revision: D16892674

Pulled By: myleott

fbshipit-source-id: a73bc03d2280356667d02422d22ad11d968d0c65
parent 732d15a9
```diff
@@ -16,9 +16,9 @@ def label_smoothed_nll_loss(lprobs, target, epsilon, ignore_index=None, reduce=True):
     nll_loss = -lprobs.gather(dim=-1, index=target)
     smooth_loss = -lprobs.sum(dim=-1, keepdim=True)
     if ignore_index is not None:
-        non_pad_mask = target.ne(ignore_index)
-        nll_loss = nll_loss[non_pad_mask]
-        smooth_loss = smooth_loss[non_pad_mask]
+        pad_mask = target.eq(ignore_index)
+        nll_loss[pad_mask] = nll_loss[pad_mask] * 0.
+        smooth_loss[pad_mask] = smooth_loss[pad_mask] * 0.
     else:
         nll_loss = nll_loss.squeeze(-1)
         smooth_loss = smooth_loss.squeeze(-1)
```
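The underlying issue is that indexing with a boolean mask (`nll_loss[non_pad_mask]`) returns a flattened 1-D tensor whose length depends on how many tokens are padded, so downstream code in `translation_moe.py` that expects the original shape breaks. Zeroing the padded entries in place keeps the shape intact while still excluding padding from the loss. A minimal sketch of the difference (toy shapes and a made-up pad index of 0, not taken from the commit):

```python
import torch

# Toy log-probabilities over a vocab of 5 for 4 target positions.
lprobs = torch.log_softmax(torch.randn(4, 5), dim=-1)
target = torch.tensor([[1], [2], [0], [3]])       # assume 0 is the pad index
nll_loss = -lprobs.gather(dim=-1, index=target)   # shape (4, 1)

# Old behavior: boolean indexing drops padded rows and flattens the result,
# so the returned tensor's shape depends on the amount of padding.
non_pad_mask = target.ne(0)
old = nll_loss[non_pad_mask]                      # 1-D tensor of length 3

# New behavior: zero out padded entries in place; shape (4, 1) is preserved.
pad_mask = target.eq(0)
new = nll_loss.clone()
new[pad_mask] = new[pad_mask] * 0.

print(old.shape)   # changed shape
print(new.shape)   # same shape as nll_loss
```

Since the padded entries contribute exactly zero, a subsequent `.sum()` gives the same reduced loss either way; only the unreduced shape changes.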