"tests/vscode:/vscode.git/clone" did not exist on "6255c95aae718241d1bac3531d4c44d7cc07993a"
Commit 8db7b1c7 authored by Taylan Bilal, committed by Facebook Github Bot

Forcing static shapes in loss computation (LSCE) (#876)

Summary:
Applying non_pad_mask by boolean indexing produces tensors with dynamic
(data-dependent) shapes, which is bad for TPUs.
This is an equivalent loss computation (tested), but tensor shapes stay
constant (in the case reduce=True).
Pull Request resolved: https://github.com/pytorch/fairseq/pull/876
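
The equivalence the summary claims can be seen with a small pure-Python sketch (no torch; the numbers and mask are made up for illustration): filtering out pad positions and then summing gives the same total as zeroing them in place and summing, but only the second keeps the tensor shape independent of the data, which is what XLA/TPU compilation needs.

```python
# Hypothetical per-token losses and a padding mask (1 = real token, 0 = pad).
per_token_loss = [2.5, 0.7, 1.3, 0.9]
non_pad_mask = [1, 0, 1, 1]

# Dynamic-shape version: the filtered sequence's length depends on how many
# pads the batch happens to contain, forcing recompilation on TPUs.
dynamic = sum(l for l, m in zip(per_token_loss, non_pad_mask) if m)

# Static-shape version (masked_fill-style): zero out pad positions instead;
# the intermediate keeps a fixed length regardless of the mask contents.
static = sum(l * m for l, m in zip(per_token_loss, non_pad_mask))

assert dynamic == static
```

Both reduce to the same scalar, which is why the change below only swaps the masking mechanism inside the `reduce=True` branch and leaves the unreduced branch untouched.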

Differential Revision: D16241621

Pulled By: myleott

fbshipit-source-id: 973254b7e0842f2b55817afd66b2a110a566f149
parent c38b1f91
@@ -52,11 +52,14 @@ class LabelSmoothedCrossEntropyCriterion(FairseqCriterion):
         lprobs = lprobs.view(-1, lprobs.size(-1))
         target = model.get_targets(sample, net_output).view(-1, 1)
         non_pad_mask = target.ne(self.padding_idx)
-        nll_loss = -lprobs.gather(dim=-1, index=target)[non_pad_mask]
-        smooth_loss = -lprobs.sum(dim=-1, keepdim=True)[non_pad_mask]
         if reduce:
+            nll_loss = -lprobs.gather(dim=-1, index=target).masked_fill_(1.0-non_pad_mask, 0.0)
             nll_loss = nll_loss.sum()
+            smooth_loss = -lprobs.sum(dim=-1, keepdim=True).masked_fill_(1.0-non_pad_mask, 0.0)
             smooth_loss = smooth_loss.sum()
+        else:
+            nll_loss = -lprobs.gather(dim=-1, index=target)[non_pad_mask]
+            smooth_loss = -lprobs.sum(dim=-1, keepdim=True)[non_pad_mask]
         eps_i = self.eps / lprobs.size(-1)
         loss = (1. - self.eps) * nll_loss + eps_i * smooth_loss
         return loss, nll_loss
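
The last lines of the hunk combine the two terms as loss = (1 - eps) * nll_loss + eps_i * smooth_loss, with eps_i = eps / vocab_size. A minimal pure-Python sketch (toy 3-word vocabulary and probabilities are invented for illustration) shows this is exactly cross entropy against a label-smoothed target distribution that puts 1 - eps + eps_i on the gold token and eps_i everywhere else:

```python
import math

eps = 0.1
probs = [0.7, 0.2, 0.1]   # toy model distribution over a 3-word vocab
target = 0                 # index of the gold token

lprobs = [math.log(p) for p in probs]
nll_loss = -lprobs[target]        # standard cross-entropy term
smooth_loss = -sum(lprobs)        # log-prob summed over the whole vocab

eps_i = eps / len(lprobs)
loss = (1.0 - eps) * nll_loss + eps_i * smooth_loss

# Same value computed directly against the smoothed target distribution.
q = [eps_i + (1.0 - eps if i == target else 0.0) for i in range(len(probs))]
direct = -sum(qi * lp for qi, lp in zip(q, lprobs))
assert abs(loss - direct) < 1e-12
```

Because label smoothing only reweights the per-position terms, the static-shape masking above changes nothing about this final combination.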