Unverified commit b3ff7c68 authored by Stas Bekman, committed by GitHub

[fsmt] deal with -100 indices in decoder ids (#18592)

* [fsmt] deal with -100 indices in decoder ids

Fixes: https://github.com/huggingface/transformers/issues/17945

Decoder ids get the default ignore index -100, which breaks the model. Like T5 and many other models, this adds a fix to replace -100 with the correct pad index.

For some reason this use case hadn't been exercised with this model until recently, so it seems this issue has been there since the beginning.

Any suggestions on how to add a simple test here? Or perhaps we already have something similar? The user's script is quite massive.

* style
parent 37c59918
@@ -372,6 +372,10 @@ def _check_shapes(shape_1, shape2):
 def shift_tokens_right(input_ids, pad_token_id):
     """Shift input ids one token to the right, and wrap the last non pad token (usually <eos>)."""
+    # replace possible -100 values in labels by `pad_token_id`
+    input_ids.masked_fill_(input_ids == -100, pad_token_id)
     prev_output_tokens = input_ids.clone()
     index_of_eos = (input_ids.ne(pad_token_id).sum(dim=1) - 1).unsqueeze(-1)
     prev_output_tokens[:, 0] = input_ids.gather(1, index_of_eos).squeeze()
...
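For readers without the repo checked out, here is a self-contained sketch of the patched function. The final shift lines are elided by the diff above and are reconstructed here from the surrounding file, so treat them as an assumption; the token ids in the demo are made up.

import torch

def shift_tokens_right(input_ids, pad_token_id):
    """Shift input ids one token to the right, and wrap the last non pad token (usually <eos>)."""
    # replace possible -100 values in labels by `pad_token_id`
    input_ids.masked_fill_(input_ids == -100, pad_token_id)
    prev_output_tokens = input_ids.clone()
    index_of_eos = (input_ids.ne(pad_token_id).sum(dim=1) - 1).unsqueeze(-1)
    prev_output_tokens[:, 0] = input_ids.gather(1, index_of_eos).squeeze()
    # reconstructed from the rest of the file (elided by the diff): the shift itself
    prev_output_tokens[:, 1:] = input_ids[:, :-1]
    return prev_output_tokens

# labels as a collator would produce them: -100 marks positions the loss ignores
pad_token_id = 1
labels = torch.tensor([[5, 6, 7, 2, -100, -100]])  # hypothetical ids, 2 = <eos>
print(shift_tokens_right(labels, pad_token_id))
# tensor([[2, 5, 6, 7, 2, 1]]) -- eos wrapped to position 0, no -100 left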
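And, as a possible answer to the question above about a simple test, a hedged sketch: the import path matches the transformers source layout, but the test itself is hypothetical, not necessarily what the repo added.

import torch
from transformers.models.fsmt.modeling_fsmt import shift_tokens_right

def test_shift_tokens_right_replaces_ignore_index():
    pad_token_id = 1
    labels = torch.tensor([[5, 6, 7, 2, -100, -100]])
    # clone first: the function fills -100 in place via masked_fill_
    shifted = shift_tokens_right(labels.clone(), pad_token_id)
    assert not (shifted == -100).any()  # no ignore index may reach the embedding
    assert shifted[0, 0] == 2  # wrapped <eos>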