Unverified commit 3a2ed967 authored by NielsRogge, committed by GitHub

Fix Seq2SeqTrainer (#15603)


Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
parent 724e51c6
@@ -161,6 +161,9 @@ class Seq2SeqTrainer(Trainer):
             "synced_gpus": True if is_deepspeed_zero3_enabled() else False,
         }
 
+        if "attention_mask" in inputs:
+            gen_kwargs["attention_mask"] = inputs.get("attention_mask", None)
+
         # prepare generation inputs
         # some encoder-decoder models can have varying encoder's and thus
         # varying model input names
@@ -171,7 +174,6 @@ class Seq2SeqTrainer(Trainer):
         generated_tokens = self.model.generate(
             generation_inputs,
-            attention_mask=inputs.get("attention_mask", None),
             **gen_kwargs,
         )
         # in case the batch is shorter than max length, the output should be padded
...
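
Note on the change: before this commit, the trainer always passed attention_mask=inputs.get("attention_mask", None) as an explicit keyword argument to generate(); afterwards, the mask travels through the shared gen_kwargs dict, and only when the batch actually contains one. A minimal, self-contained sketch of that pattern follows; the helper name build_gen_kwargs and its batch argument are hypothetical, introduced here for illustration, not part of the transformers API.

from typing import Any, Dict

def build_gen_kwargs(batch: Dict[str, Any], max_length: int, num_beams: int) -> Dict[str, Any]:
    # Hypothetical helper illustrating the commit's pattern: collect all
    # generation arguments in one dict rather than passing some explicitly.
    gen_kwargs: Dict[str, Any] = {
        "max_length": max_length,
        "num_beams": num_beams,
    }
    if "attention_mask" in batch:
        # Mirrors the added lines: the mask rides along in gen_kwargs instead
        # of being forwarded unconditionally as a generate() keyword argument.
        gen_kwargs["attention_mask"] = batch["attention_mask"]
    return gen_kwargs

# Usage (illustrative):
#   generated = model.generate(generation_inputs, **build_gen_kwargs(batch, 128, 4))

Routing everything through one dict keeps generate() from receiving attention_mask=None for models whose batches (for example, speech inputs) do not include a mask.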