Gradient accumulation for TFTrainer (#9585)

* gradient accumulation for tftrainer
* label naming
* label naming

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

3f40070c · Kiyoung Kim · GitHub · e43f3b61 · 3f40070c
Commit 3f40070c authored Jan 15, 2021 by Kiyoung Kim; committed by GitHub Jan 14, 2021

Showing with 10 additions and 4 deletions

src/transformers/trainer_tf.py +10 -4
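The change in this file treats `labels` as a dict keyed by name, the same way `features` is already handled, rather than as a single tensor. A minimal sketch of that slice-and-rotate pattern follows. It uses plain Python lists as stand-ins for tensors so it runs without TensorFlow, whereas the trainer concatenates with `tf.concat(..., axis=0)`. The helper name `rotate_batch`, the `"labels"` key, and the batch sizes are hypothetical and chosen only for illustration.

```python
def rotate_batch(batch, step):
    """Split off the first `step` rows of each entry and move them to the end.

    `batch` maps names to sequences (stand-ins for tensors). Returns
    (reduced, rotated): `reduced` holds the first `step` rows per key,
    `rotated` holds the remaining rows followed by those same rows.
    """
    reduced = {k: v[:step] for k, v in batch.items()}
    rotated = {k: v[step:] + reduced[k] for k, v in batch.items()}
    return reduced, rotated


# Hypothetical global batch of 4 examples split over 2 replicas -> step of 2.
labels = {"labels": [0, 1, 2, 3]}
step = 4 // 2

reduced_labels, labels = rotate_batch(labels, step)
print(reduced_labels)  # {'labels': [0, 1]}
print(labels)          # {'labels': [2, 3, 0, 1]}
```

Indexing a dict directly with a slice, as the removed lines did with `labels[: ...]`, raises a `TypeError`, so the per-key comprehension is what lets dict-style labels pass through the accumulation loop.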

--- a/src/transformers/trainer_tf.py
+++ b/src/transformers/trainer_tf.py
@@ -638,7 +638,9 @@ class TFTrainer:
                reduced_features = {
                    k: ft[: self.args.train_batch_size // self.args.n_replicas] for k, ft in features.items()
                }
-                reduced_labels = labels[: self.args.train_batch_size // self.args.n_replicas]
+                reduced_labels = {
+                    k: lbl[: self.args.train_batch_size // self.args.n_replicas] for k, lbl in labels.items()
+                }

                self.training_step(reduced_features, reduced_labels, nb_instances_in_global_batch)

@@ -650,9 +652,13 @@ class TFTrainer:
                    for k, ft in features.items()
                }

-                labels = tf.concat(
-                    [labels[self.args.train_batch_size // self.args.n_replicas :], reduced_labels], axis=0
-                )
+                labels = {
+                    k: tf.concat(
+                        [lbl[self.args.train_batch_size // self.args.n_replicas :], reduced_labels[k]],
+                        axis=0,
+                    )
+                    for k, lbl in labels.items()
+                }

            gradients = self.gradient_accumulator.gradients
            gradients = [