TF train_step docstring (#15755)

* TF train_step docstring

Commit 2c3fcc64 (unverified), authored by Joao Gante on Feb 22, 2022; committed by GitHub on Feb 22, 2022

Showing with 11 additions and 1 deletion

src/transformers/modeling_tf_utils.py +11 -1
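The docstring added in this commit describes two layouts for the `data` argument of `train_step`: `(inputs, labels)` when a loss is given at compile time, and a singleton `(inputs,)` when the labels travel inside the input dictionary and the model returns its own `loss`. The following is a minimal, framework-free sketch of that distinction only. It is not the code in `modeling_tf_utils.py`; `toy_forward`, `toy_train_step`, and the `user_loss` parameter are hypothetical names introduced for illustration.

```python
def toy_forward(inputs):
    """Hypothetical stand-in for a model's forward pass (not a real model)."""
    logits = float(sum(inputs["input_ids"]))
    outputs = {"logits": logits}
    # Mirror the docstring: a `loss` output exists only when the labels
    # are passed as part of the input dictionary.
    if "labels" in inputs:
        outputs["loss"] = (logits - inputs["labels"]) ** 2
    return outputs


def toy_train_step(data, user_loss=None):
    """Return the loss value a training step would backpropagate (toy version)."""
    if user_loss is not None:
        # Loss given at "compile time": data is the usual (inputs, labels) pair.
        inputs, labels = data
        return user_loss(labels, toy_forward(inputs)["logits"])
    # No loss given: data is the singleton tuple (inputs,), and the loss
    # is read from the model's own `loss` output.
    (inputs,) = data
    return toy_forward(inputs)["loss"]


# Case 1: user-supplied loss, labels passed separately.
loss_a = toy_train_step(({"input_ids": [1, 2]}, 5.0), user_loss=lambda y, p: abs(y - p))

# Case 2: no user loss, labels inside the input dictionary.
loss_b = toy_train_step(({"input_ids": [1, 2], "labels": 5.0},))
```

In the second case the toy step would fail with a `KeyError` if the labels were missing from the input dictionary, which loosely reflects the docstring's condition that the expected label columns must be part of the inputs for the internal loss to be available.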

--- a/src/transformers/modeling_tf_utils.py
+++ b/src/transformers/modeling_tf_utils.py
@@ -884,7 +884,17 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin, Pu
    def train_step(self, data):
        """
-        A modification of Keras's default `train_step` that cleans up the printed metrics when we use a dummy loss.
+        A modification of Keras's default `train_step` that cleans up the printed metrics when we use a dummy loss. If
+        a user specifies a loss at model compile time, this function behaves as the original Keras `train_step`. In
+        this case, it expects the same `data` as the original function (i.e. `(inputs, labels)`).
+        However, when the model is compiled without specifying the loss AND the expected label columns are passed as
+        part of the input dictionary, the loss is computed internally (inside the model class) and is used in the
+        backwards pass. In this case, `data` is a singleton tuple containing `(inputs,)`.
+        This is possible under the aforementioned circumstances because our overridden compile function can set an
+        additional loss function that reduces a `loss` output, and the model will output a `loss` component (notice the
+        name matching) containing the loss that was used to train the pre-trained model.
        """
        # These are the only transformations `Model.fit` applies to user-input
        # data when a `tf.data.Dataset` is provided.