[trainer] document resume randomness (#11588)

* document resume randomness
* fix link
* reword
* fix
* reword
* style

c065025c · Stas Bekman · GitHub · 6b241e0e · c065025c
Unverified Commit c065025c authored May 04, 2021 by Stas Bekman Committed by GitHub May 04, 2021

Showing with 14 additions and 0 deletions

docs/source/main_classes/trainer.rst +14 -0

--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -119,6 +119,20 @@ TFTrainingArguments
    :members:
+Randomness
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When resuming from a checkpoint generated by :class:`~transformers.Trainer` all efforts are made to restore the
+``python``, ``numpy`` and ``pytorch`` RNG states to the same states as they were at the moment of saving that checkpoint,
+which should make the "stop and resume" style of training as close as possible to non-stop training.
+However, due to various default non-deterministic pytorch settings this might not fully work. If you want full
+determinism please refer to `Controlling sources of randomness
+<https://pytorch.org/docs/stable/notes/randomness.html>`__. As explained in that document, some of the settings
+that make things deterministic (e.g., ``torch.backends.cudnn.deterministic``) may slow things down, therefore this
+can't be done by default, but you can enable those yourself if needed.
 Trainer Integrations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
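
Note on the documented behaviour: the added section describes restoring the python, numpy and pytorch RNG states on resume, and opting in to deterministic settings yourself. The following is a minimal sketch of that idea, not the actual Trainer implementation. The helper names (capture_rng_state, restore_rng_state, enable_determinism) are hypothetical and chosen only for illustration, and numpy and torch are treated as optional so the sketch also runs where they are not installed.

```python
import random

try:
    import numpy as np
except ImportError:  # numpy is optional in this sketch
    np = None

try:
    import torch
except ImportError:  # torch is optional in this sketch
    torch = None


def capture_rng_state():
    """Hypothetical helper: collect the RNG states that are available."""
    state = {"python": random.getstate()}
    if np is not None:
        state["numpy"] = np.random.get_state()
    if torch is not None:
        state["torch_cpu"] = torch.random.get_rng_state()
        if torch.cuda.is_available():
            state["torch_cuda"] = torch.cuda.get_rng_state_all()
    return state


def restore_rng_state(state):
    """Hypothetical helper: put the RNGs back into a captured state."""
    random.setstate(state["python"])
    if np is not None and "numpy" in state:
        np.random.set_state(state["numpy"])
    if torch is not None and "torch_cpu" in state:
        torch.random.set_rng_state(state["torch_cpu"])
        if "torch_cuda" in state and torch.cuda.is_available():
            torch.cuda.set_rng_state_all(state["torch_cuda"])


def enable_determinism():
    """Hypothetical helper: opt in to slower, deterministic cuDNN behaviour."""
    if torch is None:
        return
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Usage: draws made after restoring repeat the draws made after capturing.
saved = capture_rng_state()
first_draw = random.random()
restore_rng_state(saved)
second_draw = random.random()  # equal to first_draw
```

Restoring RNG state only reproduces the random draws. Non-deterministic kernels can still cause small differences, which is why the deterministic settings are a separate, opt-in step in this sketch.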