Commit e68c3756, authored by Sylvain Gugger, committed by GitHub

Allow training to resume even if RNG states are not properly loaded (#14994)

* Allow training to resume even if RNG states are not properly loaded

* Proper f-string
parent 08cb5718
@@ -1553,7 +1553,13 @@ class Trainer:
             if self.args.local_rank != -1:
                 torch.cuda.random.set_rng_state(checkpoint_rng_state["cuda"])
             else:
-                torch.cuda.random.set_rng_state_all(checkpoint_rng_state["cuda"])
+                try:
+                    torch.cuda.random.set_rng_state_all(checkpoint_rng_state["cuda"])
+                except Exception as e:
+                    logger.info(
+                        f"Didn't manage to set back the RNG states of the GPU because of the following error:\n {e}"
+                        "\nThis won't yield the same results as if the training had not been interrupted."
+                    )
         if is_torch_tpu_available():
             xm.set_rng_state(checkpoint_rng_state["xla"])
...
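
The pattern in this change (try to restore the RNG state, and log rather than abort on failure) can be sketched outside the Trainer. This is a minimal illustration, not the library's code: `restore_rng_state` is a hypothetical helper, and Python's standard `random` module stands in for `torch.cuda.random` so the example runs without a GPU.

```python
import logging
import random

logger = logging.getLogger(__name__)


def restore_rng_state(setter, state, name="RNG"):
    """Hypothetical helper: try to restore an RNG state, log and continue on failure.

    Returns True if the state was restored, False otherwise.
    """
    try:
        setter(state)
        return True
    except Exception as e:
        # Same idea as the diff: report the problem, but let the resume proceed.
        logger.info(
            f"Didn't manage to set back the {name} state because of the following error:\n {e}"
            "\nThis won't yield the same results as if the training had not been interrupted."
        )
        return False


# A valid saved state replays the same random sequence after restoring.
random.seed(0)
saved = random.getstate()
expected = random.random()
ok = restore_rng_state(random.setstate, saved, name="Python RNG")
replayed = random.random()

# A malformed state is logged and reported, but no exception escapes.
bad = restore_rng_state(random.setstate, ("not", "a", "state"), name="Python RNG")
```

The trade-off is the one the log message states: when restoring fails, training continues, but the run is no longer reproducible relative to an uninterrupted one.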