Unverified Commit 08de989a authored by Sylvain Gugger, committed by GitHub

Trainer with grad accum (#6930)

* Add warning for gradient accumulation

* Formatting
parent d4aa7284
@@ -60,6 +60,12 @@ class TrainingArguments:
            The batch size per GPU/TPU core/CPU for evaluation.
        gradient_accumulation_steps (:obj:`int`, `optional`, defaults to 1):
            Number of update steps to accumulate the gradients for, before performing a backward/update pass.
            .. warning::

                When using gradient accumulation, one step is counted as one step with a backward pass.
                Therefore, logging, evaluation, and saving will be conducted every
                ``gradient_accumulation_steps * xxx_step`` training examples.
        learning_rate (:obj:`float`, `optional`, defaults to 5e-5):
            The initial learning rate for Adam.
        weight_decay (:obj:`float`, `optional`, defaults to 0):
......
@@ -42,6 +42,12 @@ class TFTrainingArguments(TrainingArguments):
            The batch size per GPU/TPU core/CPU for evaluation.
        gradient_accumulation_steps (:obj:`int`, `optional`, defaults to 1):
            Number of update steps to accumulate the gradients for, before performing a backward/update pass.
            .. warning::

                When using gradient accumulation, one step is counted as one step with a backward pass.
                Therefore, logging, evaluation, and saving will be conducted every
                ``gradient_accumulation_steps * xxx_step`` training examples.
        learning_rate (:obj:`float`, `optional`, defaults to 5e-5):
            The initial learning rate for Adam.
        weight_decay (:obj:`float`, `optional`, defaults to 0):
......
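The warning added in both docstrings can be illustrated with a small sketch. This is not the Trainer's actual code; `batches_between_logs` is a hypothetical helper showing only the arithmetic: because the optimizer steps once every `gradient_accumulation_steps` batches, anything scheduled in optimizer steps (logging, evaluation, saving) fires once per `gradient_accumulation_steps * xxx_step` batches.

```python
def batches_between_logs(gradient_accumulation_steps: int, logging_steps: int) -> int:
    """Hypothetical helper: number of training batches consumed between two
    logging events when the Trainer counts steps in optimizer updates.

    Each optimizer update consumes gradient_accumulation_steps batches,
    so logging_steps updates consume the product of the two.
    """
    return gradient_accumulation_steps * logging_steps


# With gradient_accumulation_steps=4 and logging every 500 optimizer steps,
# a log line appears only once every 2000 batches.
print(batches_between_logs(4, 500))  # 2000
```

The same reasoning applies to `eval_steps` and `save_steps`, which is why the commit adds the identical warning to both `TrainingArguments` and `TFTrainingArguments`.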