"docs/source/en/task_summary.md" did not exist on "4be75e97285149a9d8e83d8bca86cb57c6425a56"
.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Optimization
-----------------------------------------------------------------------------------------------------------------------

The ``.optimization`` module provides:

- an optimizer with the weight decay fix that can be used to fine-tune models,
- several schedules in the form of schedule objects that inherit from ``_LRSchedule``, and
- a gradient accumulation class to accumulate the gradients of multiple batches.

AdamW (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AdamW
    :members:
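
As a rough illustration of what the weight decay fix means, the sketch below applies one update to a single scalar parameter in plain Python. The function name ``adamw_step`` and its default values are assumptions made for this example and are not part of the library. The point is only that the decay term is applied to the parameter directly instead of being folded into the gradient.

```python
import math


def adamw_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-6, weight_decay=0.01):
    """One Adam update with decoupled weight decay on a scalar (hypothetical helper)."""
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction folded into the step size.
    step_size = lr * math.sqrt(1 - beta2 ** step) / (1 - beta1 ** step)
    # Adam part of the update.
    param = param - step_size * m / (math.sqrt(v) + eps)
    # Decoupled decay: shrink the weight directly, independent of m and v.
    param = param - lr * weight_decay * param
    return param, m, v


# With a zero gradient, only the decay term changes the parameter.
param, m, v = adamw_step(1.0, 0.0, 0.0, 0.0, step=1, lr=0.1, weight_decay=0.1)
print(param)
```

With a zero gradient the Adam term vanishes, so the parameter only shrinks by a factor of ``1 - lr * weight_decay``.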

AdaFactor (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Adafactor

AdamWeightDecay (TensorFlow)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AdamWeightDecay

.. autofunction:: transformers.create_optimizer

Schedules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Learning Rate Schedules (PyTorch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: transformers.SchedulerType

.. autofunction:: transformers.get_scheduler

.. autofunction:: transformers.get_constant_schedule

.. autofunction:: transformers.get_constant_schedule_with_warmup

.. image:: /imgs/warmup_constant_schedule.png
    :target: /imgs/warmup_constant_schedule.png
    :alt:


.. autofunction:: transformers.get_cosine_schedule_with_warmup

.. image:: /imgs/warmup_cosine_schedule.png
    :target: /imgs/warmup_cosine_schedule.png
    :alt:

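The shape plotted above can be approximated with a small function that returns the factor by which the initial learning rate is multiplied at each step. This is a sketch under assumptions: the helper name ``cosine_warmup_multiplier`` is hypothetical, and the parameter names simply mirror the documented arguments of ``get_cosine_schedule_with_warmup``.

```python
import math


def cosine_warmup_multiplier(step, num_warmup_steps, num_training_steps,
                             num_cycles=0.5):
    """Learning rate factor for linear warmup followed by cosine decay."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # With num_cycles=0.5 this traces half a cosine wave, from 1 down to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


cosine_factors = [cosine_warmup_multiplier(s, 10, 110) for s in (0, 5, 10, 60, 110)]
print(cosine_factors)
```

The factor starts at 0, reaches 1 at the end of warmup, passes through roughly 0.5 halfway through the decay phase, and ends at 0.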

.. autofunction:: transformers.get_cosine_with_hard_restarts_schedule_with_warmup

.. image:: /imgs/warmup_cosine_hard_restarts_schedule.png
    :target: /imgs/warmup_cosine_hard_restarts_schedule.png
    :alt:



.. autofunction:: transformers.get_linear_schedule_with_warmup

.. image:: /imgs/warmup_linear_schedule.png
    :target: /imgs/warmup_linear_schedule.png
    :alt:

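The linear schedule shown above can be sketched as a function returning the factor applied to the initial learning rate. The helper name ``linear_warmup_multiplier`` is hypothetical and exists only for this example. The argument names follow those documented for ``get_linear_schedule_with_warmup``.

```python
def linear_warmup_multiplier(step, num_warmup_steps, num_training_steps):
    """Learning rate factor for linear warmup followed by linear decay."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Decay: fall linearly from 1 to 0, reaching 0 at num_training_steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


linear_factors = [linear_warmup_multiplier(s, 10, 100) for s in (0, 5, 10, 55, 100)]
print(linear_factors)  # [0.0, 0.5, 1.0, 0.5, 0.0]
```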

.. autofunction:: transformers.get_polynomial_decay_schedule_with_warmup


Warmup (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: transformers.WarmUp
    :members:

Gradient Strategies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GradientAccumulator (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: transformers.GradientAccumulator
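
The idea of accumulating the gradients of multiple batches can be sketched without any framework. The class ``SimpleAccumulator`` below is a hypothetical stand-in written for this example. It is not the library's ``GradientAccumulator`` and it works on plain Python floats instead of tensors.

```python
class SimpleAccumulator:
    """Hypothetical stand-in: sums per-parameter gradients across batches."""

    def __init__(self):
        self.gradients = []
        self.step = 0

    def __call__(self, gradients):
        # Lazily size the buffer on the first call.
        if not self.gradients:
            self.gradients = [0.0] * len(gradients)
        if len(gradients) != len(self.gradients):
            raise ValueError("Expected %d gradients, got %d"
                             % (len(self.gradients), len(gradients)))
        # Add the new batch's gradients to the running sums.
        self.gradients = [acc + g for acc, g in zip(self.gradients, gradients)]
        self.step += 1

    def reset(self):
        # Clear the buffer once the optimizer has consumed the gradients.
        self.gradients = []
        self.step = 0


accumulator = SimpleAccumulator()
for batch_gradients in ([1.0, 2.0], [3.0, 4.0]):
    accumulator(batch_gradients)

# Average over the accumulated batches before applying an optimizer step.
averaged = [g / accumulator.step for g in accumulator.gradients]
print(averaged)  # [2.0, 3.0]
accumulator.reset()
```

Accumulating over several small batches and then applying one optimizer step approximates training with a larger batch when memory is limited.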