optimizer_schedules.rst 3.17 KB
Newer Older
Sylvain Gugger's avatar
Sylvain Gugger committed
1
2
3
4
5
6
7
8
9
10
11
12
.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

13
Optimization
Sylvain Gugger's avatar
Sylvain Gugger committed
14
-----------------------------------------------------------------------------------------------------------------------
thomwolf's avatar
thomwolf committed
15

16
17
18
19
The ``.optimization`` module provides:

- an optimizer with weight decay fixed that can be used to fine-tuned models, and
- several schedules in the form of schedule objects that inherit from ``_LRSchedule``:
20
- a gradient accumulation class to accumulate the gradients of multiple batches
21

Sylvain Gugger's avatar
Sylvain Gugger committed
22
23
AdamW (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
thomwolf's avatar
thomwolf committed
24

25
.. autoclass:: transformers.AdamW
thomwolf's avatar
thomwolf committed
26
27
    :members:

Sylvain Gugger's avatar
Sylvain Gugger committed
28
29
AdaFactor (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lysandre Debut's avatar
Lysandre Debut committed
30
31
32

.. autoclass:: transformers.Adafactor

Sylvain Gugger's avatar
Sylvain Gugger committed
33
34
AdamWeightDecay (TensorFlow)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
35
36
37
38
39

.. autoclass:: transformers.AdamWeightDecay

.. autofunction:: transformers.create_optimizer

thomwolf's avatar
thomwolf committed
40
Schedules
Sylvain Gugger's avatar
Sylvain Gugger committed
41
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
42
43

Learning Rate Schedules (Pytorch)
Sylvain Gugger's avatar
Sylvain Gugger committed
44
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
thomwolf's avatar
thomwolf committed
45

46
.. autofunction:: transformers.get_constant_schedule
thomwolf's avatar
thomwolf committed
47

48

49
.. autofunction:: transformers.get_constant_schedule_with_warmup
thomwolf's avatar
thomwolf committed
50

51
52
53
54
55
.. image:: /imgs/warmup_constant_schedule.png
    :target: /imgs/warmup_constant_schedule.png
    :alt:


56
.. autofunction:: transformers.get_cosine_schedule_with_warmup
thomwolf's avatar
thomwolf committed
57

58
59
60
61
62
.. image:: /imgs/warmup_cosine_schedule.png
    :target: /imgs/warmup_cosine_schedule.png
    :alt:


63
.. autofunction:: transformers.get_cosine_with_hard_restarts_schedule_with_warmup
thomwolf's avatar
thomwolf committed
64

65
66
67
68
69
70
.. image:: /imgs/warmup_cosine_hard_restarts_schedule.png
    :target: /imgs/warmup_cosine_hard_restarts_schedule.png
    :alt:



71
.. autofunction:: transformers.get_linear_schedule_with_warmup
72
73
74
75

.. image:: /imgs/warmup_linear_schedule.png
    :target: /imgs/warmup_linear_schedule.png
    :alt:
76

Sylvain Gugger's avatar
Sylvain Gugger committed
77
78
Warmup (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
79

80
.. autoclass:: transformers.WarmUp
81
82
83
    :members:

Gradient Strategies
Sylvain Gugger's avatar
Sylvain Gugger committed
84
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
85

Sylvain Gugger's avatar
Sylvain Gugger committed
86
87
GradientAccumulator (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
88
89

.. autoclass:: transformers.GradientAccumulator