"test/srt/git@developer.sourcefind.cn:zhaoyu6/sglang.git" did not exist on "a5114b6f910f3c2a45b628a4052d47c9b518ccea"
optimizer_schedules.rst 2.56 KB
Newer Older
1
Optimization
-----------------------------------------------------------------------------------------------------------------------

The ``.optimization`` module provides:

- an optimizer with fixed weight decay that can be used to fine-tune models,
- several learning rate schedules in the form of schedule objects that inherit from ``_LRSchedule``, and
- a gradient accumulation class to accumulate the gradients of multiple batches.

AdamW (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AdamW
    :members:
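
What distinguishes this optimizer from Adam with plain L2 regularization is that the weight decay is decoupled from the gradient-based update. One update step can be sketched in pure Python on a scalar parameter (a sketch of the algorithm under common default hyperparameters, not the exact ``AdamW`` signature):

```python
import math

def adamw_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-6, weight_decay=0.01):
    """One AdamW update on a scalar parameter p with gradient g."""
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1 - beta1 ** step)        # bias correction
    v_hat = v / (1 - beta2 ** step)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)  # Adam update
    p = p - lr * weight_decay * p          # decoupled decay, applied to p directly
    return p, m, v

p, m, v = adamw_step(p=1.0, g=0.5, m=0.0, v=0.0, step=1)
```

Because the decay acts on the parameter itself rather than being folded into the gradient, it is not rescaled by the adaptive second-moment term.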

AdaFactor (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Adafactor

AdamWeightDecay (TensorFlow)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AdamWeightDecay

.. autofunction:: transformers.create_optimizer

Schedules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Learning Rate Schedules (PyTorch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autofunction:: transformers.get_constant_schedule

.. autofunction:: transformers.get_constant_schedule_with_warmup

.. image:: /imgs/warmup_constant_schedule.png
    :target: /imgs/warmup_constant_schedule.png
    :alt:
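
The multiplier this schedule applies to the base learning rate can be written down directly (a pure-Python sketch of the shape pictured above, not the library implementation):

```python
def constant_with_warmup_factor(step, num_warmup_steps):
    """Learning-rate multiplier: linear ramp from 0 to 1 during warmup, then constant 1."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return 1.0

# Ramp up over 10 steps, then stay at the base learning rate.
factors = [constant_with_warmup_factor(s, 10) for s in (0, 5, 10, 1000)]
```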


.. autofunction:: transformers.get_cosine_schedule_with_warmup

.. image:: /imgs/warmup_cosine_schedule.png
    :target: /imgs/warmup_cosine_schedule.png
    :alt:
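
The cosine shape above corresponds to the following multiplier (a sketch assuming the common default of half a cosine period, i.e. ``num_cycles=0.5``):

```python
import math

def cosine_with_warmup_factor(step, num_warmup_steps, num_training_steps,
                              num_cycles=0.5):
    """Multiplier: linear warmup, then cosine decay from 1 down to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))
```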


.. autofunction:: transformers.get_cosine_with_hard_restarts_schedule_with_warmup

.. image:: /imgs/warmup_cosine_hard_restarts_schedule.png
    :target: /imgs/warmup_cosine_hard_restarts_schedule.png
    :alt:
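
Compared to the plain cosine decay, this variant runs ``num_cycles`` full cosine decays, each restarting at the full learning rate. A sketch of that multiplier (again the shape, not the library code):

```python
import math

def cosine_hard_restarts_factor(step, num_warmup_steps, num_training_steps,
                                num_cycles=1.0):
    """Multiplier: linear warmup, then num_cycles cosine decays, each restarting at 1."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    if progress >= 1.0:
        return 0.0
    # The modulo makes the multiplier jump back to 1 at each cycle boundary.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))
```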



.. autofunction:: transformers.get_linear_schedule_with_warmup

.. image:: /imgs/warmup_linear_schedule.png
    :target: /imgs/warmup_linear_schedule.png
    :alt:
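
The triangular shape above is a linear warmup followed by a linear decay to zero; as a multiplier on the base learning rate (a sketch of the formula, not the library code):

```python
def linear_with_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier: linear warmup to 1, then linear decay to 0 at num_training_steps."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step)
               / max(1, num_training_steps - num_warmup_steps))
```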

Warmup (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: transformers.WarmUp
    :members:

Gradient Strategies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GradientAccumulator (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autoclass:: transformers.GradientAccumulator
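
The idea behind gradient accumulation can be sketched framework-free: sum per-variable gradients over several micro-batches, then let the caller apply (and optionally average) them in a single optimizer step. ``AccumulatorSketch`` below is a hypothetical illustration, not the ``GradientAccumulator`` API:

```python
class AccumulatorSketch:
    """Sum gradients across micro-batches until the caller applies them."""

    def __init__(self):
        self._grads = None
        self.step = 0  # number of micro-batches accumulated so far

    def accumulate(self, grads):
        # Element-wise running sum over one list of per-variable gradients.
        if self._grads is None:
            self._grads = list(grads)
        else:
            self._grads = [a + g for a, g in zip(self._grads, grads)]
        self.step += 1

    @property
    def gradients(self):
        return list(self._grads)

    def reset(self):
        self._grads, self.step = None, 0

acc = AccumulatorSketch()
acc.accumulate([0.1, 0.2])   # micro-batch 1
acc.accumulate([0.3, 0.4])   # micro-batch 2
summed = acc.gradients       # divide by acc.step to get the average gradient
```

This lets a large effective batch size fit in memory: only one micro-batch of activations is alive at a time, while the optimizer sees the combined gradient.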