# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2021, PaddleNLP
# This file is distributed under the same license as the PaddleNLP package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2022.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PaddleNLP \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2022-03-18 21:31+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.9.0\n"

#: ../source/paddlenlp.ops.optimizer.rst:2
msgid "optimizer"
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:1
msgid "基类::class:`paddle.optimizer.adamw.AdamW`"
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:1
msgid ""
"The AdamWDL optimizer is implemented based on the AdamW Optimization with"
" dynamic lr setting. Generally it's used for transformer model."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:4
msgid ""
"We use \"layerwise_lr_decay\" as default dynamic lr setting method of "
"AdamWDL. “Layer-wise decay” means exponentially decaying the learning "
"rates of individual layers in a top-down manner. For example, suppose the"
" 24-th layer uses a learning rate l, and the Layer-wise decay rate is α, "
"then the learning rate of layer m is lα^(24-m). See more details on: "
"https://arxiv.org/abs/1906.08237."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:10
msgid ""
"& t = t + 1\n"
"\n"
"& moment\\_1\\_out = {\\beta}_1 * moment\\_1 + (1 - {\\beta}_1) * grad\n"
"\n"
"& moment\\_2\\_out = {\\beta}_2 * moment\\_2 + (1 - {\\beta}_2) * grad * "
"grad\n"
"\n"
"& learning\\_rate = learning\\_rate * \\frac{\\sqrt{1 - {\\beta}_2^t}}{1 "
"- {\\beta}_1^t}\n"
"\n"
"& param\\_out = param - learning\\_rate * "
"(\\frac{moment\\_1}{\\sqrt{moment\\_2} + \\epsilon} + \\lambda * param)"
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL
msgid "参数"
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:21
msgid ""
"The learning rate used to update ``Parameter``. It can be a float value "
"or a LRScheduler. The default value is 0.001."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:24
msgid ""
"The exponential decay rate for the 1st moment estimates. It should be a "
"float number or a Tensor with shape [1] and data type as float32. The "
"default value is 0.9."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:28
msgid ""
"The exponential decay rate for the 2nd moment estimates. It should be a "
"float number or a Tensor with shape [1] and data type as float32. The "
"default value is 0.999."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:32
msgid ""
"A small float value for numerical stability. It should be a float number "
"or a Tensor with shape [1] and data type as float32. The default value is"
" 1e-08."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:36
msgid ""
"List/Tuple of ``Tensor`` to update to minimize ``loss``. \\ This "
"parameter is required in dygraph mode. \\ The default value is None in "
"static mode, at this time all parameters will be updated."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:40
msgid ""
"The weight decay coefficient, it can be float or Tensor. The default "
"value is 0.01."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:42
msgid ""
"If it is not None, only tensors that makes "
"apply_decay_param_fun(Tensor.name)==True will be updated. It only works "
"when we want to specify tensors. Default: None."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:47
msgid ""
"Gradient cliping strategy, it's an instance of some derived class of "
"``GradientClipBase`` . There are three cliping strategies ( "
":ref:`api_fluid_clip_GradientClipByGlobalNorm` , "
":ref:`api_fluid_clip_GradientClipByNorm` , "
":ref:`api_fluid_clip_GradientClipByValue` ). Default None, meaning there "
"is no gradient clipping."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:52
msgid ""
"The official Adam algorithm has two moving-average accumulators. The "
"accumulators are updated at every step. Every element of the two moving-"
"average is updated in both dense mode and sparse mode. If the size of "
"parameter is very large, then the update may be very slow. The lazy mode "
"only update the element that has gradient in current mini-batch, so it "
"will be much more faster. But this mode has different semantics with the "
"original Adam algorithm and may lead to different result. The default "
"value is False."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:60
msgid "Whether to use multi-precision during weight updating. Default is false."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:62
msgid "The layer-wise decay ratio. Defaults to 1.0."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:64
msgid "The total number of encoder layers. Defaults to 12."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:66
msgid ""
"If it's not None, set_param_lr_fun() will set the parameter learning "
"rate before it executes Adam Operator. Defaults to "
":ref:`layerwise_lr_decay`."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:69
msgid ""
"The keys of name_dict is dynamic name of model while the value of "
"name_dict is static name. Use model.named_parameters() to get name_dict."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:72
msgid ""
"Normally there is no need for user to set this property. For more "
"information, please refer to :ref:`api_guide_Name`. The default value is "
"None."
msgstr ""

#: of paddlenlp.ops.optimizer.adamwdl.AdamWDL:78
msgid "实际案例"
msgstr ""