chenpangpang/transformers

Commit 886cb497
Authored Nov 16, 2018 by thomwolf

updating readme and notebooks

Parent: fd647e8c
Showing 6 changed files with 5367 additions and 330 deletions.
README.md                                            +277   -56
notebooks/Comparing TF and PT models_MLM_NSP.ipynb   +272   -272
notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb   +4815  -0
notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb     +0     -0
notebooks/Comparing-TF-and-PT-models.ipynb           +0     -0
pytorch_pretrained_bert/optimization.py              +3     -2
README.md
(diff collapsed, not shown)
notebooks/Comparing TF and PT models_MLM_NSP.ipynb
(diff collapsed, not shown)
notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb (new file, mode 100644)
(diff collapsed, not shown)
notebooks/Comparing TF and PT models SQuAD predictions.ipynb → notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb (file moved)
notebooks/Comparing TF and PT models.ipynb → notebooks/Comparing-TF-and-PT-models.ipynb (file moved)
pytorch_pretrained_bert/optimization.py

@@ -42,7 +42,7 @@ SCHEDULES = {
 class BERTAdam(Optimizer):
-    """Implements BERT version of Adam algorithm with weight decay fix (and no ).
+    """Implements BERT version of Adam algorithm with weight decay fix.
     Params:
         lr: learning rate
         warmup: portion of t_total for the warmup, -1 means no warmup. Default: -1
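For context on the warmup parameter documented above: the hunk header shows that BERTAdam sits directly under the module's SCHEDULES table, which maps schedule names to functions of training progress. The sketch below is an assumed illustration of a linear-warmup schedule, not code copied from this diff; the name warmup_linear and its signature are hypothetical here.

def warmup_linear(x, warmup=0.002):
    # x is training progress in [0, 1] (global_step / t_total); warmup is
    # the fraction of t_total spent ramping up, as in the docstring above.
    if x < warmup:
        return x / warmup      # linear ramp from 0 to 1 during warmup
    return 1.0 - x             # then linear decay back toward 0

# The optimizer would scale its base lr by this factor each step, e.g.
# lr_scheduled = lr * warmup_linear(step / t_total).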
@@ -136,7 +136,7 @@ class BERTAdam(Optimizer):
                 # the correct way of using L2 regularization/weight decay with Adam,
                 # since that will interact with the m and v parameters in strange ways.
                 #
-                # Instead we want ot decay the weights in a manner that doesn't interact
+                # Instead we want to decay the weights in a manner that doesn't interact
                 # with the m/v parameters. This is equivalent to adding the square
                 # of the weights to the loss with plain (non-momentum) SGD.
                 if group['weight_decay_rate'] > 0.0:
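The comment block in this hunk explains the "weight decay fix" in BERTAdam's name: the decay is applied to the weights at update time rather than added to the gradient, where it would be smoothed into Adam's m/v moment estimates. Below is a minimal sketch of that decoupled update under assumed names (decoupled_decay_step, p, exp_avg, exp_avg_sq, and eps are illustrative, not taken from this file).

import torch

def decoupled_decay_step(p, exp_avg, exp_avg_sq, lr, weight_decay_rate, eps=1e-6):
    # Adam direction from the already-updated first/second moments.
    update = exp_avg / (exp_avg_sq.sqrt() + eps)
    # Decoupled decay: shrink the weights directly. The L2-in-the-loss
    # alternative (grad += weight_decay_rate * p) would leak the decay
    # into the m/v statistics, which the comments above warn against.
    if weight_decay_rate > 0.0:
        update = update + weight_decay_rate * p
    p.add_(-lr * update)   # single in-place SGD-style step
    return p

This is the same idea later popularized as AdamW (decoupled weight decay).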
@@ -154,6 +154,7 @@ class BERTAdam(Optimizer):
                 state['step'] += 1

                 # step_size = lr_scheduled * math.sqrt(bias_correction2) / bias_correction1
+                # No bias correction
                 # bias_correction1 = 1 - beta1 ** state['step']
                 # bias_correction2 = 1 - beta2 ** state['step']
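The added "# No bias correction" line flags a deliberate departure from standard Adam, which scales up early steps to compensate for the zero-initialized moment estimates; the commented-out lines show exactly the correction being skipped. For comparison, here is a sketch of the standard bias-corrected step size (the function name is assumed, not from this file).

import math

def bias_corrected_step_size(lr_scheduled, beta1, beta2, step):
    # Standard Adam correction that BERTAdam skips: with m and v
    # initialized to zero, early moment estimates are biased toward
    # zero, so the step is inflated while these factors are far from 1.
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    return lr_scheduled * math.sqrt(bias_correction2) / bias_correction1

Omitting the correction mirrors the original TensorFlow BERT optimizer, which also applies no bias correction.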