chenpangpang / transformers
Commit 886cb497, authored Nov 16, 2018 by thomwolf
updating readme and notebooks
parent fd647e8c
Showing 6 changed files with 5367 additions and 330 deletions
README.md  +277 -56
notebooks/Comparing TF and PT models_MLM_NSP.ipynb  +272 -272
notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb  +4815 -0
notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb  +0 -0
notebooks/Comparing-TF-and-PT-models.ipynb  +0 -0
pytorch_pretrained_bert/optimization.py  +3 -2
README.md
This diff is collapsed.
notebooks/Comparing TF and PT models_MLM_NSP.ipynb
This diff is collapsed.
notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb (0 → 100644)
This diff is collapsed.
notebooks/Comparing TF and PT models SQuAD predictions.ipynb → notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb
File moved
notebooks/Comparing TF and PT models.ipynb → notebooks/Comparing-TF-and-PT-models.ipynb
File moved
pytorch_pretrained_bert/optimization.py
@@ -42,7 +42,7 @@ SCHEDULES = {
 class BERTAdam(Optimizer):
-    """Implements BERT version of Adam algorithm with weight decay fix (and no ).
+    """Implements BERT version of Adam algorithm with weight decay fix.
     Params:
         lr: learning rate
         warmup: portion of t_total for the warmup, -1 means no warmup. Default: -1
@@ -136,7 +136,7 @@ class BERTAdam(Optimizer):
 # the correct way of using L2 regularization/weight decay with Adam,
 # since that will interact with the m and v parameters in strange ways.
 #
-# Instead we want ot decay the weights in a manner that doesn't interact
+# Instead we want to decay the weights in a manner that doesn't interact
 # with the m/v parameters. This is equivalent to adding the square
 # of the weights to the loss with plain (non-momentum) SGD.
 if group['weight_decay_rate'] > 0.0:
@@ -154,6 +154,7 @@ class BERTAdam(Optimizer):
 state['step'] += 1
 # step_size = lr_scheduled * math.sqrt(bias_correction2) / bias_correction1
 # No bias correction
 # bias_correction1 = 1 - beta1 ** state['step']
 # bias_correction2 = 1 - beta2 ** state['step']
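The comments in the hunks above describe two choices: weight decay is applied separately from the m/v moving averages, and the Adam bias correction is left out. The following is a minimal sketch of one such update for a single scalar parameter, written in plain Python rather than taken from optimization.py. The function name `adam_step_decoupled_decay`, its argument names, and its default values are hypothetical and chosen only for illustration.

```python
import math


def adam_step_decoupled_decay(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999,
                              eps=1e-6, weight_decay_rate=0.01):
    """One Adam-style update for a scalar weight (illustrative sketch only).

    w: current weight, g: gradient of the loss with respect to w,
    m, v: running averages of the gradient and the squared gradient.
    """
    # Update the moving averages from the raw gradient only.
    m = b1 * m + (1.0 - b1) * g
    v = b2 * v + (1.0 - b2) * g * g

    # Adam direction without bias correction: m and v are used as they are,
    # not divided by (1 - b1 ** step) or (1 - b2 ** step).
    update = m / (math.sqrt(v) + eps)

    # Decoupled weight decay: the decay term is added to the update after
    # m and v have been computed, so it never enters either average.
    if weight_decay_rate > 0.0:
        update += weight_decay_rate * w

    w = w - lr * update
    return w, m, v


if __name__ == "__main__":
    # With a zero gradient, m and v stay at zero and only the decay acts.
    print(adam_step_decoupled_decay(w=1.0, g=0.0, m=0.0, v=0.0))
```

In this sketch a zero gradient leaves m and v untouched while the weight still shrinks by lr * weight_decay_rate * w, which is the separation the comment in the second hunk asks for. Adding the decay term to the gradient before the averages are updated would instead feed it into m and v.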