chenpangpang / transformers

Commit 629b22ad, authored Jan 01, 2020 by Julien Chaumond
Parent: 594ca6de

[run_lm_finetuning] mask_tokens: document types
Showing 1 changed file with 3 additions and 1 deletion.
examples/run_lm_finetuning.py  +3  -1
@@ -28,6 +28,7 @@ import pickle
 import random
 import re
 import shutil
+from typing import Tuple
 
 import numpy as np
 import torch
@@ -53,6 +54,7 @@ from transformers import (
     OpenAIGPTConfig,
     OpenAIGPTLMHeadModel,
     OpenAIGPTTokenizer,
+    PreTrainedTokenizer,
     RobertaConfig,
     RobertaForMaskedLM,
     RobertaTokenizer,
@@ -164,7 +166,7 @@ def _rotate_checkpoints(args, checkpoint_prefix, use_mtime=False):
         shutil.rmtree(checkpoint)
 
 
-def mask_tokens(inputs, tokenizer, args):
+def mask_tokens(inputs: torch.Tensor, tokenizer: PreTrainedTokenizer, args) -> Tuple[torch.Tensor, torch.Tensor]:
     """ Prepare masked tokens inputs/labels for masked language modeling: 80% MASK, 10% random, 10% original. """
     labels = inputs.clone()
     # We sample a few tokens in each sequence for masked-LM training (with probability args.mlm_probability defaults to 0.15 in Bert/RoBERTa)
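For context, the docstring in the diff describes BERT-style dynamic masking: of the positions selected for prediction, 80% are replaced with the mask token, 10% with a random token, and 10% are left unchanged. Below is a minimal, self-contained sketch of that 80/10/10 scheme, not the repository's exact code: the name mask_tokens_sketch and the explicit mask_token_id / vocab_size parameters are stand-ins for values the real function reads from the tokenizer and args.

from typing import Tuple

import torch


def mask_tokens_sketch(
    inputs: torch.Tensor,
    mask_token_id: int,
    vocab_size: int,
    mlm_probability: float = 0.15,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """BERT-style masking: 80% mask token, 10% random token, 10% unchanged."""
    labels = inputs.clone()

    # Select each position for prediction independently with probability mlm_probability.
    masked_indices = torch.bernoulli(torch.full(labels.shape, mlm_probability)).bool()
    # -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so the LM loss
    # is computed only on the selected positions.
    labels[~masked_indices] = -100

    # 80% of selected positions -> the mask token.
    indices_replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    inputs[indices_replaced] = mask_token_id

    # Half of the remaining 20% (i.e. 10% overall) -> a random vocabulary token.
    indices_random = (
        torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked_indices & ~indices_replaced
    )
    random_words = torch.randint(vocab_size, labels.shape, dtype=torch.long)
    inputs[indices_random] = random_words[indices_random]

    # The final 10% of selected positions are left unchanged.
    return inputs, labels


# Hypothetical usage with a random batch; 103 is only an example mask-token id.
batch = torch.randint(5, 1000, (2, 16))
masked_inputs, labels = mask_tokens_sketch(batch.clone(), mask_token_id=103, vocab_size=1000)

Note that inputs is modified in place, mirroring the signature in the diff, and that a complete version would also zero the masking probability at special-token positions (e.g. [CLS], [SEP], padding), a step this sketch omits.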