chenpangpang/transformers, commit a60ae1a5
Authored Jul 08, 2019 by LysandreJik
Docstrings best practice shown in the BERT documentation.
Parent: 64fd9863

Showing 3 changed files with 361 additions and 311 deletions:
pytorch_pretrained_bert/modeling_bert.py (+346 / -295)
pytorch_pretrained_bert/optimization.py (+2 / -1)
pytorch_pretrained_bert/tokenization_bert.py (+13 / -15)
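The "best practice" in the commit title is the one visible in the tokenization diff below: constructor arguments are documented once, in a class-level docstring, rather than on __init__, which is what documentation tools such as Sphinx render for the class. A minimal sketch of the pattern; the `Widget` class and its arguments are hypothetical, used only to illustrate:

```python
# Minimal sketch of the class-level docstring convention this commit applies.
# `Widget` and its arguments are hypothetical, for illustration only.
class Widget(object):
    r"""
    Constructs a Widget.

    Args:
        name: Identifier for the widget
        retries: How many times to retry. Default: 3
    """

    def __init__(self, name, retries=3):
        # No docstring here: the arguments are documented once, on the class,
        # which is what autodoc-style tools pick up for the public API.
        self.name = name
        self.retries = retries
```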
pytorch_pretrained_bert/modeling_bert.py

This diff is collapsed.
pytorch_pretrained_bert/optimization.py

@@ -182,7 +182,8 @@ SCHEDULES = {
 
 class BertAdam(Optimizer):
     """Implements BERT version of Adam algorithm with weight decay fix.
-    Params:
+
+    Parameters:
         lr: learning rate
         warmup: portion of t_total for the warmup, -1 means no warmup. Default: -1
         t_total: total number of training steps for the learning
...
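For context, here is a minimal usage sketch built only from the docstring above. The linear layer stands in for a real BERT model, and the step count and hyperparameter values are placeholders:

```python
import torch
from pytorch_pretrained_bert.optimization import BertAdam

model = torch.nn.Linear(768, 2)   # stand-in for a real BERT model
num_train_steps = 1000            # placeholder: total optimizer steps in the run

optimizer = BertAdam(model.parameters(),
                     lr=5e-5,                 # learning rate
                     warmup=0.1,              # warm up over the first 10% of t_total; -1 means no warmup
                     t_total=num_train_steps) # total number of training steps
```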
pytorch_pretrained_bert/tokenization_bert.py

@@ -84,24 +84,22 @@ def whitespace_tokenize(text):
 class BertTokenizer(object):
-    """Runs end-to-end tokenization: punctuation splitting + wordpiece"""
-
-    def __init__(self, vocab_file, do_lower_case=True, max_len=None, do_basic_tokenize=True,
-                 never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]")):
-        """Constructs a BertTokenizer.
-
-        Args:
-            vocab_file: Path to a one-wordpiece-per-line vocabulary file
-            do_lower_case: Whether to lower case the input
-                Only has an effect when do_wordpiece_only=False
-            do_basic_tokenize: Whether to do basic tokenization before wordpiece.
-            max_len: An artificial maximum length to truncate tokenized sequences to;
-                Effective maximum length is always the minimum of this
-                value (if specified) and the underlying BERT model's
-                sequence length.
-            never_split: List of tokens which will never be split during tokenization.
-                Only has an effect when do_wordpiece_only=False
-        """
+    r"""
+    Constructs a BertTokenizer.
+    :class:`~pytorch_pretrained_bert.BertTokenizer` runs end-to-end tokenization: punctuation splitting + wordpiece
+
+    Args:
+        vocab_file: Path to a one-wordpiece-per-line vocabulary file
+        do_lower_case: Whether to lower case the input. Only has an effect when do_wordpiece_only=False
+        do_basic_tokenize: Whether to do basic tokenization before wordpiece.
+        max_len: An artificial maximum length to truncate tokenized sequences to; Effective maximum length is always the
+            minimum of this value (if specified) and the underlying BERT model's sequence length.
+        never_split: List of tokens which will never be split during tokenization. Only has an effect when
+            do_wordpiece_only=False
+    """
+
+    def __init__(self, vocab_file, do_lower_case=True, max_len=None, do_basic_tokenize=True,
+                 never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]")):
         if not os.path.isfile(vocab_file):
             raise ValueError(
                 "Can't find a vocabulary file at path '{}'. To load the vocabulary from a Google pretrained "
...
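Read as API documentation, the new docstring translates directly into a constructor call. A minimal sketch, assuming a one-wordpiece-per-line vocabulary file at ./vocab.txt (a placeholder path); the docstring's :class: reference implies the class is importable from the package top level:

```python
from pytorch_pretrained_bert import BertTokenizer

# Placeholder path to a one-wordpiece-per-line vocabulary file.
tokenizer = BertTokenizer("./vocab.txt",
                          do_lower_case=True,  # lower-case input before tokenizing
                          max_len=512)         # effective max is min(512, model's sequence length)

tokens = tokenizer.tokenize("Hello, world!")   # punctuation splitting + wordpiece
ids = tokenizer.convert_tokens_to_ids(tokens)  # map wordpieces to vocabulary ids
```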