chenpangpang / transformers
Commit 29a392fb, authored Mar 20, 2019 by Matthew Carrigan

Small README changes

Parent: 832b2b00
Showing 1 changed file with 4 additions and 3 deletions.

examples/lm_finetuning/README.md  (+4, -3)
@@ -51,9 +51,10 @@ by `pregenerate_training_data.py`. Note that you should use the same bert_model
 Also note that max_seq_len does not need to be specified for the `finetune_on_pregenerated.py` script,
 as it is inferred from the training examples.
 
-There are various options that can be tweaked, but the most important ones are probably `max_seq_len`, which controls
-the length of training examples (in wordpiece tokens) seen by the model, and `--fp16`, which enables fast half-precision
-training on recent GPUs. `max_seq_len` defaults to 128 but can be set as high as 512.
+There are various options that can be tweaked, but they are mostly set to the values from the BERT paper/repo and should
+be left alone. The most relevant ones for the end-user are probably `--max_seq_len`, which controls the length of
+training examples (in wordpiece tokens) seen by the model, and `--fp16`, which enables fast half-precision training on
+recent GPUs. `--max_seq_len` defaults to 128 but can be set as high as 512.
 
 Higher values may yield stronger language models at the cost of slower and more memory-intensive training
 In addition, if memory usage is an issue, especially when training on a single GPU, reducing `--train_batch_size` from
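
Usage note (not part of the diff above): the options discussed here are command-line flags to `finetune_on_pregenerated.py`. A minimal sketch of an invocation follows; the `--pregenerated_data`, `--bert_model`, and `--output_dir` argument names and all paths/values are assumptions for illustration, while `--max_seq_len`, `--train_batch_size`, and `--fp16` are the flags named in the README text itself.

    # Sketch only: --pregenerated_data, --bert_model and --output_dir are assumed
    # argument names/paths; --max_seq_len, --train_batch_size and --fp16 are the
    # flags discussed in the README excerpt above.
    python3 finetune_on_pregenerated.py \
        --pregenerated_data training/ \
        --bert_model bert-base-uncased \
        --output_dir finetuned_lm/ \
        --max_seq_len 256 \
        --train_batch_size 16 \
        --fp16

Per the README text, lowering `--train_batch_size` (or `--max_seq_len`) is the usual first step when memory is tight on a single GPU.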