OpenDAS / Megatron-LM
Commit 3573423f
Authored May 13, 2019 by Raul Puri
added presplit-sentences to scripts
Parent: d0878333
Showing 3 changed files with 3 additions and 0 deletions:
scripts/pretrain_bert.sh (+1 -0)
scripts/pretrain_bert_distributed.sh (+1 -0)
scripts/pretrain_bert_sentencepiece.sh (+1 -0)
scripts/pretrain_bert.sh
@@ -10,6 +10,7 @@ python pretrain_bert.py \
  --tokenizer-model-type bert-large-uncased \
  --vocab-size 30522 \
  --train-data wikipedia \
+ --presplit-sentences \
  --loose-json \
  --text-key text \
  --split 1000,1,1 \
scripts/pretrain_bert_distributed.sh
@@ -17,6 +17,7 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
  --tokenizer-model-type bert-large-uncased \
  --vocab-size 30522 \
  --train-data wikipedia \
+ --presplit-sentences \
  --loose-json \
  --text-key text \
  --split 1000,1,1 \
scripts/pretrain_bert_sentencepiece.sh
@@ -10,6 +10,7 @@ python pretrain_bert.py \
  --tokenizer-path tokenizer.model \
  --vocab-size 30522 \
  --train-data wikipedia \
+ --presplit-sentences \
  --loose-json \
  --text-key text \
  --split 1000,1,1 \
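The commit does not say what input `--presplit-sentences` expects. Assuming, from the flag name and its placement next to `--loose-json` and `--text-key text`, that it tells the data loader each document's `text` field already holds one sentence per line, a preprocessing step could look roughly like the sketch below. The function names and the regex-based splitter are hypothetical and are not taken from the repository; a real pipeline would more likely use a proper sentence tokenizer.

```python
import json
import re

# Hypothetical, deliberately naive splitter: break after '.', '!' or '?'
# followed by whitespace. It stands in for a real sentence tokenizer.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def presplit_document(doc: dict, text_key: str = "text") -> dict:
    """Return a copy of `doc` whose text field has one sentence per line.

    Assumption: "pre-split" means sentences are newline-separated inside
    the text field; the commit itself does not state this.
    """
    sentences = []
    for line in doc[text_key].split("\n"):
        line = line.strip()
        if line:  # skip blank lines between paragraphs
            sentences.extend(s for s in _SENTENCE_END.split(line) if s)
    out = dict(doc)  # shallow copy so the caller's document is untouched
    out[text_key] = "\n".join(sentences)
    return out


def presplit_loose_json(lines):
    """Convert loose-JSON lines (one JSON object per line) to pre-split form."""
    return [
        json.dumps(presplit_document(json.loads(line)))
        for line in lines
        if line.strip()  # ignore empty lines in the input
    ]
```

Each output line is still a standalone JSON object, so the result stays compatible with the `--loose-json` and `--text-key text` arguments already present in the three scripts, under the assumption stated above.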