Commit 90e0a0dd authored by Jared Casper

Merge branch 'github-pr' into 'main'

Pull in some GitHub PRs

See merge request ADLR/megatron-lm!282
parents 82b69e86 4a35d50a
@@ -103,6 +103,11 @@ python tools/preprocess_data.py \
The output will be two files named, in this case, `my-bert_text_sentence.bin` and `my-bert_text_sentence.idx`. The `--data-path` specified in later BERT training is the full path and new filename, but without the file extension.
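As a concrete illustration of that naming convention (the directory below is hypothetical), the training data path keeps the generated prefix but drops the extension:
<pre>
# Illustrative sketch: suppose preprocessing wrote its output to /data/out.
DATA_PATH=/data/out/my-bert_text_sentence    # note: no .bin or .idx suffix
# This value is later passed to pretraining via --data-path $DATA_PATH.
</pre>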
+For T5 use the same preprocessing as BERT, perhaps renaming it to:
+<pre>
+--output-prefix my-t5 \
+</pre>
Some minor modifications are required for GPT data preprocessing, namely, the addition of a merge table, an end-of-document token, removal of sentence splitting, and a change to the tokenizer type:
<pre>
python tools/preprocess_data.py \
</pre>
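For reference, a complete GPT preprocessing invocation reflecting those modifications might look like the sketch below. This is illustrative only: the corpus, vocabulary, and merge-file names are placeholders, and the exact flag spellings should be verified against `tools/preprocess_data.py --help` in your checkout.
<pre>
python tools/preprocess_data.py \
       --input my-corpus.json \
       --output-prefix my-gpt2 \
       --vocab gpt2-vocab.json \
       --dataset-impl mmap \
       --tokenizer-type GPT2BPETokenizer \
       --merge-file gpt2-merges.txt \
       --append-eod
</pre>
Compared with the BERT example, `--split-sentences` is dropped, `--merge-file` supplies the BPE merge table, `--append-eod` adds the end-of-document token, and the tokenizer type changes to `GPT2BPETokenizer`.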
@@ -237,13 +242,14 @@ T5_ARGS="--num-layers 24 \
         --micro-batch-size 16 \
         --global-batch-size 2048 \
         --vocab-file $VOCAB_FILE \
+        --vocab-extra-ids 100 \
         --split 949,50,1 \
         --fp16"

OUTPUT_ARGS=&#60;same as those in <a href="#bert-pretraining">BERT pretraining</a> above&#62;

python pretrain_t5.py \
-      $BERT_ARGS \
+      $T5_ARGS \
       $OUTPUT_ARGS \
       --save $CHECKPOINT_PATH \
       --load $CHECKPOINT_PATH \
......
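The new `--vocab-extra-ids 100` flag reserves the 100 sentinel tokens used by T5's span-corruption objective, and the launch line now expands `$T5_ARGS` rather than `$BERT_ARGS`. Tying the pieces together, a full launch might look like the following sketch; the paths are hypothetical, and `T5_ARGS`/`OUTPUT_ARGS` are assumed to be defined as in the README sections above:
<pre>
CHECKPOINT_PATH=checkpoints/t5_base          # hypothetical
VOCAB_FILE=bert-vocab.txt                    # hypothetical
DATA_PATH=my-t5_text_sentence                # preprocessing output prefix, no extension

python pretrain_t5.py \
       $T5_ARGS \
       $OUTPUT_ARGS \
       --save $CHECKPOINT_PATH \
       --load $CHECKPOINT_PATH \
       --data-path $DATA_PATH
</pre>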
@@ -25,7 +25,6 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
       --decoder-seq-length 128 \
       --micro-batch-size 16 \
       --global-batch-size 128 \
-      --seq-length 512 \
       --max-position-embeddings 512 \
       --train-iters 1000000 \
       --lr-decay-iters 1000000 \
......