Commit 90e0a0dd authored by Jared Casper

Merge branch 'github-pr' into 'main'

Pull in some GitHub PRs

See merge request ADLR/megatron-lm!282
parents 82b69e86 4a35d50a
@@ -103,6 +103,11 @@ python tools/preprocess_data.py \
The output will be two files named, in this case, `my-bert_text_sentence.bin` and `my-bert_text_sentence.idx`. The `--data-path` specified in later BERT training is the full path and new filename, but without the file extension.
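As a concrete illustration of that naming convention (the directory below is hypothetical), the training data path keeps the generated prefix but drops the extension:
<pre>
# Illustrative sketch: suppose preprocessing wrote its output to /data/out.
DATA_PATH=/data/out/my-bert_text_sentence    # note: no .bin or .idx suffix
# This value is later passed to pretraining via --data-path $DATA_PATH.
</pre>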
+For T5 use the same preprocessing as BERT, perhaps renaming it to:
+<pre>
+--output-prefix my-t5 \
+</pre>
Some minor modifications are required for GPT data preprocessing, namely, the addition of a merge table, an end-of-document token, removal of sentence splitting, and a change to the tokenizer type:
<pre>
python tools/preprocess_data.py \
</pre>
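For reference, a complete GPT preprocessing invocation reflecting those modifications might look like the sketch below. This is illustrative only: the corpus, vocabulary, and merge-file names are placeholders, and the exact flag spellings should be verified against `tools/preprocess_data.py --help` in your checkout.
<pre>
python tools/preprocess_data.py \
       --input my-corpus.json \
       --output-prefix my-gpt2 \
       --vocab gpt2-vocab.json \
       --dataset-impl mmap \
       --tokenizer-type GPT2BPETokenizer \
       --merge-file gpt2-merges.txt \
       --append-eod
</pre>
Compared with the BERT example, `--split-sentences` is dropped, `--merge-file` supplies the BPE merge table, `--append-eod` adds the end-of-document token, and the tokenizer type changes to `GPT2BPETokenizer`.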
@@ -237,13 +242,14 @@ T5_ARGS="--num-layers 24 \
         --micro-batch-size 16 \
         --global-batch-size 2048 \
         --vocab-file $VOCAB_FILE \
+        --vocab-extra-ids 100 \
         --split 949,50,1 \
         --fp16"

OUTPUT_ARGS=&#60;same as those in <a href="#bert-pretraining">BERT pretraining</a> above&#62;

python pretrain_t5.py \
-      $BERT_ARGS \
+      $T5_ARGS \
       $OUTPUT_ARGS \
       --save $CHECKPOINT_PATH \
       --load $CHECKPOINT_PATH \
......
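The new `--vocab-extra-ids 100` flag reserves the 100 sentinel tokens used by T5's span-corruption objective, and the launch line now expands `$T5_ARGS` rather than `$BERT_ARGS`. Tying the pieces together, a full launch might look like the following sketch; the paths are hypothetical, and `T5_ARGS`/`OUTPUT_ARGS` are assumed to be defined as in the README sections above:
<pre>
CHECKPOINT_PATH=checkpoints/t5_base          # hypothetical
VOCAB_FILE=bert-vocab.txt                    # hypothetical
DATA_PATH=my-t5_text_sentence                # preprocessing output prefix, no extension

python pretrain_t5.py \
       $T5_ARGS \
       $OUTPUT_ARGS \
       --save $CHECKPOINT_PATH \
       --load $CHECKPOINT_PATH \
       --data-path $DATA_PATH
</pre>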
@@ -25,7 +25,6 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS \
       --decoder-seq-length 128 \
       --micro-batch-size 16 \
       --global-batch-size 128 \
-      --seq-length 512 \
       --max-position-embeddings 512 \
       --train-iters 1000000 \
       --lr-decay-iters 1000000 \
......