wangsen / megatron-LM-llama · Commits

Commit 523ec9cc, authored Sep 03, 2024 by wangsen
Commit message: all
Pipeline #1668 failed in 0 seconds · Changes: 757 · Pipelines: 1

Showing 20 changed files with 1028 additions and 0 deletions (+1028 / −0)
examples/academic_paper_scripts/msdp/data_processing.sh        +83  −0
examples/academic_paper_scripts/msdp/eval_knwl_generation.sh   +43  −0
examples/academic_paper_scripts/msdp/eval_resp_generation.sh   +64  −0
examples/academic_paper_scripts/msdp/prep_resp_gen.sh          +18  −0
examples/academic_paper_scripts/msdp/prompt_knwl_gen.sh        +46  −0
examples/academic_paper_scripts/msdp/prompt_resp_gen.sh        +46  −0
examples/academic_paper_scripts/sc21/CONFIG.sh                 +57  −0
examples/academic_paper_scripts/sc21/README.md                 +50  −0
examples/academic_paper_scripts/sc21/SBATCH.sh                 +13  −0
examples/academic_paper_scripts/sc21/SRUN.sh                   +18  −0
examples/academic_paper_scripts/sc21/run_figure_11.sh          +46  −0
examples/academic_paper_scripts/sc21/run_figure_12.sh          +54  −0
examples/academic_paper_scripts/sc21/run_figure_13.sh          +46  −0
examples/academic_paper_scripts/sc21/run_figure_14.sh          +47  −0
examples/academic_paper_scripts/sc21/run_figure_15.sh          +47  −0
examples/academic_paper_scripts/sc21/run_figure_16.sh          +43  −0
examples/academic_paper_scripts/sc21/run_figure_17.sh          +54  −0
examples/academic_paper_scripts/sc21/run_figure_18.sh          +54  −0
examples/academic_paper_scripts/sc21/run_table_1.sh            +145 −0
examples/bert/README.md                                        +54  −0
examples/academic_paper_scripts/msdp/data_processing.sh (new file, mode 100644)

#!/bin/bash

# Data preparation for our framework: preprocessing the WoW and WoI datasets
# The datasets can be downloaded through the following links:
# WoW: https://parl.ai/projects/wizard_of_wikipedia/
# WoI: https://parl.ai/projects/sea/

DIR=`pwd`
# Before running the preprocessing, please download
# the Wizard of Wikipedia and Wizard of Internet datasets
WOW_DATA_FOLDER=<PATH_OF_WIZARD_OF_WIKIPEDIA_DATA_FOLDER>
WOI_DATA_FOLDER=<PATH_OF_WIZARD_OF_INTERNET_DATA_FOLDER>

# We provide examples for processing the raw data from Wizard of Wikipedia
# Processing the train dataset (train.json)
python ${DIR}/tasks/msdp/preprocessing.py \
        --func process_wow_dataset \
        --raw_file ${WOW_DATA_FOLDER}/train.json \
        --processed_file ${WOW_DATA_FOLDER}/train_processed.txt

# Processing the test seen dataset (test_random_split.json)
python ${DIR}/tasks/msdp/preprocessing.py \
        --func process_wow_dataset \
        --raw_file ${WOW_DATA_FOLDER}/test_random_split.json \
        --processed_file ${WOW_DATA_FOLDER}/testseen_processed.txt \
        --knwl_ref_file ${WOW_DATA_FOLDER}/output_testseen_knowledge_reference.txt \
        --resp_ref_file ${WOW_DATA_FOLDER}/output_testseen_response_reference.txt

# Processing the test unseen dataset (test_topic_split.json)
python ${DIR}/tasks/msdp/preprocessing.py \
        --func process_wow_dataset \
        --raw_file ${WOW_DATA_FOLDER}/test_topic_split.json \
        --processed_file ${WOW_DATA_FOLDER}/testunseen_processed.txt \
        --knwl_ref_file ${WOW_DATA_FOLDER}/output_testunseen_knowledge_reference.txt \
        --resp_ref_file ${WOW_DATA_FOLDER}/output_testunseen_response_reference.txt

# We provide the following script to process the raw data from Wizard of Internet
# Processing the test dataset (test.jsonl)
python ${DIR}/tasks/msdp/preprocessing.py \
        --func process_woi_dataset \
        --raw_file ${WOI_DATA_FOLDER}/test.jsonl \
        --processed_file ${WOI_DATA_FOLDER}/test_processed.txt \
        --knwl_ref_file ${WOI_DATA_FOLDER}/output_test_knowledge_reference.txt \
        --resp_ref_file ${WOI_DATA_FOLDER}/output_test_response_reference.txt

# Get the knowledge generation prompts for each test dataset in WoW and WoI
MODEL_FILE=<PATH_OF_THE_FINETUNED_DPR_MODEL>

# WoW test seen
python ${DIR}/tasks/msdp/preprocessing.py \
        --func get_knwl_gen_prompts \
        --test_file ${WOW_DATA_FOLDER}/testseen_processed.txt \
        --train_file ${WOW_DATA_FOLDER}/train_processed.txt \
        --model_file ${MODEL_FILE} \
        --processed_file ${WOW_DATA_FOLDER}/output_testseen_knowledge_prompts.json \
        --data_type wow_seen

# WoW test unseen
python ${DIR}/tasks/msdp/preprocessing.py \
        --func get_knwl_gen_prompts \
        --test_file ${WOW_DATA_FOLDER}/testunseen_processed.txt \
        --train_file ${WOW_DATA_FOLDER}/train_processed.txt \
        --model_file ${MODEL_FILE} \
        --processed_file ${WOW_DATA_FOLDER}/output_testunseen_knowledge_prompts.json \
        --data_type wow_unseen

# WoI
python ${DIR}/tasks/msdp/preprocessing.py \
        --func get_knwl_gen_prompts \
        --test_file ${WOI_DATA_FOLDER}/test_processed.txt \
        --train_file ${WOW_DATA_FOLDER}/train_processed.txt \
        --model_file ${MODEL_FILE} \
        --processed_file ${WOI_DATA_FOLDER}/output_test_knowledge_prompts.json \
        --data_type woi

# Get the response generation prompts (can be applied for all the test datasets)
python ${DIR}/tasks/msdp/preprocessing.py \
        --func get_resp_gen_prompts \
        --train_file ${WOW_DATA_FOLDER}/train_processed.txt \
        --processed_file ${WOW_DATA_FOLDER}/output_response_prompts.txt

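The angle-bracket placeholders above have to be filled in before the script is runnable. A minimal sketch of what that might look like, using hypothetical locations (the paths and checkpoint name below are illustrative, not part of this commit):

```bash
# Hypothetical example values -- point these at your own downloads/checkpoints.
WOW_DATA_FOLDER=/data/wizard_of_wikipedia   # holds train.json, test_random_split.json, test_topic_split.json
WOI_DATA_FOLDER=/data/wizard_of_internet    # holds test.jsonl
MODEL_FILE=/checkpoints/dpr_finetuned.pt    # the finetuned DPR retriever checkpoint
```
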
examples/academic_paper_scripts/msdp/eval_knwl_generation.sh (new file, mode 100644)

#!/bin/bash

#########################
# Evaluate the F1 scores.
#########################

WORLD_SIZE=1
DISTRIBUTED_ARGS="--nproc_per_node $WORLD_SIZE \
                  --nnodes 1 \
                  --node_rank 0 \
                  --master_addr localhost \
                  --master_port 6000"

MODEL_GEN_PATH=<PATH_OF_THE_KNOWLEDGE_GENERATION> \
               (e.g., /testseen_knowledge_generations.txt)
GROUND_TRUTH_PATH=<PATH_OF_THE_GROUND_TRUTH_KNOWLEDGE> \
                  (e.g., /testseen_knowledge_reference.txt)

python -m torch.distributed.launch $DISTRIBUTED_ARGS ./tasks/msdp/main.py \
        --num-layers 24 \
        --hidden-size 1024 \
        --num-attention-heads 16 \
        --seq-length 2048 \
        --max-position-embeddings 2048 \
        --micro-batch-size 4 \
        --task MSDP-EVAL-F1 \
        --guess-file ${MODEL_GEN_PATH} \
        --answer-file ${GROUND_TRUTH_PATH}


############################################
# Evaluate BLEU, METEOR, and ROUGE-L scores.
############################################

# We follow nlg-eval (https://github.com/Maluuba/nlg-eval) to
# evaluate the BLEU, METEOR, and ROUGE-L scores.
# To evaluate these metrics, please set up the environment based on
# the nlg-eval GitHub repository, and run the corresponding evaluation command.

nlg-eval \
    --hypothesis=<PATH_OF_THE_KNOWLEDGE_GENERATION> \
    --references=<PATH_OF_THE_GROUND_TRUTH_KNOWLEDGE>

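The nlg-eval step assumes the tool is already installed. A sketch of the setup, following the instructions published in the nlg-eval repository (check that repository for the current procedure, since the exact commands may have changed):

```bash
# Install nlg-eval and download the data files it needs (METEOR requires a Java runtime).
pip install git+https://github.com/Maluuba/nlg-eval.git@master
nlg-eval --setup
```
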
examples/academic_paper_scripts/msdp/eval_resp_generation.sh (new file, mode 100644)

#!/bin/bash

#########################
# Evaluate the F1 scores.
#########################

WORLD_SIZE=1
DISTRIBUTED_ARGS="--nproc_per_node $WORLD_SIZE \
                  --nnodes 1 \
                  --node_rank 0 \
                  --master_addr localhost \
                  --master_port 6000"

MODEL_GEN_PATH=<PATH_OF_THE_RESPONSE_GENERATION> \
               (e.g., /testseen_response_generations.txt)
GROUND_TRUTH_PATH=<PATH_OF_THE_GROUND_TRUTH_RESPONSE> \
                  (e.g., /testseen_response_reference.txt)

python -m torch.distributed.launch $DISTRIBUTED_ARGS ./tasks/msdp/main.py \
        --num-layers 24 \
        --hidden-size 1024 \
        --num-attention-heads 16 \
        --seq-length 2048 \
        --max-position-embeddings 2048 \
        --micro-batch-size 4 \
        --task MSDP-EVAL-F1 \
        --guess-file ${MODEL_GEN_PATH} \
        --answer-file ${GROUND_TRUTH_PATH}


##########################
# Evaluate the KF1 scores.
##########################

MODEL_GEN_PATH=<PATH_OF_THE_RESPONSE_GENERATION> \
               (e.g., /testseen_response_generations.txt)
GROUND_TRUTH_PATH=<PATH_OF_THE_GROUND_TRUTH_KNOWLEDGE> \
                  (e.g., /testseen_knowledge_reference.txt)

python -m torch.distributed.launch $DISTRIBUTED_ARGS ./tasks/msdp/main.py \
        --num-layers 24 \
        --hidden-size 1024 \
        --num-attention-heads 16 \
        --seq-length 2048 \
        --max-position-embeddings 2048 \
        --micro-batch-size 4 \
        --task MSDP-EVAL-F1 \
        --guess-file ${MODEL_GEN_PATH} \
        --answer-file ${GROUND_TRUTH_PATH}


############################################
# Evaluate BLEU, METEOR, and ROUGE-L scores.
############################################

# We follow nlg-eval (https://github.com/Maluuba/nlg-eval) to
# evaluate the BLEU, METEOR, and ROUGE-L scores.
# To evaluate these metrics, please set up the environment based on
# the nlg-eval GitHub repository, and run the corresponding evaluation command.

nlg-eval \
    --hypothesis=<PATH_OF_THE_RESPONSE_GENERATION> \
    --references=<PATH_OF_THE_GROUND_TRUTH_RESPONSE>

examples/academic_paper_scripts/msdp/prep_resp_gen.sh (new file, mode 100644)

#!/bin/bash

# Preparing the input file for the response generation (second-stage prompting)

DIR=`pwd`

TEST_FILE=<PATH_OF_PROCESSED_TEST_DATA> \
          (e.g., /testseen_processed.txt)
KNOWLEDGE_FILE=<PATH_OF_GENERATED_KNOWLEDGE_DATA> \
               (e.g., /testseen_knowledge_generations.txt)
PROCESSED_FILE=<PATH_OF_INPUT_FILE_FOR_RESPONSE_GENERATION> \
               (e.g., /testseen_processed_with_generated_knowledge.txt)

python ${DIR}/tasks/msdp/preprocessing.py \
        --func prepare_input \
        --test_file ${TEST_FILE} \
        --knwl_gen_file ${KNOWLEDGE_FILE} \
        --processed_file ${PROCESSED_FILE}

examples/academic_paper_scripts/msdp/prompt_knwl_gen.sh (new file, mode 100644)

#!/bin/bash

# Stage-1: Prompt a pretrained language model to generate the context-relevant knowledge
# The input contains prompts and the current dialogue context; the output is the relevant knowledge
# The size of the pretrained language model is 357M

WORLD_SIZE=8
DISTRIBUTED_ARGS="--nproc_per_node $WORLD_SIZE \
                  --nnodes 1 \
                  --node_rank 0 \
                  --master_addr localhost \
                  --master_port 6000"

CHECKPOINT_PATH=<PATH_OF_LANGUAGE_MODEL> (e.g., /357m)
VOCAB_PATH=<PATH_OF_VOCAB_FILE> (e.g., /gpt2-vocab.json)
MERGE_PATH=<PATH_OF_MERGE_FILE> (e.g., /gpt2-merges.txt)
INPUT_PATH=<PATH_OF_PROCESSED_TEST_DATA_FILE> \
           (e.g., /testseen_processed.txt)
PROMPT_PATH=<PATH_OF_KNOWLEDGE_GENERATION_PROMPTS> \
            (e.g., /testseen_knowledge_prompts.json)
OUTPUT_PATH=<PATH_OF_OUTPUT_GENERATION_FILE> \
            (e.g., /testseen_knowledge_generations.txt)

python -m torch.distributed.launch $DISTRIBUTED_ARGS ./tasks/msdp/main.py \
        --num-layers 24 \
        --hidden-size 1024 \
        --num-attention-heads 16 \
        --seq-length 2048 \
        --max-position-embeddings 2048 \
        --micro-batch-size 1 \
        --vocab-file ${VOCAB_PATH} \
        --merge-file ${MERGE_PATH} \
        --load ${CHECKPOINT_PATH} \
        --fp16 \
        --DDP-impl torch \
        --tokenizer-type GPT2BPETokenizer \
        --sample-input-file ${INPUT_PATH} \
        --sample-output-file ${OUTPUT_PATH} \
        --prompt-file ${PROMPT_PATH} \
        --prompt-type knowledge \
        --num-prompt-examples 10 \
        --task MSDP-PROMPT

# NOTE: If you use the API for the model generation, please use
# the "--api-prompt" flag (setting this value as True).

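Recent PyTorch releases deprecate `python -m torch.distributed.launch` in favor of `torchrun`. If your environment rejects the launcher used above, an equivalent invocation (a sketch; the Megatron arguments themselves are unchanged) would be:

```bash
torchrun --nproc_per_node $WORLD_SIZE --nnodes 1 --node_rank 0 \
         --master_addr localhost --master_port 6000 \
         ./tasks/msdp/main.py <same arguments as above>
```
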
examples/academic_paper_scripts/msdp/prompt_resp_gen.sh (new file, mode 100644)

#!/bin/bash

# Stage-2: Prompt a pretrained language model to generate the corresponding response
# The input contains prompts, the current dialogue context, and the knowledge generated in Stage-1
# The output is the corresponding response.
# The size of the pretrained language model is 357M

WORLD_SIZE=8
DISTRIBUTED_ARGS="--nproc_per_node $WORLD_SIZE \
                  --nnodes 1 \
                  --node_rank 0 \
                  --master_addr localhost \
                  --master_port 6000"

CHECKPOINT_PATH=<PATH_OF_LANGUAGE_MODEL> (e.g., /357m)
VOCAB_PATH=<PATH_OF_VOCAB_FILE> (e.g., /gpt2-vocab.json)
MERGE_PATH=<PATH_OF_MERGE_FILE> (e.g., /gpt2-merges.txt)
INPUT_PATH=<PATH_OF_INPUT_TEST_DATA_FILE> (e.g., /testseen_processed.txt)
PROMPT_PATH=<PATH_OF_RESPONSE_GENERATION_PROMPTS> \
            (e.g., /response_prompts.txt)
OUTPUT_PATH=<PATH_OF_OUTPUT_GENERATION_FILE> \
            (e.g., /output_testseen_response_generations.txt)

python -m torch.distributed.launch $DISTRIBUTED_ARGS ./tasks/msdp/main.py \
        --num-layers 24 \
        --hidden-size 1024 \
        --num-attention-heads 16 \
        --seq-length 2048 \
        --max-position-embeddings 2048 \
        --micro-batch-size 1 \
        --vocab-file ${VOCAB_PATH} \
        --merge-file ${MERGE_PATH} \
        --load ${CHECKPOINT_PATH} \
        --fp16 \
        --DDP-impl torch \
        --tokenizer-type GPT2BPETokenizer \
        --sample-input-file ${INPUT_PATH} \
        --sample-output-file ${OUTPUT_PATH} \
        --prompt-file ${PROMPT_PATH} \
        --prompt-type response \
        --num-prompt-examples 20 \
        --task MSDP-PROMPT

# NOTE: If you use the API for the model generation, please use
# the "--api-prompt" flag (setting this value as True).

examples/academic_paper_scripts/sc21/CONFIG.sh (new file, mode 100755)

#!/bin/bash

# SLURM options.
export SLURM_PARTITION=<slurm partition, used to feed the -p option in slurm>
export SLURM_ACCOUNT=<slurm account, used to feed the -A option in slurm>

# Source code.
export MEGATRON_CODE_DIR=<megatron source code directory>

# This variable is used to mount the relevant part of the filesystem
# inside the docker container. Note that the `MEGATRON_CODE_DIR` and the
# launch directory already get mounted; this variable should be used to
# mount the directories that contain the data and tokenizer files.
export DOCKER_MOUNT_DIR=<megatron dataset and bpe tokenizer vocab path>

# Data and tokenizer files.
MEGATRON_DATA=<path to megatron processed data>
BPE_VOCAB_FILE=<path to bpe vocab file>
BPE_MERGE_FILE=<path to bpe merges file>

# Megatron input parameters.
# `MEGATRON_EXTRA_PARAMS` can be used to provide any extra parameters
# that are not listed here.
export MEGATRON_PARAMS=" ${MEGATRON_EXTRA_PARAMS} \
        --tensor-model-parallel-size ${TP} \
        --pipeline-model-parallel-size ${PP} \
        --micro-batch-size ${MBS} \
        --global-batch-size ${GBS} \
        --num-layers ${NLS} \
        --hidden-size ${HS} \
        --num-attention-heads ${NAH} \
        --DDP-impl ${DDP} \
        --data-path ${MEGATRON_DATA} \
        --vocab-file ${BPE_VOCAB_FILE} \
        --merge-file ${BPE_MERGE_FILE} \
        --log-interval 5 \
        --seq-length 2048 \
        --max-position-embeddings 2048 \
        --train-iters 500 \
        --lr-decay-iters 320 \
        --lr 0.0001 \
        --min-lr 0.00001 \
        --lr-decay-style cosine \
        --lr-warmup-fraction 0.01 \
        --split 969,30,1 \
        --eval-iters 100 \
        --eval-interval 1000 \
        --clip-grad 1.0 \
        --fp16 \
        --loss-scale 8192 "

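CONFIG.sh builds `MEGATRON_PARAMS` from variables (`TP`, `PP`, `MBS`, `GBS`, `NLS`, `HS`, `NAH`, `DDP`, and optionally `MEGATRON_EXTRA_PARAMS`) that must already be defined when the file is sourced. A minimal sketch of that usage pattern, with illustrative sizing values (the run_figure_*.sh and run_table_1.sh scripts below do exactly this):

```bash
# Define the sizing/parallelism variables first, then source CONFIG.sh.
TP=8; PP=1; MBS=1; GBS=8; NLS=24; HS=2304; NAH=24; DDP=local
MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
. `pwd`/CONFIG.sh
echo "${MEGATRON_PARAMS}"   # inspect the fully expanded argument string
```
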
examples/academic_paper_scripts/sc21/README.md (new file, mode 100644)

# Reproducing Figures in SC21 Paper

This directory contains some of the scripts that were used to produce the
results in the [Megatron paper](https://arxiv.org/pdf/2104.04473.pdf) that
appeared at [SuperComputing 2021](https://sc21.supercomputing.org/). These
scripts use [Slurm](https://slurm.schedmd.com/documentation.html) with the
[pyxis plugin](https://github.com/NVIDIA/pyxis), but can be modified for other
schedulers as well.

## Git commit

To replicate these results, use Megatron-LM commit 6985e58938d40ad91ac07b0fddcfad8132e1447e.

## Setup

All the cluster-dependent variables are in [`CONFIG.sh`](./CONFIG.sh). Please
update the unspecified values (in angle brackets `<...>`) before launching any
scripts.

## Scripts

Below is a list of scripts that can be used to reproduce various figures in our
[paper](https://arxiv.org/pdf/2104.04473.pdf); a usage sketch follows the list.

* [run_table_1.sh](./run_table_1.sh): Table 1 showing weak-scaling throughput
  for GPT models ranging from 1 billion to 1 trillion parameters.
* [run_figure_11.sh](./run_figure_11.sh): Figure 11 showing the weak-scaling
  performance of pipeline parallelism.
* [run_figure_12.sh](./run_figure_12.sh): Figure 12 showing the effect of
  the interleaved schedule on a 175B GPT model.
* [run_figure_13.sh](./run_figure_13.sh): Figure 13 showing the effect of
  different degrees of pipeline and tensor model parallelism on a model with
  162.2 billion parameters.
* [run_figure_14.sh](./run_figure_14.sh): Figure 14 showing the effect of
  different degrees of data and pipeline model parallelism on a model with
  5.9 billion parameters.
* [run_figure_15.sh](./run_figure_15.sh): Figure 15 showing the effect of
  different degrees of data and tensor model parallelism on a model with
  5.9 billion parameters.
* [run_figure_16.sh](./run_figure_16.sh): Figure 16 showing the effect of
  microbatch size.
* [run_figure_17.sh](./run_figure_17.sh): Figure 17 showing the effect of
  activation recomputation.
* [run_figure_18.sh](./run_figure_18.sh): Figure 18 showing the effect of
  the scatter-gather communication optimization.
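Each run script is self-contained: edit the variables in its "Choose the case to run" block, then execute it; the script sources CONFIG.sh to build the Megatron arguments and SBATCH.sh to submit the Slurm job. For example (a sketch, assuming CONFIG.sh has already been filled in):

```bash
cd examples/academic_paper_scripts/sc21
# Edit PP and GBS at the top of run_figure_11.sh to select the case, then:
bash run_figure_11.sh
```
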
examples/academic_paper_scripts/sc21/SBATCH.sh (new file, mode 100755)

#!/bin/bash

sbatch -p ${SLURM_PARTITION} \
       -A ${SLURM_ACCOUNT} \
       --job-name=${JOB_NAME} \
       --nodes=${NNODES} \
       --export=MEGATRON_CODE_DIR,MEGATRON_PARAMS,DOCKER_MOUNT_DIR SRUN.sh

exit 0

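As an illustration of what this wrapper ends up submitting, with the default settings of run_figure_11.sh (PP=1, GBS=8, so NNODES=1) the command expands roughly as shown below; the partition and account placeholders are the values set in CONFIG.sh:

```bash
sbatch -p <SLURM_PARTITION> -A <SLURM_ACCOUNT> \
       --job-name=results_figure_11_pipeline_parallel_size_1_batch_size_8 \
       --nodes=1 \
       --export=MEGATRON_CODE_DIR,MEGATRON_PARAMS,DOCKER_MOUNT_DIR SRUN.sh
```
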
examples/academic_paper_scripts/sc21/SRUN.sh (new file, mode 100755)

#!/bin/bash

#SBATCH -t 0:30:00 --exclusive --mem=0 --overcommit --ntasks-per-node=8

THIS_DIR=`pwd`
DATETIME=`date +'date_%y-%m-%d_time_%H-%M-%S'`
mkdir -p ${THIS_DIR}/logs

CMD="python -u ${MEGATRON_CODE_DIR}/pretrain_gpt.py ${MEGATRON_PARAMS}"

srun -l \
     --container-image "nvcr.io#nvidia/pytorch:20.12-py3" \
     --container-mounts "${THIS_DIR}:${THIS_DIR},${MEGATRON_CODE_DIR}:${MEGATRON_CODE_DIR},${DOCKER_MOUNT_DIR}:${DOCKER_MOUNT_DIR}" \
     --output=${THIS_DIR}/logs/%x_%j_$DATETIME.log sh -c "${CMD}"

examples/academic_paper_scripts/sc21/run_figure_11.sh (new file, mode 100755)

#!/bin/bash

# ================================
# Choose the case to run.
# ================================

# Pipeline-parallel size options = [1, 2, 4, 8].
PP=1

# Batch size (global batch size) options = [8, 128].
GBS=8

# Set pipeline-parallel size options.
NLS=$((3*PP))
NNODES=${PP}

# Other params.
TP=8
MBS=1
HS=20480
NAH=128
DDP=local
MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "

# Name of the job.
export JOB_NAME=results_figure_11_pipeline_parallel_size_${PP}_batch_size_${GBS}

# Import the configs.
. `pwd`/CONFIG.sh

# Submit the job.
. `pwd`/SBATCH.sh

exit 0

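Note that `NLS=$((3*PP))` and `NNODES=${PP}` scale the model depth and node count with the chosen pipeline-parallel size, which is what makes this a weak-scaling sweep. The resulting combinations for the supported options are:

```bash
# PP -> NLS (= 3*PP), NNODES (= PP)
#  1 ->  3,  1
#  2 ->  6,  2
#  4 -> 12,  4
#  8 -> 24,  8
```
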
examples/academic_paper_scripts/sc21/run_figure_12.sh (new file, mode 100755)

#!/bin/bash

# ================================
# Choose the case to run.
# ================================

# Interleaved schedule options = [YES, NO].
INTERLEAVED=YES

# Batch size (global batch size) options = [12, 24, 36, ..., 60].
GBS=12

# Set interleaved schedule options.
if [ ${INTERLEAVED} == "YES" ]; then
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform --num-layers-per-virtual-pipeline-stage 2 "
elif [ ${INTERLEAVED} == "NO" ]; then
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
else
    echo "Invalid configuration"
    exit 1
fi

# Other params.
TP=8
PP=12
MBS=1
NLS=96
HS=12288
NAH=96
DDP=local
NNODES=12

# Name of the job.
export JOB_NAME=results_figure_12_interleaved_${INTERLEAVED}_batch_size_${GBS}

# Import the configs.
. `pwd`/CONFIG.sh

# Submit the job.
. `pwd`/SBATCH.sh

exit 0

examples/academic_paper_scripts/sc21/run_figure_13.sh (new file, mode 100755)

#!/bin/bash

# ================================
# Choose the case to run.
# ================================

# Pipeline-parallel size options = [2, 4, 8, 16, 32].
PP=2

# Batch size (global batch size) options = [32, 128].
GBS=32

# Set pipeline-parallel and tensor-parallel size options.
TP=$((64/PP))

# Other params.
MBS=1
NLS=32
HS=20480
NAH=128
DDP=local
MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
NNODES=8

# Name of the job.
export JOB_NAME=results_figure_13_pipeline_parallel_size_${PP}_tensor_parallel_size_${TP}_batch_size_${GBS}

# Import the configs.
. `pwd`/CONFIG.sh

# Submit the job.
. `pwd`/SBATCH.sh

exit 0

examples/academic_paper_scripts/sc21/run_figure_14.sh (new file, mode 100755)

#!/bin/bash

# ================================
# Choose the case to run.
# ================================

# Pipeline-parallel size options = [2, 4, 8, 16, 32].
PP=2

# Batch size (global batch size) options = [32, 512].
GBS=32

# Set pipeline-parallel and data-parallel size options.
DP=$((64/PP))

# Other params.
TP=1
MBS=1
NLS=32
HS=3840
NAH=32
DDP=local
MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
NNODES=8

# Name of the job.
export JOB_NAME=results_figure_14_pipeline_parallel_size_${PP}_data_parallel_size_${DP}_batch_size_${GBS}

# Import the configs.
. `pwd`/CONFIG.sh

# Submit the job.
. `pwd`/SBATCH.sh

exit 0

examples/academic_paper_scripts/sc21/run_figure_15.sh (new file, mode 100755)

#!/bin/bash

# ================================
# Choose the case to run.
# ================================

# Tensor-parallel size options = [2, 4, 8, 16, 32].
TP=2

# Batch size (global batch size) options = [32, 128, 512].
GBS=32

# Set tensor-parallel and data-parallel size options.
DP=$((64/TP))

# Other params.
PP=1
MBS=1
NLS=32
HS=3840
NAH=32
DDP=local
MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
NNODES=8

# Name of the job.
export JOB_NAME=results_figure_15_tensor_parallel_size_${TP}_data_parallel_size_${DP}_batch_size_${GBS}

# Import the configs.
. `pwd`/CONFIG.sh

# Submit the job.
. `pwd`/SBATCH.sh

exit 0

examples/academic_paper_scripts/sc21/run_figure_16.sh (new file, mode 100755)

#!/bin/bash

# ================================
# Choose the case to run.
# ================================

# Microbatch size options = [1, 2, 4, 8].
MBS=1

# Batch size (global batch size) options = [128, 512].
GBS=128

# Other params.
TP=8
PP=8
NLS=32
HS=15360
NAH=128
DDP=local
MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
NNODES=8

# Name of the job.
export JOB_NAME=results_figure_16_microbatch_size_${MBS}_batch_size_${GBS}

# Import the configs.
. `pwd`/CONFIG.sh

# Submit the job.
. `pwd`/SBATCH.sh

exit 0

examples/academic_paper_scripts/sc21/run_figure_17.sh (new file, mode 100755)

#!/bin/bash

# ================================
# Choose the case to run.
# ================================

# Activation recomputation options = [YES, NO].
ACTIVATION_RECOMPUTATION=YES

# Batch size (global batch size) options = [1, 2, 4, ..., 256].
GBS=1

# Set activation recomputation.
if [ ${ACTIVATION_RECOMPUTATION} == "YES" ]; then
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
elif [ ${ACTIVATION_RECOMPUTATION} == "NO" ]; then
    MEGATRON_EXTRA_PARAMS=""
else
    echo "Invalid configuration"
    exit 1
fi

# Other params.
TP=8
PP=16
MBS=1
NLS=80
HS=12288
NAH=96
DDP=local
NNODES=16

# Name of the job.
export JOB_NAME=results_figure_17_activation_recomputation_${ACTIVATION_RECOMPUTATION}_batch_size_${GBS}

# Import the configs.
. `pwd`/CONFIG.sh

# Submit the job.
. `pwd`/SBATCH.sh

exit 0

examples/academic_paper_scripts/sc21/run_figure_18.sh (new file, mode 100755)

#!/bin/bash

# ================================
# Choose the case to run.
# ================================

# Scatter-gather communication optimization options = [YES, NO].
SCATTER_GATHER=YES

# Batch size (global batch size) options = [12, 24, 36, ..., 60].
GBS=12

# Set scatter-gather communication optimization options.
if [ ${SCATTER_GATHER} == "YES" ]; then
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform --num-layers-per-virtual-pipeline-stage 2 "
elif [ ${SCATTER_GATHER} == "NO" ]; then
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform --num-layers-per-virtual-pipeline-stage 2 --no-scatter-gather-tensors-in-pipeline "
else
    echo "Invalid configuration"
    exit 1
fi

# Other params.
TP=8
PP=12
MBS=1
NLS=96
HS=12288
NAH=96
DDP=local
NNODES=12

# Name of the job.
export JOB_NAME=results_figure_18_scatter_gather_${SCATTER_GATHER}_batch_size_${GBS}

# Import the configs.
. `pwd`/CONFIG.sh

# Submit the job.
. `pwd`/SBATCH.sh

exit 0

examples/academic_paper_scripts/sc21/run_table_1.sh (new file, mode 100755)

#!/bin/bash

# ================================
# Choose the case to run.
# ================================

# Model size options = [1.7B, 3.6B, 7.5B, 18B, 39B, 76B, 145B, 310B, 530B, 1T]
MODEL_SIZE=1.7B

if [ ${MODEL_SIZE} == "1.7B" ]; then
    TP=1
    PP=1
    MBS=16
    GBS=512
    NLS=24
    HS=2304
    NAH=24
    DDP=torch
    NNODES=4
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
elif [ ${MODEL_SIZE} == "3.6B" ]; then
    TP=2
    PP=1
    MBS=16
    GBS=512
    NLS=30
    HS=3072
    NAH=32
    DDP=torch
    NNODES=8
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
elif [ ${MODEL_SIZE} == "7.5B" ]; then
    TP=4
    PP=1
    MBS=16
    GBS=512
    NLS=36
    HS=4096
    NAH=32
    DDP=torch
    NNODES=16
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
elif [ ${MODEL_SIZE} == "18B" ]; then
    TP=8
    PP=1
    MBS=8
    GBS=1024
    NLS=40
    HS=6144
    NAH=48
    DDP=torch
    NNODES=32
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
elif [ ${MODEL_SIZE} == "39B" ]; then
    TP=8
    PP=2
    MBS=4
    GBS=1536
    NLS=48
    HS=8192
    NAH=64
    DDP=local
    NNODES=64
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
elif [ ${MODEL_SIZE} == "76B" ]; then
    TP=8
    PP=4
    MBS=2
    GBS=1792
    NLS=60
    HS=10240
    NAH=80
    DDP=local
    NNODES=128
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform --num-layers-per-virtual-pipeline-stage 5"
elif [ ${MODEL_SIZE} == "145B" ]; then
    TP=8
    PP=8
    MBS=2
    GBS=2304
    NLS=80
    HS=12288
    NAH=96
    DDP=local
    NNODES=192
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform --num-layers-per-virtual-pipeline-stage 5 "
elif [ ${MODEL_SIZE} == "310B" ]; then
    TP=8
    PP=16
    MBS=1
    GBS=2160
    NLS=96
    HS=16384
    NAH=128
    DDP=local
    NNODES=240
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform --num-layers-per-virtual-pipeline-stage 3 "
elif [ ${MODEL_SIZE} == "530B" ]; then
    TP=8
    PP=35
    MBS=1
    GBS=2520
    NLS=105
    HS=20480
    NAH=128
    DDP=local
    NNODES=315
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform --num-layers-per-virtual-pipeline-stage 1 "
elif [ ${MODEL_SIZE} == "1T" ]; then
    TP=8
    PP=64
    MBS=1
    GBS=3072
    NLS=128
    HS=25600
    NAH=160
    DDP=local
    NNODES=384
    MEGATRON_EXTRA_PARAMS="--activations-checkpoint-method uniform "
else
    echo "Invalid configuration"
    exit 1
fi

# Name of the job.
export JOB_NAME=results_table_1_model_size_${MODEL_SIZE}

# Import the configs.
. `pwd`/CONFIG.sh

# Submit the job.
. `pwd`/SBATCH.sh

exit 0

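The script runs one configuration per invocation, selected by the `MODEL_SIZE` value at the top. A rough sketch of one way to sweep several Table 1 configurations (it simply rewrites the `MODEL_SIZE=` line before each submission; adjust the list of sizes to what your cluster can accommodate):

```bash
# Submit a subset of the Table 1 configurations back to back.
for SIZE in 1.7B 3.6B 7.5B 18B; do
    sed -i "s/^MODEL_SIZE=.*/MODEL_SIZE=${SIZE}/" run_table_1.sh
    bash run_table_1.sh
done
```
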
examples/bert/README.md (new file, mode 100644)

# BERT MODEL

## Table of contents

- [1. Training Setup](#1-training-setup)
- [2. Configurations](#2-configurations)

## 1. Training setup
<a id="markdown-training-setup" name="training-setup"></a>

To run the model using a docker container, run it as follows:

```
PYTORCH_IMAGE=nvcr.io/nvidia/pytorch:24.01-py3
CHECKPOINT_PATH="" #<Specify path>
TENSORBOARD_LOGS_PATH="" #<Specify path>
VOCAB_FILE="" #<Specify path to file>/bert-vocab.txt
DATA_PATH="" #<Specify path and file prefix>_text_document

docker run \
  --gpus=all \
  --ipc=host \
  --workdir /workspace/megatron-lm \
  -v /path/to/data:/path/to/data \
  -v /path/to/megatron-lm:/workspace/megatron-lm \
  megatron-lm nvcr.io/nvidia/pytorch:24.01-py3 \
  bash examples/bert/train_bert_340m_distributed.sh $CHECKPOINT_PATH $TENSORBOARD_LOGS_PATH $VOCAB_FILE $DATA_PATH
```

NOTE: Depending on the environment you are running it in, the above command may look slightly different.

## 2. Configurations
<a id="markdown-configurations" name="configurations"></a>

The example in this folder shows you how to run the 340M model. There are other configurations you could run as well:

### 4B

```
--num-layers 48 \
--hidden-size 2560 \
--num-attention-heads 32 \
--tensor-model-parallel-size 1 \
--pipeline-model-parallel-size 1 \
```

### 20B

```
--num-layers 48 \
--hidden-size 6144 \
--num-attention-heads 96 \
--tensor-model-parallel-size 4 \
--pipeline-model-parallel-size 4 \
```
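
For reference, the 340M example that train_bert_340m_distributed.sh targets corresponds, if it follows the standard BERT-Large sizing, to roughly the following flags (an assumption based on the usual 340M BERT shape, not something stated in this commit):

```
--num-layers 24 \
--hidden-size 1024 \
--num-attention-heads 16 \
--tensor-model-parallel-size 1 \
--pipeline-model-parallel-size 1 \
```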