ModelZoo / GPT2_pytorch / Commits / 56325238

Commit 56325238, authored Jun 16, 2023 by hepj987

v1.0 version

Parent: 5b30acdf
Pipeline #361: canceled with stage
Showing 2 changed files with 13 additions and 6 deletions (+13 −6)
README.md (+9 −1)
single-16B.sh (+4 −5)
README.md (view file @ 56325238)
@@ -126,7 +126,7 @@ SAVE_INTERVAL save frequency
 | Cards | Performance (samples per second) | Convergence lm loss value | Convergence lm loss PPL |
 | :-------: | :------------------------: | :-----------------: | :---------------: |
-| 16 x 4DCU | 2.540 | 6.601086E+00 | 7.358937E+02 |
+| 32 x 4DCU | 2.449 | 4.299443E+00 | 7.365877E+01 |
@@ -204,3 +204,11 @@ sh run-inf.sh (single-node, small-model example)
 
 
+## Source repository and issue feedback
+https://developer.hpccube.com/codes/modelzoo/gpt2-pytorch/
+## References
+https://github.com/bigscience-workshop/Megatron-DeepSpeed
single-16B.sh (view file @ 56325238)
@@ -10,7 +10,7 @@ RANK=$OMPI_COMM_WORLD_RANK
 WORLD_SIZE=$OMPI_COMM_WORLD_SIZE
-MODEL_NAME=gpt2-oscar_16B-4tp
+MODEL_NAME=gpt2-oscar_16B-8tp
 DATA_OUTPUT_PATH=./
 LOGS_PATH=$DATA_OUTPUT_PATH/logs
 CHECKPOINT_PATH=output-module/$MODEL_NAME
@@ -20,7 +20,7 @@ TENSORBOARD_PATH=output_dir/tensorboard/$MODEL_NAME
 CODECARBON_PATH=output_dir/codecarbon/$MODEL_NAME
 TP_SIZE=4    # always fixed to the size of a single node
-PP_SIZE=4    # NLAYERS must be a multiple of PP_SIZE here
+PP_SIZE=8    # NLAYERS must be a multiple of PP_SIZE here
 MICRO_BATCH_SIZE=1
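The TP/PP change above shifts how the launcher's ranks are partitioned: in the standard Megatron-DeepSpeed layout, data parallelism gets whatever is left after tensor and pipeline parallelism are carved out. A minimal sketch of that arithmetic, assuming a 32-node × 4-DCU job (WORLD_SIZE=128 is an illustrative value; in the script it comes from $OMPI_COMM_WORLD_SIZE):

```shell
# Illustrative values: 32 nodes x 4 DCUs, matching the README's convergence table.
WORLD_SIZE=128
TP_SIZE=4   # tensor parallelism, fixed to the DCUs in one node
PP_SIZE=8   # pipeline parallelism, as set by this commit

# Megatron-DeepSpeed layout: ranks = TP * PP * DP,
# so the data-parallel degree is the remainder.
DP_SIZE=$((WORLD_SIZE / (TP_SIZE * PP_SIZE)))
echo "data-parallel replicas: $DP_SIZE"   # 128 / (4 * 8) = 4
```

Doubling PP_SIZE with a fixed world size halves the data-parallel degree, which is one way to read this commit: more of each replica's memory budget goes to holding pipeline stages of the 16B model.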
@@ -53,12 +53,11 @@ GPT_ARGS=" \
     --max-position-embeddings $SEQ_LEN \
     --micro-batch-size $MICRO_BATCH_SIZE \
     --global-batch-size $GLOBAL_BATCH_SIZE \
-    --train-samples 3782590 \
+    --train_iters 7000 \
     --loss-scale 12 \
     --vocab-file gpt2-vocab.json \
     --merge-file gpt2-merges.txt \
     --clip-grad 1.0 \
     --fp16 \
     --checkpoint-activations \
     --seed 42 $OPTIMIZER_ARGS \
@@ -93,7 +92,7 @@ cat <<EOT > $config_json
     "stage": $ZERO_STAGE
   },
   "fp16": {
-    "enabled": true,
+    "enabled": false,
     "loss_scale": 0,
     "loss_scale_window": 500,
     "hysteresis": 2,
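The hunk above edits the DeepSpeed config that the script emits via a heredoc. A standalone sketch of that fragment as this commit leaves it (the ds_config.json file name and ZERO_STAGE=1 are illustrative; with "enabled": false, DeepSpeed ignores the loss-scale settings and runs this section in fp32):

```shell
ZERO_STAGE=1                 # illustrative; the script sets this elsewhere
config_json=ds_config.json   # illustrative file name

# Mirror the script's heredoc; fp16 is disabled by this commit.
cat <<EOT > $config_json
{
  "zero_optimization": {
    "stage": $ZERO_STAGE
  },
  "fp16": {
    "enabled": false,
    "loss_scale": 0,
    "loss_scale_window": 500,
    "hysteresis": 2
  }
}
EOT
```

Writing the config through a heredoc, as the script does, lets shell variables like $ZERO_STAGE be substituted directly into the JSON before DeepSpeed reads it.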