OpenDAS / ColossalAI · Commit d96cc37e (unverified)
Authored Dec 28, 2022 by Jiarui Fang, committed via GitHub on Dec 28, 2022
[example] update GPT example benchmark results (#2212)
Parent: d5e3e3ec
1 changed file with 8 additions and 2 deletions: examples/language/gpt/README.md
@@ -92,11 +92,17 @@ How does the Tensor Parallel Degree affect the efficiency.
Touch the bar of model scale and batch size.
1. `cpu` is the most stable policy for large models and large batch sizes. On 8 GPUs with TP=2, the largest batch size for `auto`, `const` and `cpu` is 64, 32 and 16, respectively.
2. Tensor parallelism is necessary for the 20B model to reduce the model data memory requirement on each GPU.
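The two observations above translate into launcher options for the benchmark. As a minimal sketch, the flag names below (`--placement`, `--tp_degree`, `--batch_size`) are illustrative assumptions, not the exact CLI of `examples/language/gpt`:

```python
import argparse

def parse_benchmark_args(argv=None):
    """Parse placement-policy and parallelism options (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="GPT Gemini benchmark (sketch)")
    parser.add_argument("--placement", choices=["cpu", "auto", "const"],
                        default="cpu",
                        help="Gemini placement policy; 'cpu' is the most "
                             "stable for large models and batch sizes")
    parser.add_argument("--tp_degree", type=int, default=1,
                        help="tensor-parallel degree; the 20B model needs "
                             "TP>=2 to fit model data on each GPU")
    parser.add_argument("--batch_size", type=int, default=8,
                        help="batch size per data-parallel rank")
    return parser.parse_args(argv)

args = parse_benchmark_args(["--placement", "cpu", "--tp_degree", "2"])
print(args.placement, args.tp_degree)
```

With this shape, the table below corresponds to sweeping `--placement` and `--batch_size` at a fixed `--tp_degree`.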
| model | #GPU | policy | TP | batch per DP | Tflops |
| ---------- | --------- |--------- |--------- |--------- |--------- |
| gpt2_20b | 4 | cpu | 1 | 64 | CUDA OOM |
| gpt2_20b | 4 | auto | 1/2 | 64 | CUDA OOM |
| gpt2_20b | 4 | cpu | 2 | 64 | 121.394 |
| gpt2_20b | 4 | cpu | 2 | 8 | 43.102 |
| gpt2_20b | 8 | auto | 2 | 16 | 99.871 |
| gpt2_20b | 8 | cpu | 2 | 64 | 125.170 |
| gpt2_20b | 8 | const | 2 | 32 | 105.415 |
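The table can also be queried programmatically, e.g. to pick the best non-OOM configuration per GPU count. This is a minimal sketch that only re-encodes the numbers above; `best_config` is a hypothetical helper, not part of the example:

```python
# Benchmark rows from the table above: (n_gpu, policy, tp, batch, tflops).
# "CUDA OOM" entries are recorded as None; the "TP=1/2" auto row means
# the run hit OOM at both TP=1 and TP=2, so it appears twice here.
results = [
    (4, "cpu",   1, 64, None),
    (4, "auto",  1, 64, None),
    (4, "auto",  2, 64, None),
    (4, "cpu",   2, 64, 121.394),
    (4, "cpu",   2, 8,  43.102),
    (8, "auto",  2, 16, 99.871),
    (8, "cpu",   2, 64, 125.170),
    (8, "const", 2, 32, 105.415),
]

def best_config(rows, n_gpu):
    """Return the highest-Tflops run for a given GPU count, skipping OOM rows."""
    ok = [r for r in rows if r[0] == n_gpu and r[4] is not None]
    return max(ok, key=lambda r: r[4])

print(best_config(results, 8))  # cpu policy, TP=2, batch 64, 125.170 Tflops
```

On both 4 and 8 GPUs, the `cpu` policy with TP=2 and batch size 64 gives the best throughput among the measured runs.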