OpenDAS / deepspeed · Commits

Unverified commit 85752000, authored Feb 10, 2020 by Jeff Rasley, committed by GitHub on Feb 10, 2020

add links (#56)

parent 010f6dc0
Showing 1 changed file with 11 additions and 1 deletion

README.md +11 -1
@@ -10,7 +10,11 @@ efficient, and effective.
 DeepSpeed can train DL models with over a hundred billion parameters on current
 generation of GPU clusters, while achieving over 5x in system performance
-compared to the state-of-art.
+compared to the state-of-art. Early adopters of DeepSpeed have already produced
+a language model (LM) with over 17B parameters called
+[Turing-NLG](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft),
+establishing a new SOTA in the LM category.

 # Table of Contents
@@ -84,6 +88,12 @@ replicated across data-parallel processes, ZeRO partitions model states to save
 significant memory. The current implementation (stage 1 of ZeRO) reduces memory by up to
 4x relative to the state-of-art. You can read more about ZeRO in our
 [paper](https://arxiv.org/abs/1910.02054).
+
+With this impressive memory reduction, early adopters of DeepSpeed have already
+produced a language model (LM) with over 17B parameters called
+[Turing-NLG](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft),
+establishing a new SOTA in the LM category.
+
 ## Scalability

 DeepSpeed supports efficient data parallelism, model parallelism, and their
 combination. ZeRO boosts the scaling capability and efficiency further.
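For reference alongside the ZeRO description in the second hunk: stage 1 is typically enabled through DeepSpeed's JSON config and `deepspeed.initialize`. The snippet below is a minimal sketch, assuming a recent DeepSpeed release where the stage is selected via the `zero_optimization` block (the config schema has evolved since this 2020 commit); the model, batch size, and optimizer settings are illustrative placeholders, not taken from the commit.

```python
# Minimal sketch: enabling ZeRO stage 1 through a DeepSpeed config.
# Assumes a recent DeepSpeed release where "zero_optimization" selects the
# stage; the model and hyperparameters below are placeholders.
import torch
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    # Stage 1 partitions optimizer states across data-parallel ranks.
    "zero_optimization": {"stage": 1},
}

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer model

# deepspeed.initialize wraps the model with the ZeRO-enabled engine and
# builds the optimizer described in the config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

When a script like this is started with the `deepspeed` launcher across multiple data-parallel ranks, stage 1 shards the optimizer states across those ranks rather than replicating them, which is the source of the "up to 4x" memory reduction cited in the README.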