Unverified Commit 85752000 authored by Jeff Rasley's avatar Jeff Rasley Committed by GitHub

add links (#56)

parent 010f6dc0
...@@ -10,7 +10,11 @@ efficient, and effective.
DeepSpeed can train DL models with over a hundred billion parameters on the current
generation of GPU clusters, while achieving over 5x system performance
compared to the state of the art. Early adopters of DeepSpeed have already produced
a language model (LM) with over 17B parameters called
[Turing-NLG](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft),
establishing a new SOTA in the LM category.
# Table of Contents
...@@ -84,6 +88,12 @@ replicated across data-parallel processes, ZeRO partitions model states to save
significant memory. The current implementation (stage 1 of ZeRO) reduces memory by up to
4x relative to the state of the art. You can read more about ZeRO in our [paper](https://arxiv.org/abs/1910.02054).
With this impressive memory reduction, early adopters of DeepSpeed have already
produced a language model (LM) with over 17B parameters called
[Turing-NLG](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft),
establishing a new SOTA in the LM category.
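
To make the memory-saving feature above concrete, here is a minimal sketch of a DeepSpeed JSON configuration that enables ZeRO stage 1; the batch size and fp16 settings are illustrative placeholders, not values from this repository.

```python
import json

# Minimal DeepSpeed config sketch enabling ZeRO stage 1.
# Values here are illustrative; tune them for your model and cluster.
ds_config = {
    "train_batch_size": 32,              # illustrative global batch size
    "fp16": {"enabled": True},           # mixed precision, commonly paired with ZeRO
    "zero_optimization": {"stage": 1},   # stage 1: partition optimizer states
}

# Write the config so it can be passed to the DeepSpeed launcher.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

The resulting `ds_config.json` would typically be supplied to a training script via DeepSpeed's `--deepspeed_config` flag.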
## Scalability
DeepSpeed supports efficient data parallelism, model parallelism, and their
combination. ZeRO boosts the scaling capability and efficiency further.