OpenDAS / deepspeed · Commits

Unverified commit 85752000, authored Feb 10, 2020 by Jeff Rasley, committed by GitHub on Feb 10, 2020

add links (#56)

parent 010f6dc0
Showing 1 changed file with 11 additions and 1 deletion

README.md +11 -1
@@ -10,7 +10,11 @@ efficient, and effective.
 DeepSpeed can train DL models with over a hundred billion parameters on current
 generation of GPU clusters, while achieving over 5x in system performance
-compared to the state-of-art.
+compared to the state-of-art. Early adopters of DeepSpeed have already produced
+a language model (LM) with over 17B parameters called
+[Turing-NLG](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft),
+establishing a new SOTA in the LM category.

 # Table of Contents
@@ -84,6 +88,12 @@ replicated across data-parallel processes, ZeRO partitions model states to save
 significant memory. The current implementation (stage 1 of ZeRO) reduces memory by up to
 4x relative to the state-of-art. You can read more about ZeRO in our
 [paper](https://arxiv.org/abs/1910.02054).
+
+With this impressive memory reduction, early adopters of DeepSpeed have already
+produced a language model (LM) with over 17B parameters called
+[Turing-NLG](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft),
+establishing a new SOTA in the LM category.
+
 ## Scalability

 DeepSpeed supports efficient data parallelism, model parallelism, and their
 combination. ZeRO boosts the scaling capability and efficiency further.
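For reference alongside the ZeRO description in the second hunk: stage 1 is typically enabled through DeepSpeed's JSON config and `deepspeed.initialize`. The snippet below is a minimal sketch, assuming a recent DeepSpeed release where the stage is selected via the `zero_optimization` block (the config schema has evolved since this 2020 commit); the model, batch size, and optimizer settings are illustrative placeholders, not taken from the commit.

```python
# Minimal sketch: enabling ZeRO stage 1 through a DeepSpeed config.
# Assumes a recent DeepSpeed release where "zero_optimization" selects the
# stage; the model and hyperparameters below are placeholders.
import torch
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    # Stage 1 partitions optimizer states across data-parallel ranks.
    "zero_optimization": {"stage": 1},
}

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer model

# deepspeed.initialize wraps the model with the ZeRO-enabled engine and
# builds the optimizer described in the config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

When a script like this is started with the `deepspeed` launcher across multiple data-parallel ranks, stage 1 shards the optimizer states across those ranks rather than replicating them, which is the source of the "up to 4x" memory reduction cited in the README.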