Unverified Commit 0c824830 authored by Shaden Smith, committed by GitHub

news edits (#219)

parent 10a46a14
@@ -20,9 +20,9 @@ establishing a new SOTA in the LM category.
# News
-* [2020/05/19] [ZeRO-2 empowers training models as large as 170 billion parameters up to 10x faster compared to state-of-the-art](https://www.deepspeed.ai/news/2020/05/19/zero-stage2.html)
+* [2020/05/19] [An Order-of-Magnitude Larger and Faster Training with ZeRO-2](https://www.deepspeed.ai/news/2020/05/19/zero-stage2.html)
<span style="color:dodgerblue">**[_NEW_]**</span>
-* [2020/05/19] [DeepSpeed optimizes transformer kernels to achieve world’s fastest BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs](https://www.deepspeed.ai/news/2020/05/19/bert-record.html)
+* [2020/05/19] [DeepSpeed optimizes transformer kernels to achieve the world’s fastest and most efficient BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs](https://www.deepspeed.ai/news/2020/05/19/bert-record.html)
<span style="color:dodgerblue">**[_NEW_]**</span>
* [2020/02/13] [Turing-NLG: A 17-billion-parameter language model by Microsoft](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/)
* [2020/02/13] [ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters](https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/)
...
---
layout: single
title: "DeepSpeed optimizes transformer kernels to achieve world's fastest BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs" title: "DeepSpeed optimizes transformer kernels to achieve the world's fastest and most efficient BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs"
excerpt: ""
categories: news
new_post: true
date: 2020-05-19 00:00:00
---
We introduce new technology to accelerate single GPU performance via
kernel optimizations. These optimizations not only create a strong
foundation for scaling out large models, but also improve the single GPU
@@ -19,3 +18,5 @@ block, DeepSpeed achieves the fastest BERT training record: 44 minutes on
of 67 minutes on the same number and generation of GPUs.
**Code and tutorials are coming soon!**
For a technical overview, see our [blog post](linklink).
---
layout: single
title: "ZeRO-2 empowers training models as large as 170 billion parameters up to 10x faster compared to state-of-the-art" title: "An Order-of-Magnitude Larger and Faster Training with ZeRO-2"
excerpt: ""
categories: news
new_post: true
@@ -17,6 +17,8 @@ learning training by an order of magnitude. More concretely, ZeRO-2 allows
training models as large as 170 billion parameters up to 10x faster compared
to state of the art.
-For more information on using ZeRO-2, see the [Megatron tutorial](/tutorials/megatron/).
+For an overview of ZeRO-2, see our [blog post](linklink).
-For a technical deep dive, see our [technical report](https://arxiv.org/abs/1910.02054).
+For more information on how to use ZeRO-2, see an example of training the GPT family of models in this [tutorial](/tutorials/megatron/).
For a technical overview, see our [technical report](https://arxiv.org/abs/1910.02054).
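
Not part of this commit, but as a concrete illustration of what "using ZeRO-2" involves: the sketch below enables ZeRO stage 2 through DeepSpeed's JSON config and `deepspeed.initialize`. The batch size, learning rate, and communication flags are illustrative assumptions rather than values taken from the tutorial or blog post.

```python
# Minimal sketch (assumed values, not from this commit): enable ZeRO stage 2
# via a DeepSpeed JSON config and deepspeed.initialize().
import argparse
import json

import torch
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},  # ZeRO-2 targets mixed-precision training
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                    # partition optimizer states and gradients
        "allgather_partitions": True,
        "reduce_scatter": True,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# Keep the sketch self-contained by writing the config next to the script;
# normally ds_config.json would live in the repo.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f)

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)  # set by the launcher
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed / --deepspeed_config
args = parser.parse_args()

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer model

# Launch with the DeepSpeed launcher so distributed state is set up, e.g.:
#   deepspeed this_script.py --deepspeed --deepspeed_config ds_config.json
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=model, model_parameters=model.parameters())
```

The same config scales from a single GPU to many: ZeRO-2 partitions optimizer states and gradients across data-parallel ranks, which is where the memory savings described above come from.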