Unverified commit 4eade17a, authored by Shaden Smith and committed by GitHub

News edits (#220)

* BERT title
parent 0c824830
@@ -22,7 +22,7 @@ establishing a new SOTA in the LM category.
# News
* [2020/05/19] [An Order-of-Magnitude Larger and Faster Training with ZeRO-2](https://www.deepspeed.ai/news/2020/05/19/zero-stage2.html)
<span style="color:dodgerblue">**[_NEW_]**</span>
* [2020/05/19] [The Fastest and Most Efficient BERT Training through Optimized Transformer Kernels](https://www.deepspeed.ai/news/2020/05/19/bert-record.html)
<span style="color:dodgerblue">**[_NEW_]**</span>
* [2020/02/13] [Turing-NLG: A 17-billion-parameter language model by Microsoft](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/)
* [2020/02/13] [ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters](https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/)
---
layout: single
title: "The Fastest and Most Efficient BERT Training through Optimized Transformer Kernels"
excerpt: ""
categories: news
new_post: true
date: 2020-05-19 00:00:00
---
We introduce new technology to accelerate single GPU performance via kernel
optimizations. These optimizations not only create a strong foundation for
scaling out large models, but also improve the single GPU performance of
highly tuned and moderately sized models like BERT by more than 30%, reaching
a staggering performance of 66 teraflops per V100 GPU, which is 52% of the
hardware peak. **Using optimized transformer kernels as the building block,
DeepSpeed achieves the fastest BERT training record: 44 minutes on 1,024
NVIDIA V100 GPUs**, compared with the best published result of 67 minutes on
the same number and generation of GPUs.

For a technical overview, see our [blog post](linklink).

**Code and tutorials are coming soon!**
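As a quick sanity check on the numbers above, here is a back-of-the-envelope sketch. It assumes the commonly cited 125 TFLOPS FP16 Tensor Core peak for a V100; the post itself does not state the peak figure it uses.

```python
# Back-of-the-envelope check of the throughput and speedup claims in the announcement.
# Assumption (not stated in the post): V100 FP16 Tensor Core peak of 125 TFLOPS.
v100_peak_tflops = 125.0
achieved_tflops = 66.0

# Fraction of hardware peak reached by the optimized transformer kernels.
fraction_of_peak = achieved_tflops / v100_peak_tflops
print(f"Fraction of hardware peak: {fraction_of_peak:.1%}")  # ~52.8%, consistent with the ~52% claim

# End-to-end BERT training time comparison on 1,024 V100 GPUs.
deepspeed_minutes = 44
best_published_minutes = 67
print(f"Speedup over best published result: {best_published_minutes / deepspeed_minutes:.2f}x")  # ~1.52x
```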