Unverified Commit 0c824830 authored by Shaden Smith, committed by GitHub

news edits (#219)

parent 10a46a14
@@ -20,9 +20,9 @@ establishing a new SOTA in the LM category.
# News
-* [2020/05/19] [ZeRO-2 empowers training models as large as 170 billion parameters up to 10x faster compared to state-of-the-art](https://www.deepspeed.ai/news/2020/05/19/zero-stage2.html)
+* [2020/05/19] [An Order-of-Magnitude Larger and Faster Training with ZeRO-2](https://www.deepspeed.ai/news/2020/05/19/zero-stage2.html)
<span style="color:dodgerblue">**[_NEW_]**</span>
-* [2020/05/19] [DeepSpeed optimizes transformer kernels to achieve world’s fastest BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs](https://www.deepspeed.ai/news/2020/05/19/bert-record.html)
+* [2020/05/19] [DeepSpeed optimizes transformer kernels to achieve the world’s fastest and most efficient BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs](https://www.deepspeed.ai/news/2020/05/19/bert-record.html)
<span style="color:dodgerblue">**[_NEW_]**</span>
* [2020/02/13] [Turing-NLG: A 17-billion-parameter language model by Microsoft](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/)
* [2020/02/13] [ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters](https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/)
...
---
layout: single
title: "DeepSpeed optimizes transformer kernels to achieve world's fastest BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs" title: "DeepSpeed optimizes transformer kernels to achieve the world's fastest and most efficient BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs"
excerpt: ""
categories: news
new_post: true
date: 2020-05-19 00:00:00
---
We introduce new technology to accelerate single GPU performance via
kernel optimizations. These optimizations not only create a strong
foundation for scaling out large models, but also improve the single GPU
@@ -19,3 +18,5 @@ block, DeepSpeed achieves the fastest BERT training record: 44 minutes on
of 67 minutes on the same number and generation of GPUs.
**Code and tutorials are coming soon!**
For a technical overview, see our [blog post](linklink).
---
layout: single
title: "ZeRO-2 empowers training models as large as 170 billion parameters up to 10x faster compared to state-of-the-art" title: "An Order-of-Magnitude Larger and Faster Training with ZeRO-2"
excerpt: ""
categories: news
new_post: true
@@ -17,6 +17,8 @@ learning training by an order of magnitude. More concretely, ZeRO-2 allows
training models as large as 170 billion parameters up to 10x faster compared
to state of the art.
-For more information on using ZeRO-2, see the [Megatron tutorial](/tutorials/megatron/).
+For an overview of ZeRO-2, see our [blog post](linklink).
-For a technical deep dive, see our [technical report](https://arxiv.org/abs/1910.02054).
+For more information on how to use ZeRO-2, see an example of training the GPT family of models in this [tutorial](/tutorials/megatron/).
For a technical overview, see our [technical report](https://arxiv.org/abs/1910.02054).
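
Not part of this commit, but as a concrete illustration of what "using ZeRO-2" involves: the sketch below enables ZeRO stage 2 through DeepSpeed's JSON config and `deepspeed.initialize`. The batch size, learning rate, and communication flags are illustrative assumptions rather than values taken from the tutorial or blog post.

```python
# Minimal sketch (assumed values, not from this commit): enable ZeRO stage 2
# via a DeepSpeed JSON config and deepspeed.initialize().
import argparse
import json

import torch
import deepspeed

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},  # ZeRO-2 targets mixed-precision training
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                    # partition optimizer states and gradients
        "allgather_partitions": True,
        "reduce_scatter": True,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# Keep the sketch self-contained by writing the config next to the script;
# normally ds_config.json would live in the repo.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f)

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)  # set by the launcher
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed / --deepspeed_config
args = parser.parse_args()

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer model

# Launch with the DeepSpeed launcher so distributed state is set up, e.g.:
#   deepspeed this_script.py --deepspeed --deepspeed_config ds_config.json
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=model, model_parameters=model.parameters())
```

The same config scales from a single GPU to many: ZeRO-2 partitions optimizer states and gradients across data-parallel ranks, which is where the memory savings described above come from.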