Unverified Commit 0c824830 authored by Shaden Smith, committed by GitHub

news edits (#219)

parent 10a46a14
@@ -20,9 +20,9 @@ establishing a new SOTA in the LM category.
# News
* [2020/05/19] [ZeRO-2 empowers training models as large as 170 billion parameters up to 10x faster compared to state-of-the-art](https://www.deepspeed.ai/news/2020/05/19/zero-stage2.html)
* [2020/05/19] [An Order-of-Magnitude Larger and Faster Training with ZeRO-2](https://www.deepspeed.ai/news/2020/05/19/zero-stage2.html)
<span style="color:dodgerblue">**[_NEW_]**</span>
* [2020/05/19] [DeepSpeed optimizes transformer kernels to achieve world’s fastest BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs](https://www.deepspeed.ai/news/2020/05/19/bert-record.html)
* [2020/05/19] [DeepSpeed optimizes transformer kernels to achieve the world’s fastest and most efficient BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs](https://www.deepspeed.ai/news/2020/05/19/bert-record.html)
<span style="color:dodgerblue">**[_NEW_]**</span>
* [2020/02/13] [Turing-NLG: A 17-billion-parameter language model by Microsoft](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/)
* [2020/02/13] [ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters](https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/)
---
layout: single
title: "DeepSpeed optimizes transformer kernels to achieve world's fastest BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs"
title: "DeepSpeed optimizes transformer kernels to achieve the world's fastest and most efficient BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs"
excerpt: ""
categories: news
new_post: true
date: 2020-05-19 00:00:00
---
We introduce new technology to accelerate single GPU performance via
kernel optimizations. These optimizations not only create a strong
foundation for scaling out large models, but also improve the single GPU
@@ -19,3 +18,5 @@ block, DeepSpeed achieves the fastest BERT training record: 44 minutes on
of 67 minutes on the same number and generation of GPUs.
**Code and tutorials are coming soon!**
For a technical overview, see our [blog post](linklink).
---
layout: single
title: "ZeRO-2 empowers training models as large as 170 billion parameters up to 10x faster compared to state-of-the-art"
title: "An Order-of-Magnitude Larger and Faster Training with ZeRO-2"
excerpt: ""
categories: news
new_post: true
@@ -17,6 +17,8 @@ learning training by an order of magnitude. More concretely, ZeRO-2 allows
training models as large as 170 billion parameters up to 10x faster compared
to state of the art.
For more information on using ZeRO-2, see the [Megatron tutorial](/tutorials/megatron/).
For an overview of ZeRO-2, see our [blog post](linklink).
For a technical deep dive, see our [technical report](https://arxiv.org/abs/1910.02054).
For more information on how to use ZeRO-2, see an example of training the GPT family of models in this [tutorial](/tutorials/megatron/).
For a technical overview, see our [technical report](https://arxiv.org/abs/1910.02054).
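
As a concrete starting point before reading the tutorial, below is a minimal, hypothetical sketch of what turning on ZeRO stage 2 looks like with DeepSpeed. The `zero_optimization` keys follow DeepSpeed's JSON config schema, but the stand-in model, optimizer settings, and bucket sizes are illustrative assumptions rather than the tutorial's exact recipe.

```python
# Minimal, hypothetical sketch of enabling ZeRO stage 2 with DeepSpeed.
# The config keys follow DeepSpeed's JSON config schema; the exact values
# here are illustrative, not the Megatron tutorial's recipe.
import torch
import deepspeed

# Stand-in model; in practice this would be a GPT/Megatron-style transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                    # partition optimizer states and gradients across ranks
        "contiguous_gradients": True,  # reduce memory fragmentation during the backward pass
        "allgather_bucket_size": 2e8,  # communication bucket sizes are tunable
        "reduce_bucket_size": 2e8,
    },
}

# deepspeed.initialize wraps the model in the DeepSpeed engine; newer releases
# accept the config dict via `config=` (older ones used `config_params=`).
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Scripts like this are typically run through the `deepspeed` launcher, and the same settings can live in a JSON file passed on the command line; the Megatron tutorial linked above remains the authoritative reference for the full GPT training setup.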