@@ -20,6 +20,9 @@ establishing a new SOTA in the LM category.
# News
* [2020/05/19] [ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale](https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/)
<span style="color:dodgerblue">**[_NEW_]**</span>
* [2020/05/19] [An Order-of-Magnitude Larger and Faster Training with ZeRO-2](https://www.deepspeed.ai/news/2020/05/19/zero-stage2.html)
<span style="color:dodgerblue">**[_NEW_]**</span>
* [2020/05/19] [The Fastest and Most Efficient BERT Training through Optimized Transformer Kernels](https://www.deepspeed.ai/news/2020/05/19/bert-record.html)
@@ -17,6 +17,6 @@ DeepSpeed achieves the fastest BERT training record: 44 minutes on 1,024
NVIDIA V100 GPUs**, compared with the best published result of 67 minutes on
the same number and generation of GPUs.
For a technical overview, see our [blog post](https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/).
@@ -17,7 +17,7 @@ learning training by an order of magnitude. More concretely, ZeRO-2 allows
training models as large as 170 billion parameters up to 10x faster compared
to the state of the art.
For more information on ZeRO-2, see our [blog post](https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/).
For more information on how to use ZeRO-2, see an example of training the GPT family of models in this [tutorial](/tutorials/megatron/).
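As a rough illustration of what the tutorial covers, the sketch below enables ZeRO stage 2 through a DeepSpeed JSON config and wires a small stand-in model through `deepspeed.initialize`. The batch size, learning rate, and `torch.nn.Linear` placeholder model are illustrative assumptions rather than the tutorial's actual GPT-2 setup.

```python
# Minimal ZeRO stage 2 sketch. Launch with the DeepSpeed launcher, e.g.:
#   deepspeed train_sketch.py --deepspeed --deepspeed_config ds_config.json
# The model, batch size, and learning rate are placeholders for illustration.
import argparse
import json

import deepspeed
import torch

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                    # partition optimizer states and gradients
        "contiguous_gradients": True,  # reduce memory fragmentation
        "overlap_comm": True           # overlap gradient reduction with backward
    }
}

# Write the config so --deepspeed_config can point at it.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed / --deepspeed_config
args = parser.parse_args()

model = torch.nn.Linear(1024, 1024)  # stand-in for a real GPT-family model

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler);
# with stage 2, the engine partitions optimizer states and gradients across ranks.
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=model, model_parameters=model.parameters()
)
```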