"git@developer.sourcefind.cn:zhaoyu6/sglang.git" did not exist on "b0890631a011be28d5ef5a0b4d5551fdeb94ab25"
2020-05-19-bert-record.md 983 Bytes
Newer Older
Jeff Rasley's avatar
Jeff Rasley committed
1
2
---
layout: single
title: "DeepSpeed optimizes transformer kernels to achieve the world's fastest and most efficient BERT training record: 44 minutes on 1024 NVIDIA V100 GPUs"
excerpt: ""
categories: news
new_post: true
date: 2020-05-19 00:00:00
---

We introduce new technology to accelerate single GPU performance via
kernel optimizations. These optimizations not only create a strong
foundation for scaling out large models, but also improve the single GPU
performance of highly tuned and moderately sized models like BERT by more
than 30%, reaching a staggering performance of 66 teraflops per V100 GPU,
which is 52% of the hardware peak. **Using these optimizations as the building
block, DeepSpeed achieves the fastest BERT training record: 44 minutes on
1,024 NVIDIA V100 GPUs**, compared with the best published result
of 67 minutes on the same number and generation of GPUs.
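The headline numbers above are easy to sanity-check. A minimal sketch, assuming the "hardware peak" refers to the V100 SXM2's 125 TFLOPS Tensor Core peak for mixed precision:

```python
# Sanity-check the utilization claim (assumption: "hardware peak" means
# the V100 SXM2 Tensor Core peak of 125 TFLOPS in mixed precision).
v100_peak_tflops = 125.0
achieved_tflops = 66.0
utilization = achieved_tflops / v100_peak_tflops
print(f"Hardware utilization: {utilization:.1%}")  # prints "Hardware utilization: 52.8%"

# End-to-end speedup over the previous published record on the same GPUs.
prev_minutes, new_minutes = 67, 44
speedup = prev_minutes / new_minutes
print(f"Training-time speedup: {speedup:.2f}x")  # prints "Training-time speedup: 1.52x"
```

Rounded down, 52.8% utilization matches the "52% of the hardware peak" figure quoted above.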

**Code and tutorials are coming soon!**

For a technical overview, see our [blog post](linklink).