README.md 12.7 KB
Newer Older
1
# Colossal-AI
2
<div id="top" align="center">
3

4
5
   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Colossal-AI_logo.png)](https://www.colossalai.org/)

binmakeswell's avatar
binmakeswell committed
6
   Colossal-AI: A Unified Deep Learning System for Big Model Era
7

8
9
10
11
   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> | 
   <a href="https://www.colossalai.org/"> Documentation </a> | 
   <a href="https://github.com/hpcaitech/ColossalAI-Examples"> Examples </a> |   
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> | 
12
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>
13

Frank Lee's avatar
Frank Lee committed
14
   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml)
15
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
16
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
Frank Lee's avatar
Frank Lee committed
17
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
binmakeswell's avatar
binmakeswell committed
18
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
19
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)
Frank Lee's avatar
Frank Lee committed
20
   
binmakeswell's avatar
binmakeswell committed
21
22

   | [English](README.md) | [中文](README-zh-Hans.md) |
23

24
</div>
ver217's avatar
ver217 committed
25

26
27
## Table of Contents
<ul>
binmakeswell's avatar
binmakeswell committed
28
 <li><a href="#Why-Colossal-AI">Why Colossal-AI</a> </li>
29
30
 <li><a href="#Features">Features</a> </li>
 <li>
binmakeswell's avatar
binmakeswell committed
31
   <a href="#Parallel-Training-Demo">Parallel Training Demo</a> 
32
33
34
35
36
   <ul>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
binmakeswell's avatar
binmakeswell committed
37
     <li><a href="#PaLM">PaLM</a></li>
binmakeswell's avatar
binmakeswell committed
38
     <li><a href="#OPT">OPT</a></li>
39
     <li><a href="#Recommendation-System-Models">Recommendation System Models</a></li>
40
41
   </ul>
 </li>
42
 <li>
binmakeswell's avatar
binmakeswell committed
43
   <a href="#Single-GPU-Training-Demo">Single GPU Training Demo</a> 
44
45
46
47
48
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>
binmakeswell's avatar
binmakeswell committed
49
 <li>
binmakeswell's avatar
binmakeswell committed
50
   <a href="#Inference-Energon-AI-Demo">Inference (Energon-AI) Demo</a> 
binmakeswell's avatar
binmakeswell committed
51
52
   <ul>
     <li><a href="#GPT-3-Inference">GPT-3</a></li>
binmakeswell's avatar
binmakeswell committed
53
     <li><a href="#OPT-Serving">OPT-175B Online Serving for Text Generation</a></li>
binmakeswell's avatar
binmakeswell committed
54
   </ul>
55
56
 </li>
   <li>
fastalgo's avatar
fastalgo committed
57
   <a href="#Colossal-AI-in-the-Real-World">Colossal-AI for Real World Applications</a> 
58
   <ul>
binmakeswell's avatar
binmakeswell committed
59
     <li><a href="#AIGC">AIGC: Acceleration of Stable Diffusion</a></li>
binmakeswell's avatar
binmakeswell committed
60
     <li><a href="#Biomedicine">Biomedicine: Acceleration of AlphaFold Protein Structure</a></li>
61
   </ul>
binmakeswell's avatar
binmakeswell committed
62
 </li>
63
64
65
66
67
68
69
70
71
72
73
74
 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#PyPI">PyPI</a></li>
     <li><a href="#Install-From-Source">Install From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#contributing">Contributing</a></li>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>
binmakeswell's avatar
binmakeswell committed
75

binmakeswell's avatar
binmakeswell committed
76
77
78
79
80
81
## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>

fastalgo's avatar
fastalgo committed
82
   Prof. James Demmel (UC Berkeley): Colossal-AI makes training AI models efficient, easy, and scalable.
binmakeswell's avatar
binmakeswell committed
83
84
85
86
</div>

<p align="right">(<a href="#top">back to top</a>)</p>

binmakeswell's avatar
binmakeswell committed
87
88
## Features

binmakeswell's avatar
binmakeswell committed
89
Colossal-AI provides a collection of parallel components for you. We aim to support you to write your
fastalgo's avatar
fastalgo committed
90
distributed deep learning models just like how you write your model on your laptop. We provide user-friendly tools to kickstart
binmakeswell's avatar
binmakeswell committed
91
distributed training and inference in a few lines.
binmakeswell's avatar
binmakeswell committed
92

Jiarui Fang's avatar
Jiarui Fang committed
93
94
95
- Parallelism strategies
  - Data Parallelism
  - Pipeline Parallelism
binmakeswell's avatar
binmakeswell committed
96
97
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) Tensor Parallelism
  - [Sequence Parallelism](https://arxiv.org/abs/2105.13120)
binmakeswell's avatar
binmakeswell committed
98
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)
Jiarui Fang's avatar
Jiarui Fang committed
99

fastalgo's avatar
fastalgo committed
100
- Heterogeneous Memory Management 
Jiarui Fang's avatar
Jiarui Fang committed
101
102
103
  - [PatrickStar](https://arxiv.org/abs/2108.05818)

- Friendly Usage
binmakeswell's avatar
binmakeswell committed
104
  - Parallelism based on configuration file
binmakeswell's avatar
binmakeswell committed
105

binmakeswell's avatar
binmakeswell committed
106
107
108
- Inference
  - [Energon-AI](https://github.com/hpcaitech/EnergonAI)

109
- Colossal-AI in the Real World 
binmakeswell's avatar
binmakeswell committed
110
  - Biomedicine: [FastFold](https://github.com/hpcaitech/FastFold) accelerates training and inference of AlphaFold protein structure
111
112
<p align="right">(<a href="#top">back to top</a>)</p>

binmakeswell's avatar
binmakeswell committed
113
## Parallel Training Demo
binmakeswell's avatar
binmakeswell committed
114
### ViT
Jiarui Fang's avatar
Jiarui Fang committed
115
<p align="center">
Shen Chenhui's avatar
Shen Chenhui committed
116
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
Jiarui Fang's avatar
Jiarui Fang committed
117
</p>
binmakeswell's avatar
binmakeswell committed
118

fastalgo's avatar
fastalgo committed
119
- 14x larger batch size, and 5x faster training for Tensor Parallelism = 64
binmakeswell's avatar
binmakeswell committed
120

121
### GPT-3
Jiarui Fang's avatar
Jiarui Fang committed
122
<p align="center">
Sze-qq's avatar
Sze-qq committed
123
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
Jiarui Fang's avatar
Jiarui Fang committed
124
</p>
binmakeswell's avatar
binmakeswell committed
125

fastalgo's avatar
fastalgo committed
126
- Save 50% GPU resources, and 10.7% acceleration
127
128

### GPT-2
Shen Chenhui's avatar
Shen Chenhui committed
129
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>
130

fastalgo's avatar
fastalgo committed
131
- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism
132

Sze-qq's avatar
Sze-qq committed
133
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>
134

Sze-qq's avatar
Sze-qq committed
135
136
- 24x larger model size on the same hardware
- over 3x acceleration
binmakeswell's avatar
binmakeswell committed
137
### BERT
Shen Chenhui's avatar
Shen Chenhui committed
138
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>
binmakeswell's avatar
binmakeswell committed
139

140
- 2x faster training, or 50% longer sequence length
binmakeswell's avatar
binmakeswell committed
141

binmakeswell's avatar
binmakeswell committed
142
143
144
### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

binmakeswell's avatar
binmakeswell committed
145
### OPT
146
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT_update.png" width=800/>
binmakeswell's avatar
binmakeswell committed
147
148

- [Open Pretrained Transformer (OPT)](https://github.com/facebookresearch/metaseq), a 175-Billion parameter AI language model released by Meta, which stimulates AI programmers to perform various downstream tasks and application deployments because public pretrained model weights.
binmakeswell's avatar
binmakeswell committed
149
- 45% speedup fine-tuning OPT at low cost in lines. [[Example]](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/language/opt) [[Online Serving]](https://service.colossalai.org/opt) 
binmakeswell's avatar
binmakeswell committed
150
151

Please visit our [documentation](https://www.colossalai.org/) and [examples](https://github.com/hpcaitech/ColossalAI-Examples) for more details.
binmakeswell's avatar
binmakeswell committed
152

153
### Recommendation System Models
154
- [Cached Embedding](https://github.com/hpcaitech/CachedEmbedding), utilize software cache to train larger embedding tables with a smaller GPU memory budget.
155

156
<p align="right">(<a href="#top">back to top</a>)</p>
binmakeswell's avatar
binmakeswell committed
157

binmakeswell's avatar
binmakeswell committed
158
## Single GPU Training Demo
159
160
161
162
163
164
165
166

### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>

- 20x larger model size on the same hardware

Jiarui Fang's avatar
Jiarui Fang committed
167
168
169
170
171
172
<p id="GPT-2-NVME" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-NVME.png" width=800/>
</p>

- 120x larger model size on the same hardware (RTX 3080)

173
174
175
176
177
178
179
180
181
### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>

- 34x larger model size on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>

binmakeswell's avatar
binmakeswell committed
182

binmakeswell's avatar
binmakeswell committed
183
## Inference (Energon-AI) Demo
binmakeswell's avatar
binmakeswell committed
184
185
186
187
188
189
190

<p id="GPT-3-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
</p>

- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% inference acceleration on the same hardware

binmakeswell's avatar
binmakeswell committed
191
192
193
194
195
196
<p id="OPT-Serving" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT_serving.png" width=800/>
</p>

- [OPT Serving](https://service.colossalai.org/opt): Try 175-billion-parameter OPT online services for free, without any registration whatsoever.

binmakeswell's avatar
binmakeswell committed
197
198
<p align="right">(<a href="#top">back to top</a>)</p>

199
200
## Colossal-AI in the Real World

binmakeswell's avatar
binmakeswell committed
201
202
203
204
205
206
207
208
209
210
211
212
213
214
### AIGC
Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion](https://github.com/CompVis/stable-diffusion)
<p id="diffusion_train" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/diffusion_train.png" width=800/>
</p>

- [Stable Diffusion with Colossal-AI](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): 6.5x faster training and pretraining cost saving, the hardware cost of fine-tuning can be almost 7X cheaper (from RTX3090/4090 to RTX3050/2070)

<p id="diffusion_demo" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/diffusion_demo.png" width=800/>
</p>

<p align="right">(<a href="#top">back to top</a>)</p>

binmakeswell's avatar
binmakeswell committed
215
216
217
218
219
220
221
222
223
### Biomedicine
Acceleration of [AlphaFold Protein Structure](https://alphafold.ebi.ac.uk/)

<p id="FastFold" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/FastFold.jpg" width=800/>
</p>

- [FastFold](https://github.com/hpcaitech/FastFold): accelerating training and inference on GPU Clusters, faster data processing, inference sequence containing more than 10000 residues.

224
225
226
227
<p id="xTrimoMultimer" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/xTrimoMultimer_Table.jpg" width=800/>
</p>

binmakeswell's avatar
binmakeswell committed
228
229
- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): accelerating structure prediction of protein monomers and multimer by 11x.

230
231
232

<p align="right">(<a href="#top">back to top</a>)</p>

zbian's avatar
zbian committed
233
234
## Installation

235
### Download From Official Releases
ver217's avatar
ver217 committed
236

binmakeswell's avatar
binmakeswell committed
237
You can visit the [Download](https://www.colossalai.org/download) page to download Colossal-AI with pre-built CUDA extensions.
238

ver217's avatar
ver217 committed
239

240
### Download From Source
ver217's avatar
ver217 committed
241

242
> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problem. :)
zbian's avatar
zbian committed
243
244

```shell
245
git clone https://github.com/hpcaitech/ColossalAI.git
zbian's avatar
zbian committed
246
cd ColossalAI
247

zbian's avatar
zbian committed
248
249
250
251
252
253
254
# install dependency
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

ver217's avatar
ver217 committed
255
If you don't want to install and enable CUDA kernel fusion (compulsory installation when using fused optimizer):
zbian's avatar
zbian committed
256
257

```shell
258
NO_CUDA_EXT=1 pip install .
zbian's avatar
zbian committed
259
260
```

261
<p align="right">(<a href="#top">back to top</a>)</p>
binmakeswell's avatar
binmakeswell committed
262

Frank Lee's avatar
Frank Lee committed
263
264
## Use Docker

265
266
267
268
269
270
271
### Pull from DockerHub

You can directly pull the docker image from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The image is automatically uploaded upon release.


### Build On Your Own

Frank Lee's avatar
Frank Lee committed
272
273
Run the following command to build a docker image from Dockerfile provided.

274
275
276
> Building Colossal-AI from scratch requires GPU support, you need to use Nvidia Docker Runtime as the default when doing `docker build`. More details can be found [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
> We recommend you install Colossal-AI from our [project page](https://www.colossalai.org) directly.

277

Frank Lee's avatar
Frank Lee committed
278
279
280
281
282
283
284
285
286
287
288
```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the docker container in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

289
<p align="right">(<a href="#top">back to top</a>)</p>
binmakeswell's avatar
binmakeswell committed
290
291
292
293
294

## Community

Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
fastalgo's avatar
fastalgo committed
295
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.
binmakeswell's avatar
binmakeswell committed
296

297
298
## Contributing

binmakeswell's avatar
binmakeswell committed
299
300
301
If you wish to contribute to this project, please follow the guideline in [Contributing](./CONTRIBUTING.md).

Thanks so much to all of our amazing contributors!
302

binmakeswell's avatar
binmakeswell committed
303
304
305
<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors"><img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/contributor_avatar.png" width="800px"></a>

*The order of contributor avatars is randomly shuffled.*
306

307
308
<p align="right">(<a href="#top">back to top</a>)</p>

zbian's avatar
zbian committed
309

310
## Cite Us
zbian's avatar
zbian committed
311

312
313
314
315
316
317
318
319
```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```
320

fastalgo's avatar
fastalgo committed
321
<p align="right">(<a href="#top">back to top</a>)</p>