# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Colossal-AI_logo.png)](https://www.colossalai.org/)

   Colossal-AI: A Unified Deep Learning System for the Big Model Era

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> | 
   <a href="https://www.colossalai.org/"> Documentation </a> | 
   <a href="https://github.com/hpcaitech/ColossalAI-Examples"> Examples </a> |   
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> | 
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)
   

   | [English](README.md) | [中文](README-zh-Hans.md) |

</div>

## Table of Contents
<ul>
 <li><a href="#Why-Colossal-AI">Why Colossal-AI</a> </li>
 <li><a href="#Features">Features</a> </li>
 <li>
   <a href="#Parallel-Training-Demo">Parallel Training Demo</a> 
   <ul>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
     <li><a href="#OPT">OPT</a></li>
     <li><a href="#Recommendation-System-Models">Recommendation System Models</a></li>
   </ul>
 </li>
 <li>
   <a href="#Single-GPU-Training-Demo">Single GPU Training Demo</a> 
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Inference-Energon-AI-Demo">Inference (Energon-AI) Demo</a> 
   <ul>
     <li><a href="#GPT-3-Inference">GPT-3</a></li>
     <li><a href="#OPT-Serving">OPT-175B Online Serving for Text Generation</a></li>
   </ul>
 </li>
   <li>
   <a href="#Colossal-AI-in-the-Real-World">Colossal-AI for Real World Applications</a> 
   <ul>
     <li><a href="#Biomedicine">Biomedicine: Acceleration of AlphaFold Protein Structure</a></li>
   </ul>
 </li>
 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#PyPI">PyPI</a></li>
     <li><a href="#Install-From-Source">Install From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#contributing">Contributing</a></li>
 <li><a href="#Quick-View">Quick View</a></li>
   <ul>
     <li><a href="#Start-Distributed-Training-in-Lines">Start Distributed Training in Lines</a></li>
     <li><a href="#Write-a-Simple-2D-Parallel-Model">Write a Simple 2D Parallel Model</a></li>
   </ul>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>

## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>

   Prof. James Demmel (UC Berkeley): Colossal-AI makes training AI models efficient, easy, and scalable.
</div>

<p align="right">(<a href="#top">back to top</a>)</p>

## Features

Colossal-AI provides a collection of parallel components. We aim to let you write distributed deep learning models the same way you write models on your laptop, and we provide user-friendly tools to kickstart distributed training and inference in a few lines of code.

- Parallelism strategies
  - Data Parallelism
  - Pipeline Parallelism
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) Tensor Parallelism
  - [Sequence Parallelism](https://arxiv.org/abs/2105.13120)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)

- Heterogeneous Memory Management 
  - [PatrickStar](https://arxiv.org/abs/2108.05818)

- Friendly Usage
  - Parallelism based on configuration file

- Inference
  - [Energon-AI](https://github.com/hpcaitech/EnergonAI)

- Colossal-AI in the Real World 
  - Biomedicine: [FastFold](https://github.com/hpcaitech/FastFold) accelerates training and inference of AlphaFold protein structure
<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Training Demo
### ViT
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
</p>

- 14x larger batch size, and 5x faster training for Tensor Parallelism = 64

### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
</p>

- Saves 50% of GPU resources and delivers 10.7% acceleration

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- 24x larger model size on the same hardware
- over 3x acceleration
### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 50% longer sequence length

### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

### OPT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT_update.png" width=800/>

- [Open Pretrained Transformer (OPT)](https://github.com/facebookresearch/metaseq) is a 175-billion-parameter AI language model released by Meta. Because its pretrained model weights are public, it encourages AI programmers to pursue a wide range of downstream tasks and application deployments.
- 45% faster OPT fine-tuning at low cost, in just a few lines of code. [[Example]](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/language/opt) [[Online Serving]](https://service.colossalai.org/opt)

Please visit our [documentation](https://www.colossalai.org/) and [examples](https://github.com/hpcaitech/ColossalAI-Examples) for more details.

### Recommendation System Models
- [Cached Embedding](https://github.com/hpcaitech/CachedEmbedding) uses a software cache to train larger embedding tables with a smaller GPU memory budget.

<p align="right">(<a href="#top">back to top</a>)</p>

## Single GPU Training Demo

### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>

- 20x larger model size on the same hardware

<p id="GPT-2-NVME" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-NVME.png" width=800/>
</p>

- 120x larger model size on the same hardware (RTX 3080)

### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>

- 34x larger model size on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>


## Inference (Energon-AI) Demo

<p id="GPT-3-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
</p>

- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% inference acceleration on the same hardware

<p id="OPT-Serving" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT_serving.png" width=800/>
</p>

- [OPT Serving](https://service.colossalai.org/opt): Try 175-billion-parameter OPT online services for free, without any registration whatsoever.

<p align="right">(<a href="#top">back to top</a>)</p>

## Colossal-AI in the Real World

### Biomedicine
Acceleration of [AlphaFold Protein Structure](https://alphafold.ebi.ac.uk/)

<p id="FastFold" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/FastFold.jpg" width=800/>
</p>

- [FastFold](https://github.com/hpcaitech/FastFold): accelerates training and inference on GPU clusters, speeds up data processing, and supports inference on sequences containing more than 10,000 residues.

<p id="xTrimoMultimer" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/xTrimoMultimer_Table.jpg" width=800/>
</p>

- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): accelerates structure prediction of protein monomers and multimers by 11x.


<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

### Download From Official Releases

You can visit the [Download](https://www.colossalai.org/download) page to download Colossal-AI with pre-built CUDA extensions.


### Download From Source

> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problem. :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install dependency
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

If you don't want to install and enable CUDA kernel fusion (which is required when using the fused optimizer), run:

```shell
NO_CUDA_EXT=1 pip install .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

### Pull from DockerHub

You can directly pull the docker image from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The image is automatically uploaded upon release.


### Build On Your Own

Run the following command to build a docker image from the provided Dockerfile.

> Building Colossal-AI from scratch requires GPU support, so you need to use the NVIDIA Docker runtime as the default when running `docker build`. More details can be found [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
> We recommend you install Colossal-AI from our [project page](https://www.colossalai.org) directly.


```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the docker container in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Community

Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.

## Contributing

If you wish to contribute to this project, please follow the guidelines in [Contributing](./CONTRIBUTING.md).

Thanks so much to all of our amazing contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors"><img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/contributor_avatar.png" width="800px"></a>

*The order of contributor avatars is randomly shuffled.*

<p align="right">(<a href="#top">back to top</a>)</p>

## Quick View

### Start Distributed Training in Lines

```python
parallel = dict(
    pipeline=2,  # number of pipeline stages
    tensor=dict(mode='2.5d', depth=1, size=4)  # 2.5D tensor parallelism across 4 GPUs
)
```
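
As a rough illustration of how the `parallel` config above maps to hardware, the sketch below works out how many GPUs one model replica occupies and how many data-parallel replicas fit in a given world size. It assumes that one replica uses `pipeline × tensor size` GPUs and that a 2.5D layout requires `size = depth × q²` for some integer `q`. The helper names (`gpus_per_replica`, `is_valid_2p5d`, `data_parallel_size`) are hypothetical and are not part of the Colossal-AI API.

```python
import math

# Same values as the `parallel` config above.
parallel = dict(
    pipeline=2,
    tensor=dict(mode='2.5d', depth=1, size=4),
)


def gpus_per_replica(cfg):
    """Hypothetical helper: GPUs used by one model replica (pipeline x tensor)."""
    return cfg.get('pipeline', 1) * cfg.get('tensor', {}).get('size', 1)


def is_valid_2p5d(tensor_cfg):
    """Hypothetical check: a 2.5D layout assumes size == depth * q * q."""
    depth, size = tensor_cfg['depth'], tensor_cfg['size']
    if size % depth != 0:
        return False
    q = math.isqrt(size // depth)
    return q * q * depth == size


def data_parallel_size(world_size, cfg):
    """Hypothetical helper: replicas that fit when the rest is data parallel."""
    per_replica = gpus_per_replica(cfg)
    if world_size % per_replica != 0:
        raise ValueError(
            f"world size {world_size} is not a multiple of {per_replica}"
        )
    return world_size // per_replica


print(gpus_per_replica(parallel))        # 8
print(is_valid_2p5d(parallel['tensor'])) # True
print(data_parallel_size(16, parallel))  # 2
```

Under these assumptions, launching the config on 16 GPUs would give two data-parallel replicas, each split into 2 pipeline stages of 4 tensor-parallel GPUs.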

310
### Start Heterogeneous Training in Lines

```python
# Import path assumed for the Colossal-AI release this README targets.
from colossalai.zero.shard_utils import TensorShardStrategy

zero = dict(
    model_config=dict(
        tensor_placement_policy='auto',
        shard_strategy=TensorShardStrategy(),
        reuse_fp16_shard=True
    ),
    optimizer_config=dict(initial_scale=2**5, gpu_margin_mem_ratio=0.2)
)
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Cite Us

```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```

<p align="right">(<a href="#top">back to top</a>)</p>