# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Colossal-AI_logo.png)](https://www.colossalai.org/)

   Colossal-AI: A Unified Deep Learning System for Big Model Era

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> |
   <a href="https://www.colossalai.org/"> Documentation </a> |
   <a href="https://github.com/hpcaitech/ColossalAI-Examples"> Examples </a> |
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> |
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)

   | [English](README.md) | [中文](README-zh-Hans.md) |

</div>

## Table of Contents
<ul>
 <li><a href="#Why-Colossal-AI">Why Colossal-AI</a></li>
 <li><a href="#Features">Features</a></li>
 <li>
   <a href="#Parallel-Training-Demo">Parallel Training Demo</a>
   <ul>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Single-GPU-Training-Demo">Single GPU Training Demo</a>
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Inference-Energon-AI-Demo">Inference (Energon-AI) Demo</a>
   <ul>
     <li><a href="#GPT-3-Inference">GPT-3</a></li>
   </ul>
 </li>
 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#Download-From-Official-Releases">Download From Official Releases</a></li>
     <li><a href="#Download-From-Source">Download From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#contributing">Contributing</a></li>
 <li>
   <a href="#Quick-View">Quick View</a>
   <ul>
     <li><a href="#Start-Distributed-Training-in-Lines">Start Distributed Training in Lines</a></li>
     <li><a href="#Start-Heterogeneous-Training-in-Lines">Start Heterogeneous Training in Lines</a></li>
   </ul>
 </li>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>

## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>

   Prof. James Demmel (UC Berkeley): Colossal-AI makes distributed training efficient, easy and scalable.
</div>

<p align="right">(<a href="#top">back to top</a>)</p>

## Features

Colossal-AI provides a collection of parallel components for you. We aim to let you write your
distributed deep learning models just as you would write a model on your laptop. We provide user-friendly tools to kickstart
distributed training and inference in a few lines.

- Parallelism strategies
  - Data Parallelism
  - Pipeline Parallelism
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) Tensor Parallelism
  - [Sequence Parallelism](https://arxiv.org/abs/2105.13120)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)

- Heterogeneous Memory Management
  - [PatrickStar](https://arxiv.org/abs/2108.05818)

- Friendly Usage
  - Parallelism based on a configuration file

- Inference
  - [Energon-AI](https://github.com/hpcaitech/EnergonAI)

<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Training Demo
### ViT
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
</p>

- 14x larger batch size, and 5x faster training with Tensor Parallelism = 64

### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
</p>

- 50% GPU resource savings, and 10.7% acceleration

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- 24x larger model size on the same hardware
- over 3x acceleration

### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 50% longer sequence length

### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

Please visit our [documentation and tutorials](https://www.colossalai.org/) for more details.

<p align="right">(<a href="#top">back to top</a>)</p>

## Single GPU Training Demo

### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>

- 20x larger model size on the same hardware

### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>

- 34x larger model size on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>

## Inference (Energon-AI) Demo

### GPT-3
<p id="GPT-3-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
</p>

- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% inference acceleration on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

### Download From Official Releases

You can visit the [Download](https://www.colossalai.org/download) page to download Colossal-AI with pre-built CUDA extensions.

### Download From Source

> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problems. :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install dependencies
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

If you don't want to install and enable CUDA kernel fusion (its installation is compulsory when using fused optimizers):

```shell
NO_CUDA_EXT=1 pip install .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

### Pull from DockerHub

You can directly pull the docker image from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The image is automatically uploaded upon release.

### Build On Your Own

Run the following command to build a docker image from the Dockerfile provided.

> Building Colossal-AI from scratch requires GPU support; you need to use the Nvidia Docker Runtime as the default when running `docker build`. More details can be found [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
> We recommend you install Colossal-AI from our [project page](https://www.colossalai.org) directly.

```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the docker container in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Community

Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.

## Contributing

If you wish to contribute to this project, please follow the guidelines in [Contributing](./CONTRIBUTING.md).

Thanks so much to all of our amazing contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors"><img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/contributor_avatar.png" width="800px"></a>

*The order of contributor avatars is randomly shuffled.*

<p align="right">(<a href="#top">back to top</a>)</p>

## Quick View

### Start Distributed Training in Lines

```python
parallel = dict(
    pipeline=2,
    tensor=dict(mode='2.5d', depth=1, size=4)
)
```
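
For intuition, the number of GPUs such a configuration occupies is the product of the parallel sizes. A minimal sketch of that arithmetic (the helper below is ours for illustration, not part of the Colossal-AI API):

```python
# Hypothetical helper (not part of Colossal-AI): compute the smallest
# world size (GPU count) a parallel configuration occupies.
parallel = dict(
    pipeline=2,
    tensor=dict(mode='2.5d', depth=1, size=4),
)

def min_world_size(parallel_cfg, data_parallel=1):
    """Pipeline stages x tensor-parallel size x data-parallel replicas."""
    pipeline = parallel_cfg.get('pipeline', 1)
    tensor_size = parallel_cfg.get('tensor', {}).get('size', 1)
    return pipeline * tensor_size * data_parallel

print(min_world_size(parallel))  # 2 stages x 4-way tensor parallelism = 8 GPUs
```

With one data-parallel replica, the configuration above asks for 8 GPUs; the real launcher derives the data-parallel size from whatever world size you launch with.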

### Start Heterogeneous Training in Lines

```python
zero = dict(
    model_config=dict(
        tensor_placement_policy='auto',
        shard_strategy=TensorShardStrategy(),
        reuse_fp16_shard=True
    ),
    optimizer_config=dict(initial_scale=2**5, gpu_margin_mem_ratio=0.2)
)
```
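
The `initial_scale=2**5` entry seeds dynamic loss scaling for mixed-precision training. As a rough illustration of that mechanism (a toy sketch, not Colossal-AI's actual implementation):

```python
class ToyLossScaler:
    """Toy dynamic loss scaler: halve the scale on gradient overflow,
    double it after a window of stable steps. Illustrative only."""
    def __init__(self, initial_scale=2**5, growth_interval=1000):
        self.scale = float(initial_scale)
        self.growth_interval = growth_interval
        self._stable_steps = 0

    def update(self, found_overflow):
        if found_overflow:
            self.scale /= 2            # back off after an overflow
            self._stable_steps = 0
        else:
            self._stable_steps += 1
            if self._stable_steps == self.growth_interval:
                self.scale *= 2        # grow again once training is stable
                self._stable_steps = 0

scaler = ToyLossScaler(initial_scale=2**5)
scaler.update(found_overflow=True)     # overflow: 32 -> 16
print(scaler.scale)
```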

<p align="right">(<a href="#top">back to top</a>)</p>

## Cite Us

```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```

<p align="right">(<a href="#top">back to top</a>)</p>