".github/workflows/release_docker_after_publish.yml" did not exist on "fd90245399be12cee179a06f95ad8fc3ee835179"
# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Colossal-AI_logo.png)](https://www.colossalai.org/)

   Colossal-AI: A Unified Deep Learning System for the Big Model Era

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> | 
   <a href="https://www.colossalai.org/"> Documentation </a> | 
   <a href="https://github.com/hpcaitech/ColossalAI-Examples"> Examples </a> |   
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> | 
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)
   

   | [English](README.md) | [中文](README-zh-Hans.md) |

</div>

## Table of Contents
<ul>
 <li><a href="#Why-Colossal-AI">Why Colossal-AI</a> </li>
 <li><a href="#Features">Features</a> </li>
 <li>
   <a href="#Parallel-Training-Demo">Parallel Training Demo</a> 
   <ul>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
     <li><a href="#OPT">OPT</a></li>
   </ul>
 </li>
 <li>
   <a href="#Single-GPU-Training-Demo">Single GPU Training Demo</a> 
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Inference-Energon-AI-Demo">Inference (Energon-AI) Demo</a> 
   <ul>
     <li><a href="#GPT-3-Inference">GPT-3</a></li>
   </ul>
 </li>
 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#Download-From-Official-Releases">Download From Official Releases</a></li>
     <li><a href="#Download-From-Source">Download From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#contributing">Contributing</a></li>
 <li>
   <a href="#Quick-View">Quick View</a>
   <ul>
     <li><a href="#Start-Distributed-Training-in-Lines">Start Distributed Training in Lines</a></li>
     <li><a href="#Start-Heterogeneous-Training-in-Lines">Start Heterogeneous Training in Lines</a></li>
   </ul>
 </li>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>

## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>

   Prof. James Demmel (UC Berkeley): Colossal-AI makes training AI models efficient, easy, and scalable.
</div>

<p align="right">(<a href="#top">back to top</a>)</p>

## Features

Colossal-AI provides a collection of parallel components. We aim to let you write distributed deep learning models the same way you write a model on your laptop, and we provide user-friendly tools to kickstart distributed training and inference in a few lines of code.

- Parallelism strategies
  - Data Parallelism
  - Pipeline Parallelism
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) Tensor Parallelism
  - [Sequence Parallelism](https://arxiv.org/abs/2105.13120)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)

- Heterogeneous Memory Management 
  - [PatrickStar](https://arxiv.org/abs/2108.05818)

- Friendly Usage
  - Parallelism based on configuration file

- Inference
  - [Energon-AI](https://github.com/hpcaitech/EnergonAI)

<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Training Demo
### ViT
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
</p>

- 14x larger batch size and 5x faster training with a tensor parallel size of 64

### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
</p>

- Saves 50% of GPU resources, with 10.7% acceleration

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- 24x larger model size on the same hardware
- over 3x acceleration
### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 50% longer sequence length

### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

### OPT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT.png" width=800/>

- [Open Pretrained Transformer (OPT)](https://github.com/facebookresearch/metaseq), a 175-billion-parameter AI language model released by Meta, encourages AI programmers to build downstream tasks and application deployments because its pretrained model weights are public.
- 40% speedup when fine-tuning OPT at low cost, in a few lines of code. [[Example]](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/language/opt)

Please visit our [documentation](https://www.colossalai.org/) and [examples](https://github.com/hpcaitech/ColossalAI-Examples) for more details.

<p align="right">(<a href="#top">back to top</a>)</p>

## Single GPU Training Demo

### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>

- 20x larger model size on the same hardware

<p id="GPT-2-NVME" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-NVME.png" width=800/>
</p>

- 120x larger model size on the same hardware (RTX 3080)

### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>

- 34x larger model size on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>

## Inference (Energon-AI) Demo

### GPT-3
<p id="GPT-3-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
</p>

- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% inference acceleration on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

### Download From Official Releases

You can visit the [Download](https://www.colossalai.org/download) page to download Colossal-AI with pre-built CUDA extensions.

### Download From Source

> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problem. :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install dependencies
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

If you don't want to install and enable CUDA kernel fusion (the CUDA extensions are required when using a fused optimizer):

```shell
NO_CUDA_EXT=1 pip install .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

### Pull from DockerHub

You can directly pull the docker image from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The image is automatically uploaded upon release.


### Build On Your Own

Run the following command to build a docker image from the provided Dockerfile.

> Building Colossal-AI from scratch requires GPU support, so you need to set the Nvidia Docker Runtime as the default when running `docker build`. More details can be found [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
> We recommend you install Colossal-AI from our [project page](https://www.colossalai.org) directly.

```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the docker container in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Community

Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.

## Contributing

If you wish to contribute to this project, please follow the guidelines in [Contributing](./CONTRIBUTING.md).

Thanks so much to all of our amazing contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors"><img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/contributor_avatar.png" width="800px"></a>

*The order of contributor avatars is randomly shuffled.*

<p align="right">(<a href="#top">back to top</a>)</p>

## Quick View

### Start Distributed Training in Lines

```python
# Parallel settings in a Colossal-AI config file
parallel = dict(
    pipeline=2,  # 2 pipeline stages
    tensor=dict(mode='2.5d', depth=1, size=4)  # 2.5D tensor parallelism across 4 devices
)
```
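
The `parallel` dict also determines how many GPUs a job needs. The sketch below works through that arithmetic. The helper `infer_parallel_layout` is hypothetical and not part of the Colossal-AI API; it assumes that the data parallel size is whatever remains after dividing the world size by the pipeline and tensor sizes, and that 2.5D tensor parallelism arranges devices as `depth * q * q` for some integer `q`.

```python
import math


def infer_parallel_layout(world_size, parallel):
    """Hypothetical helper (not a Colossal-AI API): derive group sizes from a `parallel` dict."""
    pipeline_size = parallel.get("pipeline", 1)
    tensor_cfg = parallel.get("tensor", {})
    tensor_size = tensor_cfg.get("size", 1)

    if tensor_cfg.get("mode") == "2.5d":
        depth = tensor_cfg.get("depth", 1)
        # Assumption: 2.5D requires size == depth * q * q for some integer q.
        q = math.isqrt(tensor_size // depth)
        if depth * q * q != tensor_size:
            raise ValueError(f"tensor size {tensor_size} is not depth * q^2 for depth={depth}")

    model_parallel_size = pipeline_size * tensor_size
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world size {world_size} is not a multiple of pipeline * tensor = {model_parallel_size}"
        )

    # Assumption: the remaining devices form the data-parallel dimension.
    return {
        "pipeline": pipeline_size,
        "tensor": tensor_size,
        "data": world_size // model_parallel_size,
    }


parallel = dict(pipeline=2, tensor=dict(mode="2.5d", depth=1, size=4))
print(infer_parallel_layout(16, parallel))  # {'pipeline': 2, 'tensor': 4, 'data': 2}
```

Under these assumptions the config needs at least 8 GPUs (2 pipeline stages times 4 tensor-parallel devices), and any extra multiple of 8 is used for data parallelism.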

### Start Heterogeneous Training in Lines

```python
# Import path as of the Colossal-AI 0.1.x releases
from colossalai.zero.shard_utils import TensorShardStrategy

zero = dict(
    model_config=dict(
        tensor_placement_policy='auto',  # let the runtime decide tensor placement
        shard_strategy=TensorShardStrategy(),
        reuse_fp16_shard=True
    ),
    optimizer_config=dict(initial_scale=2**5, gpu_margin_mem_ratio=0.2)
)
```
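
To give a feel for why sharding and placement policies matter, the following back-of-the-envelope sketch estimates model-state memory per GPU. It uses the accounting from the [ZeRO paper](https://arxiv.org/abs/1910.02054) for mixed-precision Adam: 2 bytes of fp16 parameters, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer states per parameter. The function name is hypothetical, and the sketch assumes all model states are sharded evenly across GPUs; it ignores activations and any CPU offloading that `tensor_placement_policy='auto'` may perform.

```python
def model_state_bytes_per_gpu(num_params, num_gpus, sharded=True):
    """Hypothetical estimate of model-state memory per GPU, in bytes."""
    # fp16 params (2) + fp16 grads (2) + fp32 params, momentum, variance (4 + 4 + 4)
    bytes_per_param = 2 + 2 + 12
    total = num_params * bytes_per_param
    # Assumption: sharding splits all model states evenly across the GPUs.
    return total / num_gpus if sharded else float(total)


params = 1_500_000_000  # a 1.5B-parameter model
print(model_state_bytes_per_gpu(params, 8, sharded=False) / 1e9)  # 24.0 (GB, no sharding)
print(model_state_bytes_per_gpu(params, 8) / 1e9)                 # 3.0 (GB, sharded over 8 GPUs)
```

This is only an estimate of the model states; real memory use also depends on activations, buffers, and fragmentation.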

<p align="right">(<a href="#top">back to top</a>)</p>

## Cite Us

```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```

<p align="right">(<a href="#top">back to top</a>)</p>