# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Colossal-AI_logo.png)](https://www.colossalai.org/)

   An integrated large-scale model training system with efficient parallelization techniques.

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> | 
   <a href="https://www.colossalai.org/"> Documentation </a> | 
   <a href="https://github.com/hpcaitech/ColossalAI-Examples"> Examples </a> |   
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> | 
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)
   

   | [English](README.md) | [中文](README-zh-Hans.md) |

</div>

## Table of Contents
<ul>
 <li><a href="#Features">Features</a> </li>
 <li>
   <a href="#Demo">Demo</a> 
   <ul>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
   </ul>
 </li>

 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#PyPI">PyPI</a></li>
     <li><a href="#Install-From-Source">Install From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#contributing">Contributing</a></li>
 <li>
   <a href="#Quick-View">Quick View</a>
   <ul>
     <li><a href="#Start-Distributed-Training-in-Lines">Start Distributed Training in Lines</a></li>
     <li><a href="#Write-a-Simple-2D-Parallel-Model">Write a Simple 2D Parallel Model</a></li>
   </ul>
 </li>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>

## Features

Colossal-AI provides a collection of parallel training components for you. We aim to let you write your distributed deep learning models just as you write your model on your laptop. We provide user-friendly tools to kick start distributed training in a few lines; a configuration sketch follows the feature list below.

- Data Parallelism
- Pipeline Parallelism
- 1D, 2D, 2.5D, 3D tensor parallelism
- Sequence parallelism
- Friendly trainer and engine
- Extensible for new parallelism
- Mixed Precision Training
- Zero Redundancy Optimizer (ZeRO)
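
These features are typically enabled through a plain Python configuration file that is passed to `colossalai.launch`. The snippet below is only an illustrative sketch, assuming a 2-stage pipeline, 4-way 2D tensor parallelism, and native PyTorch AMP; the exact keys and values depend on the Colossal-AI version you install.

```python
# config.py -- illustrative sketch only; the sizes and modes below are assumptions
from colossalai.amp import AMP_TYPE

# hybrid parallelism: 2 pipeline stages and 4-way tensor parallelism in 2D mode
parallel = dict(
    pipeline=2,
    tensor=dict(size=4, mode='2d'),
)

# mixed precision training with native PyTorch AMP
fp16 = dict(mode=AMP_TYPE.TORCH)

# gradient accumulation and gradient clipping
gradient_accumulation = 4
clip_grad_norm = 1.0
```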

<p align="right">(<a href="#top">back to top</a>)</p>

## Demo
### ViT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />

- 14x larger batch size and 5x faster training with Tensor Parallelism = 64

### GPT-3
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3.png" width=700/>

- 50% savings in GPU resources and 10.7% acceleration

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- 24x larger model size on the same hardware
- over 3x acceleration
### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 50% longer sequence length

### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

Please visit our [documentation and tutorials](https://www.colossalai.org/) for more details.

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

### PyPI

```bash
pip install colossalai
```
This command will install the CUDA extensions if you have CUDA, NVCC and torch installed.

If you don't want to install the CUDA extensions, add `--global-option="--no_cuda_ext"`:
```bash
pip install colossalai --global-option="--no_cuda_ext"
```

If you want to use `ZeRO`, you can run:
```bash
pip install colossalai[zero]
```

### Install From Source

> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to create an issue if you encounter any problems. :-)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependencies
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

If you don't want to install and enable CUDA kernel fusion (installing it is compulsory when using a fused optimizer):

```shell
pip install --global-option="--no_cuda_ext" .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

Run the following command to build a docker image from the Dockerfile provided.

```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the docker container in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Community

Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.

## Contributing

If you wish to contribute to this project, please follow the guidelines in [Contributing](./CONTRIBUTING.md).

Thanks so much to all of our amazing contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors"><img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/contributor_avatar.png" width="800px"></a>

*The order of contributor avatars is randomly shuffled.*

<p align="right">(<a href="#top">back to top</a>)</p>

## Quick View

### Start Distributed Training in Lines

```python
import colossalai
from colossalai.utils import get_dataloader


# my_config can be a path to a config file or a dictionary object
# 'localhost' is only for single node, you need to specify
# the node name if using multiple nodes
colossalai.launch(
    config=my_config,
    rank=rank,
    world_size=world_size,
    backend='nccl',
    port=29500,
    host='localhost'
)

# build your model
model = ...

# build your dataset, the dataloader will have a distributed data
# sampler by default
train_dataset = ...
train_dataloader = get_dataloader(dataset=train_dataset,
                                  shuffle=True)


# build your optimizer
optimizer = ...

# build your loss function
criterion = ...

# initialize colossalai
engine, train_dataloader, _, _ = colossalai.initialize(
    model=model,
    optimizer=optimizer,
    criterion=criterion,
    train_dataloader=train_dataloader
)

# start training
engine.train()
for epoch in range(NUM_EPOCHS):
    for data, label in train_dataloader:
        engine.zero_grad()
        output = engine(data)
        loss = engine.criterion(output, label)
        engine.backward(loss)
        engine.step()

```
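
As a rough usage sketch, assuming the snippet above is saved as `train.py` and `my_config` points to a config file (both names are assumptions), the script can also be started with a process launcher such as `torchrun`; `colossalai.launch_from_torch` then reads the rank, world size, host and port from the environment variables set by the launcher instead of taking them as arguments.

```python
# illustrative sketch: alternative to the explicit colossalai.launch call above
# when the script is started with a launcher, e.g. torchrun --nproc_per_node=4 train.py
import colossalai

colossalai.launch_from_torch(config='./config.py')  # config path is an assumption
```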

### Write a Simple 2D Parallel Model

Let's say we have a huge MLP model whose very large hidden size makes it difficult to fit into a single GPU. We can then distribute the model weights across GPUs in a 2D mesh while still writing the model in a familiar way.

```python
from colossalai.nn import Linear2D
import torch.nn as nn


class MLP_2D(nn.Module):

    def __init__(self):
        super().__init__()
        self.linear_1 = Linear2D(in_features=1024, out_features=16384)
        self.linear_2 = Linear2D(in_features=16384, out_features=1024)

    def forward(self, x):
        x = self.linear_1(x)
        x = self.linear_2(x)
        return x

```
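
For the 2D layers above to work, the runtime must be launched with a matching 2D tensor-parallel process group. Below is a minimal configuration sketch, assuming 4 GPUs; `mode='2d'` requires the tensor-parallel size to be a square number.

```python
# config.py -- minimal sketch for the MLP_2D example; the 4-GPU size is an assumption
parallel = dict(
    tensor=dict(size=4, mode='2d'),
)
```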

<p align="right">(<a href="#top">back to top</a>)</p>

## Cite Us

```bibtex
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```

<p align="right">(<a href="#top">back to top</a>)</p>