"googlemock/vscode:/vscode.git/clone" did not exist on "6f168c1f82cbc4a4f741486a4625bb05b89b63c6"
README.md 8.68 KB
Newer Older
1
# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Colossal-AI_logo.png)](https://www.colossalai.org/)

   An integrated large-scale model training system with efficient parallelization techniques.

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> | 
   <a href="https://www.colossalai.org/"> Documentation </a> | 
   <a href="https://github.com/hpcaitech/ColossalAI-Examples"> Examples </a> |   
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> | 
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)
   

   | [English](README.md) | [中文](README-zh-Hans.md) |

</div>

## Table of Contents
<ul>
 <li><a href="#Why-Colossal-AI">Why Colossal-AI</a> </li>
 <li><a href="#Features">Features</a> </li>
 <li>
   <a href="#Parallel-Demo">Parallel Demo</a> 
   <ul>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Single-GPU-Demo">Single GPU Demo</a> 
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>

 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#PyPI">PyPI</a></li>
     <li><a href="#Install-From-Source">Install From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#contributing">Contributing</a></li>
 <li><a href="#Quick-View">Quick View</a></li>
   <ul>
     <li><a href="#Start-Distributed-Training-in-Lines">Start Distributed Training in Lines</a></li>
     <li><a href="#Write-a-Simple-2D-Parallel-Model">Write a Simple 2D Parallel Model</a></li>
   </ul>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>

## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>

   Prof. James Demmel (UC Berkeley): Colossal-AI makes distributed training efficient, easy and scalable.
</div>

<p align="right">(<a href="#top">back to top</a>)</p>

## Features

Colossal-AI provides a collection of parallel training components. We aim to let you write your
distributed deep learning models just as you would write a model on your laptop. We provide
user-friendly tools to kickstart distributed training in a few lines.

- Parallelism strategies
  - Data Parallelism
  - Pipeline Parallelism
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) Tensor Parallelism
  - [Sequence Parallelism](https://arxiv.org/abs/2105.13120)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)

- Heterogeneous Memory Management
  - [PatrickStar](https://arxiv.org/abs/2108.05818)

- Friendly Usage
  - Parallelism based on configuration file (see the sketch below)
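
As a quick illustration of the configuration-file approach, here is a minimal sketch: the parallel settings live in a plain Python file (named `config.py` here purely for illustration) and the training script picks them up at launch time. See the [documentation](https://www.colossalai.org/) for the authoritative launch API.

```python
# train.py (sketch): the parallelism strategy is declared in config.py,
# not in the model code
import colossalai

# Initialize the distributed environment from the config file, reading rank
# and address information from the environment set up by the PyTorch launcher.
colossalai.launch_from_torch(config='config.py')
```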

<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Demo
### ViT
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
</p>

- 14x larger batch size and 5x faster training with tensor parallelism of degree 64

### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3.png" width=700/>
</p>

- 50% savings in GPU resources and 10.7% acceleration

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption and superlinear scaling efficiency with tensor parallelism

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- 24x larger model size on the same hardware
- over 3x acceleration
### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 50% longer sequence length

### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

Please visit our [documentation and tutorials](https://www.colossalai.org/) for more details.

<p align="right">(<a href="#top">back to top</a>)</p>

## Single GPU Demo

### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>

- 20x larger model size on the same hardware

### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>

- 34x larger model size on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

### Download From Official Releases

You can visit the [Download](/download) page to download Colossal-AI with pre-built CUDA extensions.


### Download From Source

> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problems. :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install dependencies
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```
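
To verify the installation, a quick import check is usually enough (this assumes the package exposes `__version__`, which releases in this line do):

```shell
python -c "import colossalai; print(colossalai.__version__)"
```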

If you don't want to install and enable CUDA kernel fusion (required when using fused optimizers):

```shell
NO_CUDA_EXT=1 pip install .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

Run the following command to build a Docker image from the provided Dockerfile.

```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start a Docker container from that image in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```
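
To work on your own code and data inside the container, a bind mount is handy; the paths below are illustrative.

```bash
# mount the current directory so scripts and checkpoints persist outside the container
docker run -ti --gpus all --rm --ipc=host -v $PWD:/workspace -w /workspace colossalai bash
```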

<p align="right">(<a href="#top">back to top</a>)</p>

## Community

Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.

## Contributing

If you wish to contribute to this project, please follow the guidelines in [Contributing](./CONTRIBUTING.md).

Thanks so much to all of our amazing contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors"><img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/contributor_avatar.png" width="800px"></a>

*The order of contributor avatars is randomly shuffled.*

<p align="right">(<a href="#top">back to top</a>)</p>

## Quick View

### Start Distributed Training in Lines

```python
# hybrid parallelism: 2 pipeline stages, each using 2.5D tensor
# parallelism across 4 GPUs (8 GPUs in total)
parallel = dict(
    pipeline=2,
    tensor=dict(mode='2.5d', depth=1, size=4)
)
```
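
Once a configuration is in place, the tutorial-style engine API of this release line wraps your model, optimizer and criterion. The sketch below uses a toy model so it is self-contained; a plain loop like this applies when the config does not enable pipeline parallelism (pipeline training goes through a schedule-driven trainer instead). Treat it as a sketch of the pattern, not the definitive API.

```python
# run under a PyTorch launcher, e.g. torchrun --nproc_per_node=2 train.py
import colossalai
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

colossalai.launch_from_torch(config='config.py')  # e.g. a pipeline-free config

# toy model and data so the sketch is self-contained
model = nn.Linear(32, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,)))
train_dataloader = DataLoader(dataset, batch_size=8)

# the engine hides the parallel details behind a familiar interface
engine, train_dataloader, _, _ = colossalai.initialize(
    model=model, optimizer=optimizer,
    criterion=criterion, train_dataloader=train_dataloader)

engine.train()
for img, label in train_dataloader:
    img, label = img.cuda(), label.cuda()
    engine.zero_grad()
    output = engine(img)
    loss = engine.criterion(output, label)
    engine.backward(loss)
    engine.step()
```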

### Start Heterogeneous Training in Lines

```python
# ZeRO with heterogeneous memory management; TensorShardStrategy comes from
# Colossal-AI (at the time, colossalai.zero.shard_utils)
zero = dict(
    model_config=dict(
        tensor_placement_policy='auto',    # let the runtime move tensors between CPU and GPU
        shard_strategy=TensorShardStrategy(),
        reuse_fp16_shard=True              # reuse fp16 parameter shards as gradient storage
    ),
    optimizer_config=dict(initial_scale=2**5, gpu_margin_mem_ratio=0.2)
)

```
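
For the heterogeneous/ZeRO path, parameters are typically created inside a sharded initialization context so that even a model exceeding a single GPU's memory can be instantiated. The module paths below match the API of this release line and may have moved since; treat this as a sketch, and the `Sequential` model as a stand-in for your real one.

```python
import torch
import colossalai
from colossalai.zero.init_ctx import ZeroInitContext
from colossalai.zero.shard_utils import TensorShardStrategy

colossalai.launch_from_torch(config='config.py')  # config containing the zero dict above

# parameters are sharded across processes as they are created
with ZeroInitContext(target_device=torch.cuda.current_device(),
                     shard_strategy=TensorShardStrategy(),
                     shard_param=True):
    model = torch.nn.Sequential(      # stand-in for your real model
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    )
```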

<p align="right">(<a href="#top">back to top</a>)</p>

## Cite Us

```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```

<p align="right">(<a href="#top">back to top</a>)</p>