# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Colossal-AI_logo.png)](https://www.colossalai.org/)

   Colossal-AI: A Unified Deep Learning System for Big Model Era

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> | 
   <a href="https://www.colossalai.org/"> Documentation </a> | 
   <a href="https://github.com/hpcaitech/ColossalAI-Examples"> Examples </a> |   
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> | 
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)

   | [English](README.md) | [中文](README-zh-Hans.md) |

</div>

## Table of Contents
<ul>
 <li><a href="#Why-Colossal-AI">Why Colossal-AI</a> </li>
 <li><a href="#Features">Features</a> </li>
 <li>
   <a href="#Parallel-Training-Demo">Parallel Training Demo</a> 
   <ul>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
     <li><a href="#OPT">OPT</a></li>
   </ul>
 </li>
 <li>
   <a href="#Single-GPU-Training-Demo">Single GPU Training Demo</a> 
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#Inference-Energon-AI-Demo">Inference (Energon-AI) Demo</a> 
   <ul>
     <li><a href="#GPT-3-Inference">GPT-3</a></li>
   </ul>
 </li>
 <li>
   <a href="#Installation">Installation</a>
   <ul>
     <li><a href="#Download-From-Official-Releases">Download From Official Releases</a></li>
     <li><a href="#Download-From-Source">Download From Source</a></li>
   </ul>
 </li>
 <li><a href="#Use-Docker">Use Docker</a></li>
 <li><a href="#Community">Community</a></li>
 <li><a href="#contributing">Contributing</a></li>
 <li>
   <a href="#Quick-View">Quick View</a>
   <ul>
     <li><a href="#Start-Distributed-Training-in-Lines">Start Distributed Training in Lines</a></li>
     <li><a href="#Start-Heterogeneous-Training-in-Lines">Start Heterogeneous Training in Lines</a></li>
   </ul>
 </li>
 <li><a href="#Cite-Us">Cite Us</a></li>
</ul>

## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>

   Prof. James Demmel (UC Berkeley): Colossal-AI makes training AI models efficient, easy, and scalable.
</div>

<p align="right">(<a href="#top">back to top</a>)</p>

## Features

Colossal-AI provides a collection of parallel components. We aim to let you write your distributed deep learning models just as you would write a model on your laptop, with user-friendly tools to kick-start distributed training and inference in a few lines.

- Parallelism strategies
  - Data Parallelism
  - Pipeline Parallelism
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) Tensor Parallelism
  - [Sequence Parallelism](https://arxiv.org/abs/2105.13120)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)

- Heterogeneous Memory Management
  - [PatrickStar](https://arxiv.org/abs/2108.05818)

- Friendly Usage
  - Parallelism based on configuration file

- Inference
  - [Energon-AI](https://github.com/hpcaitech/EnergonAI)

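As a toy illustration of the tensor-parallel idea listed above, the sketch below (plain Python, not the Colossal-AI API; all names are hypothetical) splits a linear layer's weight matrix column-wise across two "workers", as in 1D tensor parallelism, and checks that gathering the partial outputs reproduces the unsharded result.

```python
# Illustrative sketch of 1D (column-wise) tensor parallelism -- not Colossal-AI
# API, just the idea behind splitting a linear layer across workers.

def matmul(x, w):
    """Multiply a row vector x by matrix w (given as a list of rows)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def split_columns(w, parts):
    """Split matrix w column-wise into `parts` equal shards."""
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

# A 2x4 weight matrix sharded across 2 workers
w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x = [1, 1]

shards = split_columns(w, 2)
partials = [matmul(x, shard) for shard in shards]   # each worker computes its slice
gathered = [v for part in partials for v in part]   # all-gather of the outputs

assert gathered == matmul(x, w)  # matches the unsharded computation
```

In real tensor parallelism the shards live on different GPUs and the gather is a collective communication; the arithmetic, however, is exactly this.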
<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Training Demo
### ViT
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
</p>

- 14x larger batch size, and 5x faster training with 64-way tensor parallelism

### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
</p>

- 50% of GPU resources saved, and 10.7% acceleration

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption, and superlinear scaling efficiency with Tensor Parallelism

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- 24x larger model size on the same hardware
- over 3x acceleration

### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 50% longer sequence length

### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): Scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

### OPT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT.png" width=800/>

- [Open Pretrained Transformer (OPT)](https://github.com/facebookresearch/metaseq) is a 175-billion-parameter AI language model released by Meta, which enables AI programmers to perform various downstream tasks and application deployments because its pretrained model weights are publicly available.
- Fine-tune OPT at low cost with a 40% speedup, in a few lines of code. [[Example]](https://github.com/hpcaitech/ColossalAI-Examples/tree/main/language/opt)

Please visit our [documentation](https://www.colossalai.org/) and [examples](https://github.com/hpcaitech/ColossalAI-Examples) for more details.

<p align="right">(<a href="#top">back to top</a>)</p>

## Single GPU Training Demo

### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>

- 20x larger model size on the same hardware

### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>

- 34x larger model size on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>

## Inference (Energon-AI) Demo

### GPT-3
<p id="GPT-3-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
</p>

- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% inference acceleration on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

### Download From Official Releases

You can visit the [Download](https://www.colossalai.org/download) page to download Colossal-AI with pre-built CUDA extensions.

### Download From Source

> The version of Colossal-AI will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problems. :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install dependencies
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

If you don't want to install and enable CUDA kernel fusion (compulsory when using fused optimizers):

```shell
NO_CUDA_EXT=1 pip install .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

### Pull from DockerHub

You can directly pull the docker image from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The image is automatically uploaded upon release.


### Build On Your Own

Run the following command to build a docker image from the Dockerfile provided.

> Building Colossal-AI from scratch requires GPU support; you need to use the Nvidia Docker Runtime as the default when running `docker build`. More details can be found [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
> We recommend you install Colossal-AI from our [project page](https://www.colossalai.org) directly.

```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the docker container in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Community

Join the Colossal-AI community on [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions, feedback, and questions with our engineering team.

## Contributing

If you wish to contribute to this project, please follow the guidelines in [Contributing](./CONTRIBUTING.md).

Thanks so much to all of our amazing contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors"><img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/contributor_avatar.png" width="800px"></a>

*The order of contributor avatars is randomly shuffled.*

<p align="right">(<a href="#top">back to top</a>)</p>

## Quick View

### Start Distributed Training in Lines

```python
parallel = dict(
    pipeline=2,
    tensor=dict(mode='2.5d', depth=1, size=4)
)
```
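To make the arithmetic of that config concrete, here is a hypothetical helper (not part of the Colossal-AI API) that computes how many devices such a config implies: 2 pipeline stages × 4-way 2.5D tensor parallelism means 8 GPUs per data-parallel replica.

```python
# Hypothetical helper (not a Colossal-AI API): device count implied by a
# parallel config = pipeline stages x tensor-parallel size x data-parallel replicas.

def required_gpus(parallel, data_parallel=1):
    pipeline = parallel.get('pipeline', 1)
    tensor = parallel.get('tensor', {}).get('size', 1)
    return pipeline * tensor * data_parallel

parallel = dict(
    pipeline=2,
    tensor=dict(mode='2.5d', depth=1, size=4)
)

# 2 pipeline stages x 4-way tensor parallelism = 8 GPUs per replica
assert required_gpus(parallel) == 8
```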

### Start Heterogeneous Training in Lines

```python
from colossalai.zero.shard_utils import TensorShardStrategy

zero = dict(
    model_config=dict(
        tensor_placement_policy='auto',
        shard_strategy=TensorShardStrategy(),
        reuse_fp16_shard=True
    ),
    optimizer_config=dict(initial_scale=2**5, gpu_margin_mem_ratio=0.2)
)
```
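The ZeRO config above partitions model and optimizer state instead of replicating it on every device. The idea can be sketched in plain Python (illustrative only, not Colossal-AI internals; all names are hypothetical): each rank owns just a 1/world_size slice of the parameter states.

```python
# Toy sketch of ZeRO-style partitioning (illustrative, not Colossal-AI internals):
# instead of every rank replicating all optimizer states, each rank owns a
# 1/world_size slice of them.

def shard_states(num_params, world_size):
    """Round-robin assignment of parameter indices to ranks."""
    shards = [[] for _ in range(world_size)]
    for p in range(num_params):
        shards[p % world_size].append(p)
    return shards

world_size = 4
shards = shard_states(num_params=16, world_size=world_size)

# Each rank stores only a quarter of the states, and together they cover all of them.
assert all(len(s) == 16 // world_size for s in shards)
assert sorted(p for s in shards for p in s) == list(range(16))
```

When a rank needs state it does not own, it is fetched via collective communication; that trade of memory for communication is what lets ZeRO fit much larger models per GPU.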

<p align="right">(<a href="#top">back to top</a>)</p>

## Cite Us

```bibtex
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```

<p align="right">(<a href="#top">back to top</a>)</p>