# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/colossal-ai_logo_vertical.png)](https://www.colossalai.org/)

   Colossal-AI: Making large AI models cheaper, easier to use, and more scalable

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> |
   <a href="https://www.colossalai.org/"> Documentation </a> |
   <a href="https://github.com/hpcaitech/ColossalAI/tree/main/examples"> Examples </a> |
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> |
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![GitHub Repo stars](https://img.shields.io/github/stars/hpcaitech/ColossalAI?style=social)](https://github.com/hpcaitech/ColossalAI/stargazers)
   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build_on_schedule.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build_on_schedule.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://github.com/hpcaitech/public_assets/tree/main/colossalai/contact/slack)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)

   | [English](README.md) | [中文](README-zh-Hans.md) |

</div>

## Latest News
* [2023/09] [One Half-Day of Training Using a Few Hundred Dollars Yields Similar Results to Mainstream Large Models, Open-Source and Commercial-Free Domain-Specific Llm Solution](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution)
* [2023/09] [70 Billion Parameter LLaMA2 Model Training Accelerated by 195%](https://www.hpc-ai.tech/blog/70b-llama2-training)
* [2023/07] [HPC-AI Tech Raises 22 Million USD in Series A Funding](https://www.hpc-ai.tech/blog/hpc-ai-tech-raises-22-million-usd-in-series-a-funding-to-fuel-team-expansion-and-business-growth)
* [2023/07] [65B Model Pretraining Accelerated by 38%, Best Practices for Building LLaMA-Like Base Models Open-Source](https://www.hpc-ai.tech/blog/large-model-pretraining)
* [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
* [2023/03] [Intel and Colossal-AI Partner to Deliver Cost-Efficient Open-Source Solution for Protein Folding Structure Prediction](https://www.hpc-ai.tech/blog/intel-habana)
* [2023/03] [AWS and Google Fund Colossal-AI with Startup Cloud Programs](https://www.hpc-ai.tech/blog/aws-and-google-fund-colossal-ai-with-startup-cloud-programs)
* [2023/02] [Open Source Solution Replicates ChatGPT Training Process! Ready to go with only 1.6GB GPU Memory](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt)
* [2023/01] [Hardware Savings Up to 46 Times for AIGC and  Automatic Parallelism](https://medium.com/pytorch/latest-colossal-ai-boasts-novel-automatic-parallelism-and-offers-savings-up-to-46x-for-stable-1453b48f3f02)

## Table of Contents
<ul>
 <li><a href="#为何选择-Colossal-AI">Why Colossal-AI</a> </li>
 <li><a href="#特点">Features</a> </li>
 <li>
   <a href="#Colossal-AI-in-the-Real-World">Colossal-AI in the Real World</a>
   <ul>
     <li><a href="#Colossal-LLaMA-2">Colossal-LLaMA-2: Half a day of training on a budget of about a thousand yuan yields results comparable to mainstream large models; an open-source, commercially usable Chinese LLaMA-2</a></li>
     <li><a href="#ColossalChat">ColossalChat: A zero-barrier ChatGPT clone with a complete RLHF pipeline</a></li>
     <li><a href="#AIGC">AIGC: Acceleration of Stable Diffusion</a></li>
     <li><a href="#生物医药">Biomedicine: Acceleration of AlphaFold protein structure prediction</a></li>
   </ul>
 </li>
 <li>
   <a href="#并行训练样例展示">Parallel Training Demo</a>
   <ul>
     <li><a href="#LLaMA2">LLaMA 1/2</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
     <li><a href="#OPT">OPT</a></li>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#推荐系统模型">Recommendation System Models</a></li>
   </ul>
 </li>
 <li>
   <a href="#单GPU训练样例展示">Single GPU Training Demo</a>
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#推理-Energon-AI-样例展示">Inference (Energon-AI) Demo</a>
   <ul>
     <li><a href="#GPT-3-Inference">GPT-3</a></li>
     <li><a href="#OPT-Serving">175-billion-parameter OPT online inference service</a></li>
     <li><a href="#BLOOM-Inference">176-billion-parameter BLOOM</a></li>
   </ul>
 </li>
 <li>
   <a href="#安装">Installation</a>
   <ul>
     <li><a href="#PyPI">PyPI</a></li>
     <li><a href="#从源代码安装">Install From Source</a></li>
   </ul>
 </li>
 <li><a href="#使用-Docker">Use Docker</a></li>
 <li><a href="#社区">Community</a></li>
 <li><a href="#做出贡献">Contributing</a></li>
 <li><a href="#引用我们">Cite Us</a></li>
</ul>

## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>

   Prof. James Demmel (UC Berkeley): Colossal-AI makes distributed training efficient, easy to use, and scalable.
</div>

<p align="right">(<a href="#top">back to top</a>)</p>

## Features

Colossal-AI provides a set of parallel components. Our goal is to make building a distributed AI model as simple as building an ordinary single-GPU model. The user-friendly tools we provide let you start distributed training and inference in a few lines of code.

- Parallelism strategies
  - Data parallelism
  - Pipeline parallelism
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) tensor parallelism
  - [Sequence parallelism](https://arxiv.org/abs/2105.13120)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)
  - [Auto-parallelism](https://arxiv.org/abs/2302.02599)
- Heterogeneous memory management
  - [PatrickStar](https://arxiv.org/abs/2108.05818)
- Friendly usage
  - Parallelism based on a configuration file
- Inference
  - [Energon-AI](https://github.com/hpcaitech/EnergonAI)
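To make the idea behind 1D tensor parallelism in the list above more concrete, here is a minimal pure-Python sketch. It is not Colossal-AI's API: the helper names (`matmul`, `split_columns`, `column_parallel_linear`) are hypothetical, the "devices" are plain lists, and the final concatenation only stands in for the collective communication a real implementation would need. It shows that splitting a weight matrix by columns and computing each shard separately gives the same result as the unsplit layer.

```python
# Pure-Python sketch of 1D (column-wise) tensor parallelism for a linear layer.
# Assumption: devices are simulated with lists; no real communication happens.


def matmul(x, w):
    """Multiply matrix x (n x k) by matrix w (k x m); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, num_shards):
    """Split w column-wise into equal shards, one per simulated device."""
    cols = len(w[0])
    assert cols % num_shards == 0, "column count must divide evenly"
    step = cols // num_shards
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(num_shards)]


def column_parallel_linear(x, w, num_shards):
    """Each shard computes x @ w_shard; partial outputs are joined by column."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # The concatenation below plays the role of an all-gather across devices.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]                      # 2 inputs, hidden size 2
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]  # weight of shape 2 x 4

full = matmul(x, w)
sharded = column_parallel_linear(x, w, num_shards=2)
print("sharded result matches full result:", full == sharded)
```

Each simulated device only stores its own slice of the weight, which is what lets a layer that is too large for one GPU be spread over several.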

<p align="right">(<a href="#top">back to top</a>)</p>

## Colossal-AI in the Real World

### Colossal-LLaMA-2

- Half a day of training on a budget of about a thousand yuan yields results comparable to mainstream large models; an open-source, commercially usable Chinese LLaMA-2
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2)
[[blog]](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution)
[[model weights]](https://huggingface.co/hpcai-tech/Colossal-LLaMA-2-7b-base)

| Model | Backbone | Tokens Consumed | MMLU (5-shot) | CMMLU (5-shot) | AGIEval (5-shot) | GAOKAO (0-shot) | CEval (5-shot) |
| :----------------------------: | :--------: | :-------------: | :-----------: | :-----------: | :-----: | :----: | :----: |
| Baichuan-7B | - | 1.2T | 42.32 (42.30) | 44.53 (44.02) | 38.72 | 36.74 | 42.80 |
| Baichuan-13B-Base | - | 1.4T | 50.51 (51.60) | 55.73 (55.30) | 47.20 | 51.41 | 53.60 |
| Baichuan2-7B-Base | - | 2.6T | 46.97 (54.16) | 57.67 (57.07) | 45.76 | 52.60 | 54.00 |
| Baichuan2-13B-Base | - | 2.6T | 54.84 (59.17) | 62.62 (61.97) | 52.08 | 58.25 | 58.10 |
| ChatGLM-6B | - | 1.0T | 39.67 (40.63) | 41.17 (-) | 40.10 | 36.53 | 38.90 |
| ChatGLM2-6B | - | 1.4T | 44.74 (45.46) | 49.40 (-) | 46.36 | 45.49 | 51.70 |
| InternLM-7B | - | 1.6T | 46.70 (51.00) | 52.00 (-) | 44.77 | 61.64 | 52.80 |
| Qwen-7B | - | 2.2T | 54.29 (56.70) | 56.03 (58.80) | 52.47 | 56.42 | 59.60 |
|  |  |  |  |  |  |  |  |
| Llama-2-7B | - | 2.0T | 44.47 (45.30) | 32.97 (-) | 32.60 | 25.46 | - |
| Linly-AI/Chinese-LLaMA-2-7B-hf | Llama-2-7B | 1.0T | 37.43 | 29.92 | 32.00 | 27.57 | - |
| wenge-research/yayi-7b-llama2 | Llama-2-7B | - | 38.56 | 31.52 | 30.99 | 25.95 | - |
| ziqingyang/chinese-llama-2-7b | Llama-2-7B | - | 33.86 | 34.69 | 34.52 | 25.18 | 34.2 |
| TigerResearch/tigerbot-7b-base | Llama-2-7B | 0.3T | 43.73 | 42.04 | 37.64 | 30.61 | - |
| LinkSoul/Chinese-Llama-2-7b | Llama-2-7B | - | 48.41 | 38.31 | 38.45 | 27.72 | - |
| FlagAlpha/Atom-7B | Llama-2-7B | 0.1T | 49.96 | 41.10 | 39.83 | 33.00 | - |
| IDEA-CCNL/Ziya-LLaMA-13B-v1.1 | Llama-13B | 0.11T | 50.25 | 40.99 | 40.04 | 30.54 | - |
|  |  |  |  |  |  |  |  |
| **Colossal-LLaMA-2-7b-base** | Llama-2-7B | **0.0085T** | 53.06 | 49.89 | 51.48 | 58.82 | 50.2 |
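The token figures in the table are easier to compare after a small conversion. The sketch below only restates two values from the "Tokens Consumed" column. It assumes that "T" means trillion and that, for a model with a listed backbone, the figure counts only the tokens used on top of that backbone; neither assumption is stated explicitly in the table.

```python
# Token counts copied from the "Tokens Consumed" column (assumption: T = trillion).
TRILLION = 10**12

tokens = {
    "Llama-2-7B": 2.0 * TRILLION,
    "Colossal-LLaMA-2-7b-base": 0.0085 * TRILLION,
}

extra = tokens["Colossal-LLaMA-2-7b-base"]  # assumed to exclude backbone pretraining
base = tokens["Llama-2-7B"]

print(f"Colossal-LLaMA-2-7b-base tokens: {extra / 1e9:.1f} billion")
print(f"Share of the Llama-2-7B token count: {extra / base:.3%}")
```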


### ColossalChat

<div align="center">
   <a href="https://www.youtube.com/watch?v=HcTiHzApHm0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20YouTube.png" width="700" />
   </a>
</div>

[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): A zero-barrier solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat)
[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0)
[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg)

<p id="ColossalChat-Speed" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20Speed.jpg" width=450/>
</p>

- Up to 10x faster training for RLHF PPO stage 3

<p id="ColossalChat_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width=800/>
</p>

- Up to 7.73x faster single-server training and 1.42x faster single-GPU inference

<p id="ColossalChat-1GPU" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT-1GPU.jpg" width=450/>
</p>

- Up to 10.3x larger model capacity on a single GPU
- A minimal demo training run needs as little as 1.62 GB of GPU memory (any consumer-grade GPU)

<p id="ColossalChat-LoRA" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/LoRA%20data.jpg" width=600/>
</p>

- Up to 3.7x larger fine-tuning model capacity on a single GPU
- While maintaining a high running speed

<p align="right">(<a href="#top">back to top</a>)</p>

### AIGC
Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion)

<p id="diffusion_train" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20v2.png" width=800/>
</p>

- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): 5.6x lower GPU memory consumption and up to 46x lower hardware cost (from A100 to RTX3060)

<p id="diffusion_demo" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/DreamBooth.png" width=800/>
</p>

- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): Personalize a model with only 3-5 images of the target subject

<p id="inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20Inference.jpg" width=800/>
</p>

- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): 2.5x lower GPU memory consumption for inference


<p align="right">(<a href="#top">back to top</a>)</p>

### Biomedicine

Acceleration of [AlphaFold](https://alphafold.ebi.ac.uk/) protein structure prediction

<p id="FastFold" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/FastFold.jpg" width=800/>
</p>

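- [FastFold](https://github.com/hpcaitech/FastFold): Accelerates AlphaFold training and inference, speeds up data preprocessing, and supports inference on sequences of more than 10,000 residues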

<p id="FastFold-Intel" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/data%20preprocessing%20with%20Intel.jpg" width=600/>
</p>

- [FastFold with Intel](https://github.com/hpcaitech/FastFold): 3x inference acceleration and 39% cost reduction

<p id="xTrimoMultimer" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/xTrimoMultimer_Table.jpg" width=800/>
</p>

- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): 11x faster structure prediction for protein monomers and complexes

<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Training Demo
### LLaMA2
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/llama2_pretraining.png" width=600/>
</p>

- 70-billion-parameter LLaMA2 training accelerated by 195%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/llama2)
[[blog]](https://www.hpc-ai.tech/blog/70b-llama2-training)

### LLaMA1
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/LLaMA_pretraining.png" width=600/>
</p>

- 65-billion-parameter large model pretraining accelerated by 38%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/example/llama/examples/language/llama)
[[blog]](https://www.hpc-ai.tech/blog/large-model-pretraining)

### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
</p>

- Free up 50% of GPU resources, or gain a 10.7% speedup

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption, or superlinear scaling (tensor parallelism)

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- Train a 24x larger model on the same hardware
- Over 3x throughput

### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 1.5x longer sequence length

### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): A scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

### OPT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT_update.png" width=800/>

- [Open Pretrained Transformer (OPT)](https://github.com/facebookresearch/metaseq) is a 175-billion-parameter language model released by Meta. Because its pretrained weights are fully public, it has encouraged work on downstream tasks and application deployment.
- 45% speedup when fine-tuning OPT at low cost with only a few lines of code. [[Example]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/opt) [[Online Serving]](https://colossalai.org/docs/advanced_tutorials/opt_service)

Please visit our [documentation](https://www.colossalai.org/) and [examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples) for more details.
### ViT
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
</p>

- 14x larger batch size and 5x faster training (tensor parallelism = 64)

### Recommendation System Models
- [Cached Embedding](https://github.com/hpcaitech/CachedEmbedding): Implements embeddings with a software cache to train larger models with less GPU memory.

<p align="right">(<a href="#top">back to top</a>)</p>

## Single GPU Training Demo

### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>

- Train a 20x larger model on the same hardware

<p id="GPT-2-NVME" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-NVME.png" width=800/>
</p>

- Train a 120x larger model on the same hardware (RTX 3080)

### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>

- Train a 34x larger model on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>

## Inference (Energon-AI) Demo

<p id="GPT-3-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
</p>

- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% faster inference on the same hardware

<p id="OPT-Serving" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20serving.png" width=600/>
</p>

- [OPT Serving](https://colossalai.org/docs/advanced_tutorials/opt_service): Try the online inference service for 175-billion-parameter OPT

<p id="BLOOM-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20Inference.PNG" width=800/>
</p>

- [BLOOM](https://github.com/hpcaitech/EnergonAI/tree/main/examples/bloom): Reduce the deployment and inference cost of 176-billion-parameter BLOOM by more than 10x
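The parameter counts quoted above translate directly into a lower bound on memory, which is why models of this size are served across several devices. The following back-of-the-envelope sketch assumes 2 bytes per parameter (fp16 or bf16) and a hypothetical 80 GB device; it counts weights only, ignoring activations and caches, and says nothing about how Energon-AI actually partitions a model.

```python
import math


def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory for the weights alone, in GB (10**9 bytes). fp16/bf16 = 2 bytes each."""
    return num_params * bytes_per_param / 10**9


def min_devices(num_params, device_memory_gb, bytes_per_param=2):
    """Smallest device count whose combined memory can hold the weights."""
    needed = weight_memory_gb(num_params, bytes_per_param)
    return math.ceil(needed / device_memory_gb)


# Parameter counts as stated in the list above; 80 GB per device is an assumption.
for name, params in [("OPT", 175 * 10**9), ("BLOOM", 176 * 10**9)]:
    print(f"{name}: {weight_memory_gb(params):.0f} GB of weights,",
          f"at least {min_devices(params, device_memory_gb=80)} devices of 80 GB")
```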

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

Requirements:

- PyTorch >= 1.11 (PyTorch 2.x support is in progress)
- Python >= 3.7
- CUDA >= 11.0
- [NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0 (V100/RTX20 and higher)
- Linux OS
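Before installing, the requirements above can be checked with a short script. This is a minimal sketch, not a tool shipped with Colossal-AI: `parse_version` and `check_environment` are hypothetical helper names, and the PyTorch-related checks report `None` (unknown) when PyTorch is missing, is a CPU-only build, or sees no GPU.

```python
import platform
import sys


def parse_version(text):
    """Turn '1.13.1+cu117' into (1, 13, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_environment():
    """Map each requirement to True, False, or None (could not be determined)."""
    report = {
        "Python >= 3.7": sys.version_info >= (3, 7),
        "Linux OS": platform.system() == "Linux",
        "PyTorch >= 1.11": None,
        "CUDA >= 11.0": None,
        "GPU Compute Capability >= 7.0": None,
    }
    try:
        import torch  # optional: the remaining checks are skipped without it
    except ImportError:
        return report
    report["PyTorch >= 1.11"] = parse_version(torch.__version__) >= (1, 11)
    if torch.version.cuda is not None:  # None on CPU-only builds of PyTorch
        report["CUDA >= 11.0"] = parse_version(torch.version.cuda) >= (11, 0)
    if torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)  # (major, minor)
        report["GPU Compute Capability >= 7.0"] = capability >= (7, 0)
    return report


for requirement, ok in check_environment().items():
    print(f"{requirement}: {ok}")
```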

If you run into installation problems, you can raise an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) in this repository.

### Install from PyPI

You can download and install Colossal-AI directly from PyPI with the following command. By default, the PyTorch extensions are not built during installation.

```bash
pip install colossalai
```

**Note: only Linux is supported for now.**

However, if you want to build the PyTorch extensions during installation, you can set the environment variable `CUDA_EXT=1`.

```bash
CUDA_EXT=1 pip install colossalai
```

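**Otherwise, the PyTorch extensions are built at runtime, only when you actually need them.**

We also release a Nightly version on a weekly schedule, which lets you try new features and bug fixes early. You can install the Nightly version with the following command.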

```bash
pip install colossalai-nightly
```
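After running one of the install commands above, a quick way to confirm which distribution ended up in the environment is to query the package metadata. This sketch uses only the standard library (`importlib.metadata`, which needs Python 3.8 or later); `installed_version` is a hypothetical helper, and the distribution names are taken from the `pip install` commands in this section.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used in the pip commands of this section.
for name in ("colossalai", "colossalai-nightly"):
    print(name, "->", installed_version(name) or "not installed")
```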

### Install From Source

> This documentation is kept in sync with the main branch of the repository. If you run into any problem, feel free to raise an issue :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install dependencies
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

By default, `pip install` does not build the PyTorch extensions; they are compiled at runtime instead. If you want to build these extensions ahead of time (they are needed when using the fused optimizers), you can use the following command.

```shell
CUDA_EXT=1 pip install .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

### Pull from DockerHub

You can pull the latest image directly from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The image is uploaded automatically with every release.

### Build On Your Own

Run the following command to build a Docker image from the Dockerfile we provide.

> Building Colossal-AI inside the Dockerfile requires GPU support, so you need to set the NVIDIA Docker runtime as the default runtime. More information can be found [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
> We recommend downloading Colossal-AI directly from the [project homepage](https://www.colossalai.org).
```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the Docker container in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Community
Join the Colossal-AI community through the [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions and questions with us.


## Contributing

Following the successful examples of community projects such as [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion),
individual developers and potential partners with computing power, datasets, or models are all welcome to help build the Colossal-AI community and embrace the era of large models!

You can contact us or participate in the following ways:
1. [Leave a Star ⭐](https://github.com/hpcaitech/ColossalAI/stargazers) to show your appreciation and support. Thank you!
2. Post an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose), or submit a PR on GitHub following the [contribution guidelines](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md).
3. Send your formal collaboration proposal to contact@hpcaitech.com

Sincere thanks to all contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=hpcaitech/ColossalAI"  width="800px"/>
</a>

<p align="right">(<a href="#top">back to top</a>)</p>

## CI/CD

We use [GitHub Actions](https://github.com/features/actions) to automate most of our development and deployment workflows. To learn how these workflows operate, please see this [document](https://github.com/hpcaitech/ColossalAI/blob/main/.github/workflows/README.md).


## Cite Us

The Colossal-AI project was inspired by a number of related projects. Some are research projects by our own developers, and others come from the research work of other organizations. We list these projects in the [reference list](./REFERENCE.md) to acknowledge the open-source community and the research behind them.

You can cite this project using the following format.

```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```

Colossal-AI has been accepted as an official tutorial by top conferences including [NeurIPS](https://nips.cc/), [SC](https://sc22.supercomputing.org/), [AAAI](https://aaai.org/Conferences/AAAI-23/),
[PPoPP](https://ppopp23.sigplan.org/), [CVPR](https://cvpr2023.thecvf.com/), [ISC](https://www.isc-hpc.com/), and [NVIDIA GTC](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-S51482/).

<p align="right">(<a href="#top">back to top</a>)</p>