# Colossal-AI
<div id="top" align="center">

   [![logo](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/colossal-ai_logo_vertical.png)](https://www.colossalai.org/)

   Colossal-AI: Making large AI models cheaper, easier to use, and more scalable

   <h3> <a href="https://arxiv.org/abs/2110.14883"> Paper </a> |
   <a href="https://www.colossalai.org/"> Documentation </a> |
   <a href="https://github.com/hpcaitech/ColossalAI/tree/main/examples"> Examples </a> |
   <a href="https://github.com/hpcaitech/ColossalAI/discussions"> Forum </a> |
   <a href="https://medium.com/@hpcaitech"> Blog </a></h3>

   [![GitHub Repo stars](https://img.shields.io/github/stars/hpcaitech/ColossalAI?style=social)](https://github.com/hpcaitech/ColossalAI/stargazers)
   [![Build](https://github.com/hpcaitech/ColossalAI/actions/workflows/build_on_schedule.yml/badge.svg)](https://github.com/hpcaitech/ColossalAI/actions/workflows/build_on_schedule.yml)
   [![Documentation](https://readthedocs.org/projects/colossalai/badge/?version=latest)](https://colossalai.readthedocs.io/en/latest/?badge=latest)
   [![CodeFactor](https://www.codefactor.io/repository/github/hpcaitech/colossalai/badge)](https://www.codefactor.io/repository/github/hpcaitech/colossalai)
   [![HuggingFace badge](https://img.shields.io/badge/%F0%9F%A4%97HuggingFace-Join-yellow)](https://huggingface.co/hpcai-tech)
   [![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://github.com/hpcaitech/public_assets/tree/main/colossalai/contact/slack)
   [![WeChat badge](https://img.shields.io/badge/微信-加入-green?logo=wechat&amp)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png)

   | [English](README.md) | [中文](README-zh-Hans.md) |

</div>

## Latest News
* [2023/11] [Enhanced MoE Parallelism, Open-source MoE Model Training Can Be 9 Times More Efficient](https://www.hpc-ai.tech/blog/enhanced-moe-parallelism-open-source-moe-model-training-can-be-9-times-more-efficient)
* [2023/09] [One Half-Day of Training Using a Few Hundred Dollars Yields Similar Results to Mainstream Large Models, Open-Source and Commercial-Free Domain-Specific LLM Solution](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution)
* [2023/09] [70 Billion Parameter LLaMA2 Model Training Accelerated by 195%](https://www.hpc-ai.tech/blog/70b-llama2-training)
* [2023/07] [HPC-AI Tech Raises 22 Million USD in Series A Funding](https://www.hpc-ai.tech/blog/hpc-ai-tech-raises-22-million-usd-in-series-a-funding-to-fuel-team-expansion-and-business-growth)
* [2023/07] [65B Model Pretraining Accelerated by 38%, Best Practices for Building LLaMA-Like Base Models Open-Source](https://www.hpc-ai.tech/blog/large-model-pretraining)
* [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
* [2023/03] [Intel and Colossal-AI Partner to Deliver Cost-Efficient Open-Source Solution for Protein Folding Structure Prediction](https://www.hpc-ai.tech/blog/intel-habana)
* [2023/03] [AWS and Google Fund Colossal-AI with Startup Cloud Programs](https://www.hpc-ai.tech/blog/aws-and-google-fund-colossal-ai-with-startup-cloud-programs)
* [2023/02] [Open Source Solution Replicates ChatGPT Training Process! Ready to go with only 1.6GB GPU Memory](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt)
* [2023/01] [Hardware Savings Up to 46 Times for AIGC and  Automatic Parallelism](https://medium.com/pytorch/latest-colossal-ai-boasts-novel-automatic-parallelism-and-offers-savings-up-to-46x-for-stable-1453b48f3f02)

## Table of Contents
<ul>
 <li><a href="#为何选择-Colossal-AI">Why Colossal-AI</a> </li>
 <li><a href="#特点">Features</a> </li>
 <li>
   <a href="#Colossal-AI-in-the-Real-World">Colossal-AI in the Real World</a>
   <ul>
     <li><a href="#Colossal-LLaMA-2">Colossal-LLaMA-2: half a day of training on a budget of about a thousand yuan yields results comparable to mainstream large models; an open-source, commercially usable Chinese LLaMA-2</a></li>
     <li><a href="#ColossalChat">ColossalChat: cloning ChatGPT with a complete RLHF pipeline and no barrier to entry</a></li>
     <li><a href="#AIGC">AIGC: accelerating Stable Diffusion</a></li>
     <li><a href="#生物医药">Biomedicine: accelerating AlphaFold protein structure prediction</a></li>
   </ul>
 </li>
 <li>
   <a href="#并行训练样例展示">Parallel Training Demo</a>
   <ul>
     <li><a href="#LLaMA2">LLaMA 1/2</a></li>
     <li><a href="#MoE">MoE</a></li>
     <li><a href="#GPT-3">GPT-3</a></li>
     <li><a href="#GPT-2">GPT-2</a></li>
     <li><a href="#BERT">BERT</a></li>
     <li><a href="#PaLM">PaLM</a></li>
     <li><a href="#OPT">OPT</a></li>
     <li><a href="#ViT">ViT</a></li>
     <li><a href="#推荐系统模型">Recommendation System Models</a></li>
   </ul>
 </li>
 <li>
   <a href="#单GPU训练样例展示">Single GPU Training Demo</a>
   <ul>
     <li><a href="#GPT-2-Single">GPT-2</a></li>
     <li><a href="#PaLM-Single">PaLM</a></li>
   </ul>
 </li>
 <li>
   <a href="#推理-Energon-AI-样例展示">Inference (Energon-AI) Demo</a>
   <ul>
     <li><a href="#GPT-3-Inference">GPT-3</a></li>
     <li><a href="#OPT-Serving">175-billion-parameter OPT online inference service</a></li>
     <li><a href="#BLOOM-Inference">176-billion-parameter BLOOM</a></li>
   </ul>
 </li>
 <li>
   <a href="#安装">Installation</a>
   <ul>
     <li><a href="#PyPI">PyPI</a></li>
     <li><a href="#从源代码安装">Install from Source</a></li>
   </ul>
 </li>
 <li><a href="#使用-Docker">Use Docker</a></li>
 <li><a href="#社区">Community</a></li>
 <li><a href="#做出贡献">Contributing</a></li>
 <li><a href="#引用我们">Cite Us</a></li>
</ul>

## Why Colossal-AI
<div align="center">
   <a href="https://youtu.be/KnXSfjqkKN0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/JamesDemmel_Colossal-AI.png" width="600" />
   </a>

   Prof. James Demmel (UC Berkeley): Colossal-AI makes distributed training efficient, easy to use, and scalable.
</div>

<p align="right">(<a href="#top">back to top</a>)</p>

## Features

Colossal-AI provides a collection of parallel components. Our goal is to make writing a distributed AI model as simple as writing an ordinary single-GPU model. We provide user-friendly tools that let you start distributed training and inference in a few lines of code.

- Parallelism strategies
  - Data parallelism
  - Pipeline parallelism
  - 1D, [2D](https://arxiv.org/abs/2104.05343), [2.5D](https://arxiv.org/abs/2105.14500), [3D](https://arxiv.org/abs/2105.14450) tensor parallelism
  - [Sequence parallelism](https://arxiv.org/abs/2105.13120)
  - [Zero Redundancy Optimizer (ZeRO)](https://arxiv.org/abs/1910.02054)
  - [Auto-parallelism](https://arxiv.org/abs/2302.02599)
- Heterogeneous memory management
  - [PatrickStar](https://arxiv.org/abs/2108.05818)
- Friendly usage
  - Parallelism based on a configuration file
- Inference
  - [Energon-AI](https://github.com/hpcaitech/EnergonAI)
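
To make the idea behind 1D tensor parallelism from the list above more concrete, here is a minimal, framework-free sketch in plain Python. It is not Colossal-AI code: the helpers `matmul`, `split_columns`, and `column_parallel_linear` are hypothetical names chosen for illustration, and the "devices" are simulated with list slices. It only shows that splitting a weight matrix by columns, multiplying each shard separately, and concatenating the partial outputs gives the same result as the full multiplication.

```python
# Illustrative 1D (column-wise) tensor parallelism with plain Python lists.
# All helper names are hypothetical; no real devices or libraries are involved.

def matmul(x, w):
    """Multiply matrix x (n x k) by matrix w (k x m)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, parts):
    """Split the columns of w into `parts` equal shards, one per simulated device."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly across devices"
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def column_parallel_linear(x, w, parts):
    """Each simulated device multiplies x by its own shard; outputs are concatenated."""
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, parts)]
    return [sum((out[r] for out in partial_outputs), []) for r in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]

sharded = column_parallel_linear(x, w, parts=2)
full = matmul(x, w)
print(sharded == full)
```

In a real system each shard would live on a different GPU and the concatenation would be a collective communication step; the sketch leaves that out on purpose.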

<p align="right">(<a href="#top">back to top</a>)</p>

## Colossal-AI in the Real World

### Colossal-LLaMA-2

- Half a day of training on a budget of about a thousand yuan yields results comparable to mainstream large models; an open-source, commercially usable Chinese LLaMA-2
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2)
[[blog]](https://www.hpc-ai.tech/blog/one-half-day-of-training-using-a-few-hundred-dollars-yields-similar-results-to-mainstream-large-models-open-source-and-commercial-free-domain-specific-llm-solution)
[[model weights]](https://huggingface.co/hpcai-tech/Colossal-LLaMA-2-7b-base)

| Model | Backbone | Tokens Consumed | MMLU | CMMLU | AGIEval | GAOKAO | CEval |
| :----------------------------: | :--------: | :-------------: | :-----------: | :-----------: | :-----: | :----: | :----: |
| | | - | 5-shot | 5-shot | 5-shot | 0-shot | 5-shot |
| Baichuan-7B | - | 1.2T | 42.32 (42.30) | 44.53 (44.02) | 38.72 | 36.74 | 42.80 |
| Baichuan-13B-Base | - | 1.4T | 50.51 (51.60) | 55.73 (55.30) | 47.20 | 51.41 | 53.60 |
| Baichuan2-7B-Base | - | 2.6T | 46.97 (54.16) | 57.67 (57.07) | 45.76 | 52.60 | 54.00 |
| Baichuan2-13B-Base | - | 2.6T | 54.84 (59.17) | 62.62 (61.97) | 52.08 | 58.25 | 58.10 |
| ChatGLM-6B | - | 1.0T | 39.67 (40.63) | 41.17 (-) | 40.10 | 36.53 | 38.90 |
| ChatGLM2-6B | - | 1.4T | 44.74 (45.46) | 49.40 (-) | 46.36 | 45.49 | 51.70 |
| InternLM-7B | - | 1.6T | 46.70 (51.00) | 52.00 (-) | 44.77 | 61.64 | 52.80 |
| Qwen-7B | - | 2.2T | 54.29 (56.70) | 56.03 (58.80) | 52.47 | 56.42 | 59.60 |
| | | | | | | | |
| Llama-2-7B | - | 2.0T | 44.47 (45.30) | 32.97 (-) | 32.60 | 25.46 | - |
| Linly-AI/Chinese-LLaMA-2-7B-hf | Llama-2-7B | 1.0T | 37.43 | 29.92 | 32.00 | 27.57 | - |
| wenge-research/yayi-7b-llama2 | Llama-2-7B | - | 38.56 | 31.52 | 30.99 | 25.95 | - |
| ziqingyang/chinese-llama-2-7b | Llama-2-7B | - | 33.86 | 34.69 | 34.52 | 25.18 | 34.2 |
| TigerResearch/tigerbot-7b-base | Llama-2-7B | 0.3T | 43.73 | 42.04 | 37.64 | 30.61 | - |
| LinkSoul/Chinese-Llama-2-7b | Llama-2-7B | - | 48.41 | 38.31 | 38.45 | 27.72 | - |
| FlagAlpha/Atom-7B | Llama-2-7B | 0.1T | 49.96 | 41.10 | 39.83 | 33.00 | - |
| IDEA-CCNL/Ziya-LLaMA-13B-v1.1 | Llama-13B | 0.11T | 50.25 | 40.99 | 40.04 | 30.54 | - |
| | | | | | | | |
| **Colossal-LLaMA-2-7b-base** | Llama-2-7B | **0.0085T** | 53.06 | 49.89 | 51.48 | 58.82 | 50.2 |
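
To put the "Tokens Consumed" column in perspective, the short sketch below computes how many times more tokens some other Llama-2-7B-based models in the table consumed compared with Colossal-LLaMA-2-7b-base (0.0085T, that is, 8.5 billion tokens). The numbers are copied from the table; the variable and function names are hypothetical and only serve this illustration.

```python
# Tokens consumed (in trillions), copied from the table for models built on
# the Llama-2-7B backbone that report a token count.
TOKENS_TRILLIONS = {
    "Linly-AI/Chinese-LLaMA-2-7B-hf": 1.0,
    "TigerResearch/tigerbot-7b-base": 0.3,
    "FlagAlpha/Atom-7B": 0.1,
}
COLOSSAL_TOKENS = 0.0085  # Colossal-LLaMA-2-7b-base


def token_ratio(other_tokens, own_tokens=COLOSSAL_TOKENS):
    """Return how many times more tokens the other model consumed."""
    return other_tokens / own_tokens


ratios = {name: round(token_ratio(t), 1) for name, t in TOKENS_TRILLIONS.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the tokens of Colossal-LLaMA-2-7b-base")
```

This is plain arithmetic on the reported figures and says nothing about data quality or training recipes, which the table does not cover.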


### ColossalChat

<div align="center">
   <a href="https://www.youtube.com/watch?v=HcTiHzApHm0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20YouTube.png" width="700" />
   </a>
</div>

[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline and no barrier to entry
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat)
[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0)
[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg)

<p id="ColossalChat-Speed" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20Speed.jpg" width=450/>
</p>

- Up to 10x faster training in RLHF PPO stage 3

<p id="ColossalChat_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width=800/>
</p>

- Up to 7.73x faster single-server training and 1.42x faster single-GPU inference

<p id="ColossalChat-1GPU" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT-1GPU.jpg" width=450/>
</p>

- Up to 10.3x larger model capacity on a single GPU
- The minimal demo training process needs as little as 1.62GB of GPU memory (any consumer-grade GPU)

<p id="ColossalChat-LoRA" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/LoRA%20data.jpg" width=600/>
</p>

- Up to 3.7x larger fine-tuning model capacity on a single GPU
- While maintaining a high running speed

<p align="right">(<a href="#top">back to top</a>)</p>

### AIGC
Acceleration of AIGC (AI-Generated Content) models such as [Stable Diffusion v1](https://github.com/CompVis/stable-diffusion) and [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion).

<p id="diffusion_train" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20v2.png" width=800/>
</p>

- [Training](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): 5.6x lower GPU memory consumption and up to 46x lower hardware cost (from A100 to RTX3060)

<p id="diffusion_demo" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/DreamBooth.png" width=800/>
</p>

- [DreamBooth Fine-tuning](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/dreambooth): personalized fine-tuning with only 3-5 images of the target subject

<p id="inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Stable%20Diffusion%20Inference.jpg" width=800/>
</p>

- [Inference](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion): 2.5x lower GPU memory consumption for inference


<p align="right">(<a href="#top">back to top</a>)</p>

### Biomedicine

Acceleration of [AlphaFold](https://alphafold.ebi.ac.uk/) protein structure prediction

<p id="FastFold" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/FastFold.jpg" width=800/>
</p>

- [FastFold](https://github.com/hpcaitech/FastFold): accelerates AlphaFold training, inference, and data preprocessing, and supports inference on sequences longer than 10,000 residues

<p id="FastFold-Intel" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/data%20preprocessing%20with%20Intel.jpg" width=600/>
</p>

- [FastFold with Intel](https://github.com/hpcaitech/FastFold): 3x inference acceleration and 39% cost reduction

<p id="xTrimoMultimer" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/xTrimoMultimer_Table.jpg" width=800/>
</p>

- [xTrimoMultimer](https://github.com/biomap-research/xTrimoMultimer): 11x faster structure prediction for protein monomers and multimers

<p align="right">(<a href="#top">back to top</a>)</p>

## Parallel Training Demo
### LLaMA2
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/llama2_pretraining.png" width=600/>
</p>

- 70-billion-parameter LLaMA2 model training accelerated by 195%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/llama2)
[[blog]](https://www.hpc-ai.tech/blog/70b-llama2-training)

### LLaMA1
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/LLaMA_pretraining.png" width=600/>
</p>

- 65-billion-parameter large model pretraining accelerated by 38%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/example/llama/examples/language/llama)
[[blog]](https://www.hpc-ai.tech/blog/large-model-pretraining)

### MoE
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/examples/images/MOE_training.png" width=800/>
</p>

- Enhanced expert parallelism: open-source MoE model training can be 9 times more efficient
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/openmoe)
[[blog]](https://www.hpc-ai.tech/blog/enhanced-moe-parallelism-open-source-moe-model-training-can-be-9-times-more-efficient)

### GPT-3
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT3-v5.png" width=700/>
</p>

- Free up 50% of GPU resources, or gain a 10.7% speedup

### GPT-2
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2.png" width=800/>

- 11x lower GPU memory consumption, or superlinear scaling with tensor parallelism

<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/(updated)GPT-2.png" width=800>

- Train a 24x larger model on the same hardware
- Over 3x higher throughput

### BERT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BERT.png" width=800/>

- 2x faster training, or 1.5x the sequence length

### PaLM
- [PaLM-colossalai](https://github.com/hpcaitech/PaLM-colossalai): a scalable implementation of Google's Pathways Language Model ([PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html)).

### OPT
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/OPT_update.png" width=800/>

- [Open Pretrained Transformer (OPT)](https://github.com/facebookresearch/metaseq), a 175-billion-parameter language model released by Meta, has stimulated the development of downstream tasks and application deployments because its pretrained weights are fully public.
- 45% speedup when fine-tuning OPT at low cost, in just a few lines of code. [[Example]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/opt) [[Online Serving]](https://colossalai.org/docs/advanced_tutorials/opt_service)

Please visit our [documentation](https://www.colossalai.org/) and [examples](https://github.com/hpcaitech/ColossalAI/tree/main/examples) for more details.

### ViT
<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/ViT.png" width="450" />
</p>

- 14x larger batch size and 5x faster training with tensor parallelism = 64

### Recommendation System Models
- [Cached Embedding](https://github.com/hpcaitech/CachedEmbedding): implements embeddings with a software cache, so that larger models can be trained with less GPU memory.

<p align="right">(<a href="#top">back to top</a>)</p>

## Single GPU Training Demo

### GPT-2
<p id="GPT-2-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-GPU1.png" width=450/>
</p>

- Train a 20x larger model on the same hardware

<p id="GPT-2-NVME" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/GPT2-NVME.png" width=800/>
</p>

- Train a 120x larger model on the same hardware (RTX 3080)

### PaLM
<p id="PaLM-Single" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/PaLM-GPU1.png" width=450/>
</p>

- Train a 34x larger model on the same hardware

<p align="right">(<a href="#top">back to top</a>)</p>


## Inference (Energon-AI) Demo

<p id="GPT-3-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/inference_GPT-3.jpg" width=800/>
</p>

- [Energon-AI](https://github.com/hpcaitech/EnergonAI): 50% faster inference on the same hardware

<p id="OPT-Serving" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20serving.png" width=600/>
</p>

- [OPT Serving](https://colossalai.org/docs/advanced_tutorials/opt_service): try the 175-billion-parameter OPT online inference service

<p id="BLOOM-Inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/BLOOM%20Inference.PNG" width=800/>
</p>

- [BLOOM](https://github.com/hpcaitech/EnergonAI/tree/main/examples/bloom): reduce the deployment and inference cost of the 176-billion-parameter BLOOM model by more than 10x

<p align="right">(<a href="#top">back to top</a>)</p>

## Installation

Requirements:

- PyTorch >= 1.11 and PyTorch <= 2.1
- Python >= 3.7
- CUDA >= 11.0
- [NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus) >= 7.0 (V100/RTX20 and higher)
- Linux OS
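
The version constraints above can be checked programmatically before installing. The following is a minimal sketch, not part of Colossal-AI: `parse_major_minor` and `check_requirements` are hypothetical helpers, versions are compared only at major.minor granularity, and the GPU compute capability requirement is not checked.

```python
def parse_major_minor(version):
    """Reduce a version string such as '2.1.0+cu118' to the tuple (2, 1)."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def check_requirements(python, torch, cuda, system):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if parse_major_minor(python) < (3, 7):
        problems.append(f"Python {python} is older than 3.7")
    if not (1, 11) <= parse_major_minor(torch) <= (2, 1):
        problems.append(f"PyTorch {torch} is outside the 1.11 to 2.1 range")
    if parse_major_minor(cuda) < (11, 0):
        problems.append(f"CUDA {cuda} is older than 11.0")
    if system != "Linux":
        problems.append(f"{system} is not Linux")
    return problems


# Example inputs (made up for illustration).
ok = check_requirements("3.10.12", "2.1.0+cu118", "11.8", "Linux")
bad = check_requirements("3.6.9", "2.2.0", "10.2", "Windows")
print(ok)
print(bad)
```

In a real environment the four arguments could come from `platform.python_version()`, `torch.__version__`, `torch.version.cuda`, and `platform.system()`; note that `torch.version.cuda` is `None` on CPU-only PyTorch builds, which this sketch does not handle.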

If you encounter any installation problems, you can raise an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) in this repository.

### Install from PyPI

You can install Colossal-AI directly from PyPI with the following command. The PyTorch extensions are not installed by default.

```bash
pip install colossalai
```

**Note: only Linux is supported for now.**

However, if you want to build the PyTorch extensions during installation, you can set the environment variable `CUDA_EXT=1`.

```bash
CUDA_EXT=1 pip install colossalai
```

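**Otherwise, the PyTorch extensions will be built at runtime, only when you actually need them.**

We also release a nightly version every week, which lets you try new features and bug fixes early. You can install the nightly version with the following command.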

```bash
pip install colossalai-nightly
```
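
After running one of the commands above, you may want to confirm which distribution actually ended up in your environment. The sketch below uses only the Python standard library (`importlib.metadata`, which requires Python 3.8 or newer); `installed_version` is a hypothetical helper, and the distribution names are simply the ones used in the `pip install` commands above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the release and nightly distributions by their pip names.
for name in ("colossalai", "colossalai-nightly"):
    print(name, "->", installed_version(name))
```

A result of `None` for both names suggests that the installation went into a different Python environment than the one running the script.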

### Install from Source

> This document is kept in line with the main branch of the repository. If you run into any problems, feel free to open an issue :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

# install dependencies
pip install -r requirements/requirements.txt

# install colossalai
pip install .
```

By default, `pip install` does not install the PyTorch extensions; they are compiled on the fly at runtime. If you want to install these extensions ahead of time (they are needed when using fused optimizers), use the following command.

```shell
CUDA_EXT=1 pip install .
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Use Docker

### Pull from DockerHub

You can pull the latest image directly from our [DockerHub page](https://hub.docker.com/r/hpcaitech/colossalai). The latest image is uploaded automatically with every release.

### Build the Image Locally

Run the following command to build a docker image from the Dockerfile we provide.

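> Building Colossal-AI inside the Dockerfile requires GPU support, so you need to set the NVIDIA Docker Runtime as the default runtime. More information is available [here](https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime).
> We recommend downloading Colossal-AI directly from the [project homepage](https://www.colossalai.org).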

```bash
cd ColossalAI
docker build -t colossalai ./docker
```

Run the following command to start the docker image in interactive mode.

```bash
docker run -ti --gpus all --rm --ipc=host colossalai bash
```

<p align="right">(<a href="#top">back to top</a>)</p>

## Community
Join the Colossal-AI community on the [Forum](https://github.com/hpcaitech/ColossalAI/discussions),
[Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
and [WeChat](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your suggestions and questions with us.


## Contributing

Following the successful examples of community projects such as [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion),
all individual developers and potential partners with computing power, data, or models are welcome to help build the Colossal-AI community and embrace the era of large models!

You can contact us or get involved in the following ways:
1. [Leave a Star ⭐](https://github.com/hpcaitech/ColossalAI/stargazers) to show your appreciation and support. Thank you!
2. Post an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose), or submit a PR on GitHub following the [contributing guidelines](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md).
3. Send your formal collaboration proposal to contact@hpcaitech.com

Sincere thanks to all of our contributors!

<a href="https://github.com/hpcaitech/ColossalAI/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=hpcaitech/ColossalAI"  width="800px"/>
</a>

<p align="right">(<a href="#top">back to top</a>)</p>


## CI/CD

We use [GitHub Actions](https://github.com/features/actions) to automate most of our development and deployment workflows. To learn how these workflows operate, see this [document](https://github.com/hpcaitech/ColossalAI/blob/main/.github/workflows/README.md).


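## Cite Us

The Colossal-AI project was inspired by a number of related projects: some are research projects by our own developers, and others come from the research work of other organizations. We list these remarkable projects in the [reference list](./REFERENCE.md) to acknowledge the open-source community and the research behind them.

You can cite this project using the following format.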

```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```

Colossal-AI has been accepted as an official tutorial by top conferences such as [NeurIPS](https://nips.cc/), [SC](https://sc22.supercomputing.org/), [AAAI](https://aaai.org/Conferences/AAAI-23/),
[PPoPP](https://ppopp23.sigplan.org/), [CVPR](https://cvpr2023.thecvf.com/), [ISC](https://www.isc-hpc.com/), and [NVIDIA GTC](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-S51482/).

<p align="right">(<a href="#top">back to top</a>)</p>