<h1 align="center">
  <img width="auto" height="100px" src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/logo_coati.png"/>
  <br/>
  <span>ColossalChat</span>
</h1>

## Table of Contents

- [Table of Contents](#table-of-contents)
- [What is ColossalChat and Coati ?](#what-is-colossalchat-and-coati-)
- [Online demo](#online-demo)
- [Install](#install)
  - [Install the environment](#install-the-environment)
  - [Install the Transformers](#install-the-transformers)
- [How to use?](#how-to-use)
  - [Supervised datasets collection](#supervised-datasets-collection)
  - [RLHF Training Stage1 - Supervised instructs tuning](#RLHF-training-stage1---supervised-instructs-tuning)
  - [RLHF Training Stage2 - Training reward model](#RLHF-training-stage2---training-reward-model)
  - [RLHF Training Stage3 - Training model with reinforcement learning by human feedback](#RLHF-training-stage3---training-model-with-reinforcement-learning-by-human-feedback)
  - [Inference Quantization and Serving - After Training](#inference-quantization-and-serving---after-training)
- [Coati7B examples](#coati7b-examples)
  - [Generation](#generation)
  - [Open QA](#open-qa)
  - [Limitation for LLaMA-finetuned models](#limitation)
  - [Limitation of dataset](#limitation)
- [FAQ](#faq)
  - [How to save/load checkpoint](#faq)
  - [How to train with limited resources](#faq)
- [The Plan](#the-plan)
  - [Real-time progress](#real-time-progress)
- [Invitation to open-source contribution](#invitation-to-open-source-contribution)
- [Quick Preview](#quick-preview)
- [Authors](#authors)
- [Citations](#citations)
- [Licenses](#licenses)

---

## What is ColossalChat and Coati ?

[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) is a project to implement an LLM with RLHF, powered by the [Colossal-AI](https://github.com/hpcaitech/ColossalAI) project.

Coati stands for `ColossalAI Talking Intelligence`. It is the name of the module implemented in this project and is also the name of the large language model developed by the ColossalChat project.

The Coati package provides a unified large language model framework that implements the following functions:

- Full use of Colossal-AI's comprehensive large-model training acceleration capabilities, without requiring knowledge of complex distributed training algorithms
- Supervised datasets collection
- Supervised instruction fine-tuning
- Training reward model
- Reinforcement learning with human feedback
- Quantization inference
- Fast model deploying
- Seamless integration with the Hugging Face ecosystem and a high degree of model customization

<div align="center">
  <p align="center">
    <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/chatgpt.png" width=700/>
  </p>

Image source: https://openai.com/blog/chatgpt

</div>

**As Colossal-AI is undergoing some major updates, this project will be actively maintained to stay in line with the Colossal-AI project.**

More details can be found in the latest news.

- [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
- [2023/02] [Open Source Solution Replicates ChatGPT Training Process! Ready to go with only 1.6GB GPU Memory](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt)

## Online demo

<div align="center">
   <a href="https://www.youtube.com/watch?v=HcTiHzApHm0">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20YouTube.png" width="700" />
   </a>
</div>

[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline.
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat)
[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0)
[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg)

<p id="ColossalChat-Speed" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20Speed.jpg" width=450/>
</p>

> DeepSpeedChat performance is taken from its blog post of April 12, 2023. ColossalChat performance can be reproduced on an AWS p4d.24xlarge node with 8 A100-40G GPUs with the following command: `torchrun --standalone --nproc_per_node 8 benchmark_opt_lora_dummy.py --num_collect_steps 1 --use_kernels --strategy colossalai_zero2 --experience_batch_size 64 --train_batch_size 32`

## Install

### Install the environment

```bash
conda create -n coati
conda activate coati
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI/applications/Chat
pip install .
```

### Install the Transformers

```bash
pip install transformers==4.30.2
```

## How to use?

### Supervised datasets collection

We collected a 104K bilingual dataset of Chinese and English. You can find it in the [InstructionWild](https://github.com/XueFuzhao/InstructionWild) repo and in this [file](https://github.com/XueFuzhao/InstructionWild/blob/main/data/README.md).

Here is how we collected the data:

<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/data-collect.png" width=500/>
</p>

### RLHF Training Stage1 - Supervised instructs tuning

Stage 1 is supervised instruction fine-tuning, which uses the datasets mentioned earlier to fine-tune the model.

You can run `examples/train_sft.sh` to start supervised instruction fine-tuning.
[[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg)

**Note**: the supervised dataset uses the following format:

```json
[
    {
        "instruction": "Provide a list of the top 10 most popular mobile games in Asia",
        "input": "",
        "output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved",
        "id": 0
    },
    ...
]
```
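
For a quick sanity check of your own data, a minimal sketch like the following can load the file and verify that every record carries the expected fields (the path `data.json` is just an illustrative placeholder):

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output", "id"}

# "data.json" is a placeholder path to a file in the format shown above.
with open("data.json", encoding="utf-8") as f:
    records = json.load(f)

for record in records:
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"record {record.get('id')} is missing fields: {missing}")

print(f"loaded {len(records)} supervised examples")
```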

### RLHF Training Stage2 - Training reward model

Stage 2 trains a reward model. Human annotators rank different outputs for the same prompt, and these rankings supervise the reward model so that it learns to assign higher scores to better responses.

You can run `examples/train_rm.sh` to start training a reward model.
[[Stage2 tutorial video]](https://www.youtube.com/watch?v=gMx2CApKhuo)
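
Conceptually, the reward model is trained with a pairwise ranking objective: for each prompt, the score of the human-preferred ("chosen") response should exceed that of the less-preferred ("rejected") one. The snippet below is only an illustrative sketch of that loss, not the project's exact implementation (the tensors stand in for the scalar scores a reward model would produce):

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(chosen_scores: torch.Tensor,
                          rejected_scores: torch.Tensor) -> torch.Tensor:
    """Standard reward-model loss: -log(sigmoid(r_chosen - r_rejected))."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Illustrative scores for a batch of four ranked response pairs.
chosen = torch.tensor([1.2, 0.7, 2.1, 0.3])
rejected = torch.tensor([0.4, 0.9, 1.0, -0.2])
print(pairwise_ranking_loss(chosen, rejected))  # smaller when chosen scores are higher
```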

### RLHF Training Stage3 - Training model with reinforcement learning by human feedback

Stage 3 uses a reinforcement learning algorithm (PPO), which is the most complex part of the training process:

<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/stage-3.jpeg" width=800/>
</p>

You can run `examples/train_prompts.sh` to start PPO training with human feedback.
[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g)
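
For intuition, the core of this stage is PPO's clipped surrogate objective, usually combined with a KL penalty that keeps the policy close to the SFT model. The snippet below is a simplified, illustrative sketch of that objective with made-up tensor values; the project's actual PPO trainer lives in the `coati` package:

```python
import torch

def ppo_actor_loss(log_probs: torch.Tensor,      # log-probs under the current policy
                   old_log_probs: torch.Tensor,  # log-probs at experience-collection time
                   advantages: torch.Tensor,
                   clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped PPO surrogate objective (returned as a loss to minimize)."""
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Illustrative values for a batch of three sampled responses.
loss = ppo_actor_loss(
    log_probs=torch.tensor([-1.0, -0.8, -1.5]),
    old_log_probs=torch.tensor([-1.1, -0.9, -1.4]),
    advantages=torch.tensor([0.5, -0.2, 1.0]),
)
print(loss)
```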

**Note**: the required datasets use the following formats:

- `pretrain dataset`

  ```json
  [
      {
          "instruction": "Provide a list of the top 10 most popular mobile games in Asia",
          "input": "",
          "output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved",
          "id": 0
      },
      ...
  ]
  ```

- `prompt dataset`

  ```json
  [
      {
          "instruction": "Edit this paragraph to make it more concise: \"Yesterday, I went to the store and bought some things. Then, I came home and put them away. After that, I went for a walk and met some friends.\"",
          "id": 0
      },
      {
          "instruction": "Write a descriptive paragraph about a memorable vacation you went on",
          "id": 1
      },
      ...
  ]
  ```

For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples).

### Inference Quantization and Serving - After Training

We provide an online inference server and a benchmark. We aim to run inference on a single GPU, so quantization is essential when using large models.

We support 8-bit quantization (RTN), 4-bit quantization (GPTQ), and FP16 inference.
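
As an illustration, with the transformers version pinned above and the `bitsandbytes` package installed, a trained checkpoint can be loaded in 8-bit for single-GPU inference roughly as follows. This is a hedged sketch, not the project's serving code; the model path is a placeholder, and the scripts in `inference/` remain the authoritative reference:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/path/to/Coati-7B"  # placeholder path to your trained checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
# load_in_8bit requires the bitsandbytes package and a CUDA-capable GPU.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    load_in_8bit=True,
    device_map="auto",
)

inputs = tokenizer("What is quantization?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```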

Online inference server scripts can help you deploy your own services.

For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference).

## Coati7B examples

### Generation

<details><summary><b>E-mail</b></summary>

![phd](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/Phd.png)

</details>

<details><summary><b>Coding</b></summary>

![sort](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/quick_sort.png)

</details>

<details><summary><b>Regex</b></summary>

![regex](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/regex.png)

</details>

<details><summary><b>TeX</b></summary>

![tex](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/tex.png)

</details>

<details><summary><b>Writing</b></summary>

![writing](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/writing.png)

</details>

<details><summary><b>Table</b></summary>

![Table](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/table.png)

</details>

### Open QA

<details><summary><b>Game</b></summary>

![Game](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/game.png)

</details>

<details><summary><b>Travel</b></summary>

![Travel](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/travel.png)

</details>

<details><summary><b>Physical</b></summary>

![Physical](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/physical.png)

</details>

<details><summary><b>Chemical</b></summary>

![Chemical](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/chemical.png)

</details>

<details><summary><b>Economy</b></summary>

![Economy](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/economy.png)

</details>

You can find more examples in this [repo](https://github.com/XueFuzhao/InstructionWild/blob/main/comparison.md).

### Limitation

<details><summary><b>Limitation for LLaMA-finetuned models</b></summary>
- Both Alpaca and ColossalChat are based on LLaMA. It is hard to compensate for the missing knowledge in the pre-training stage.
- Lack of counting ability: Cannot count the number of items in a list.
- Lack of logic (reasoning and calculation ability).
- Tend to repeat the last sentence (fail to produce the end token).
- Poor multilingual results: LLaMA is mainly trained on English datasets (Generation performs better than QA).
</details>

<details><summary><b>Limitation of dataset</b></summary>
- Lack of summarization ability: No such instructions in the fine-tuning datasets.
- Lack of multi-turn chat: No such instructions in the fine-tuning datasets.
- Lack of self-recognition: No such instructions in the fine-tuning datasets.
- Lack of Safety:
  - When the input contains fake facts, the model makes up false facts and explanations.
  - Cannot abide by OpenAI's policy: since the prompts were generated with the OpenAI API, they always abide by its policy, so no violation cases appear in the datasets.
</details>

## FAQ

<details><summary><b>How to save/load checkpoint</b></summary>

We have integrated the Transformers save and load pipeline, allowing users to freely call Hugging Face's language models and save them in the HF format.

```python
from coati.models.llama import LlamaLM
from coati.trainer import SFTTrainer
from transformers import AutoTokenizer

# `args`, `strategy`, `optim` and the dataloaders are prepared as in examples/train_sft.py
model = LlamaLM(pretrained=args.pretrain)
tokenizer = AutoTokenizer.from_pretrained(args.pretrain)

(model, optim) = strategy.prepare((model, optim))
trainer = SFTTrainer(model=model,
                     strategy=strategy,
                     optim=optim,
                     train_dataloader=train_dataloader,
                     eval_dataloader=eval_dataloader,
                     batch_size=args.batch_size,
                     max_epochs=args.max_epochs,
                     accumulation_steps=args.accumulation_steps
                     )

trainer.fit()
# this saves in pytorch format
strategy.save_model(model, args.save_path, only_rank0=True)

# this saves in HF format
strategy.save_pretrained(model, args.save_path, only_rank0=True, tokenizer=tokenizer)
```
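
Once saved in the HF format, the checkpoint should be loadable with the standard transformers API, for example (a brief sketch; `args.save_path` is the directory used above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the checkpoint written by strategy.save_pretrained(...) above.
model = AutoModelForCausalLM.from_pretrained(args.save_path)
tokenizer = AutoTokenizer.from_pretrained(args.save_path)
```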

</details>

<details><summary><b>How to train with limited resources</b></summary>

Here are some examples that can allow you to train a 7B model on a single or multiple consumer-grade GPUs.

If you only have a single 24 GB GPU, you can use the following script. `batch_size`, `lora_rank` and `grad_checkpoint` are the most important parameters for training the model successfully.

```bash
# [INFO]: MAX GPU MEMORY ALLOCATED:  19148.9345703125 MB
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy ddp \
    --save_path  /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 1 \
    --accumulation_steps 8 \
    --lr 2e-5 \
    --max_datasets_size 512 \
    --max_epochs 1 \
    --lora_rank 16 \
    --grad_checkpoint
```

The `colossalai_gemini` strategy enables a single 24 GB GPU to train the whole 7B model without LoRA, provided you have sufficient CPU memory. You can use the following script.

```bash
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_gemini \
    --save_path  /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 1 \
    --accumulation_steps 8 \
    --lr 2e-5 \
    --max_datasets_size 512 \
    --max_epochs 1 \
    --grad_checkpoint
```

If you have 4x32 GB GPUs, you can even train the whole 7B model using our `colossalai_zero2_cpu` strategy! The script is given as follows.

```bash
torchrun --standalone --nproc_per_node=4 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2_cpu \
    --save_path  /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 1 \
    --accumulation_steps 8 \
    --lr 2e-5 \
    --max_datasets_size 512 \
    --max_epochs 1 \
    --grad_checkpoint
```

</details>

## The Plan

- [x] implement PPO fine-tuning
- [x] implement training reward model
- [x] support LoRA
- [x] support inference
- [x] support llama from [facebook](https://github.com/facebookresearch/llama)
- [x] implement PPO-ptx fine-tuning
- [ ] integrate with Ray
- [ ] support more RL paradigms, like Implicit Language Q-Learning (ILQL)
- [ ] support chain-of-thought by [langchain](https://github.com/hwchase17/langchain)

### Real-time progress

You can find our progress on the GitHub [project board](https://github.com/orgs/hpcaitech/projects/17/views/1).

## Invitation to open-source contribution

Following the successful attempts of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), any and all developers and partners with computing power, datasets, or models are welcome to join and build the Colossal-AI community, working towards the era of big AI models starting from the replication of ChatGPT!

You may contact us or participate in the following ways:

1. [Leave a Star ⭐](https://github.com/hpcaitech/ColossalAI/stargazers) to show your support. Thanks!
2. Post an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose) or submit a PR on GitHub, following the guidelines in [Contributing](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md).
3. Join the Colossal-AI community on
   [Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w),
   and [WeChat(微信)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your ideas.
4. Send your official proposal by email to contact@hpcaitech.com

Thanks so much to all of our amazing contributors!

## Quick Preview

<div align="center">
   <a href="https://chat.colossalai.org/">
   <img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Chat-demo.png" width="700" />
   </a>
</div>

- An open-source, low-cost solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[demo]](https://chat.colossalai.org)

<p id="ChatGPT_scaling" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT%20scaling.png" width=800/>
</p>

- Up to 7.73 times faster for single server training and 1.42 times faster for single-GPU inference

<p id="ChatGPT-1GPU" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/ChatGPT-1GPU.jpg" width=450/>
</p>

- Up to 10.3x growth in model capacity on one GPU
- A mini demo training process requires only 1.62GB of GPU memory (any consumer-grade GPU)

<p id="inference" align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/LoRA%20data.jpg" width=600/>
</p>

- Increases the capacity of the fine-tuning model by up to 3.7 times on a single GPU
- Maintains a sufficiently high running speed

|  Model Pair   | Alpaca-7B ⚔ Coati-7B | Coati-7B ⚔ Alpaca-7B |
| :-----------: | :------------------: | :------------------: |
| Better Cases  |     38 ⚔ **41**      |     **45** ⚔ 33      |
|   Win Rate    |    48% ⚔ **52%**     |    **58%** ⚔ 42%     |
| Average Score |   7.06 ⚔ **7.13**    |   **7.31** ⚔ 6.82    |

- Our Coati-7B model performs better than Alpaca-7B when using GPT-4 to evaluate model performance. The Coati-7B model we evaluated is an older version trained a few weeks ago; a newer version is around the corner.

## Authors

Coati is developed by the ColossalAI Team:

- [Fazzie](https://fazzie-key.cool/about/index.html)
- [FrankLeeeee](https://github.com/FrankLeeeee)
- [BlueRum](https://github.com/ht-zhou)
- [ver217](https://github.com/ver217)
- [ofey404](https://github.com/ofey404)
- [Wenhao Chen](https://github.com/CWHer)

The PhD students from the [(HPC-AI) Lab](https://ai.comp.nus.edu.sg/) also contributed a lot to this project.

- [Zangwei Zheng](https://github.com/zhengzangw)
- [Xue Fuzhao](https://github.com/XueFuzhao)

## Citations

```bibtex
@article{Hu2021LoRALA,
    title   = {LoRA: Low-Rank Adaptation of Large Language Models},
    author  = {Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Weizhu Chen},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2106.09685}
}

@article{ouyang2022training,
  title={Training language models to follow instructions with human feedback},
  author={Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others},
  journal={arXiv preprint arXiv:2203.02155},
  year={2022}
}

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

@misc{instructionwild,
  author = {Fuzhao Xue and Zangwei Zheng and Yang You },
  title = {Instruction in the Wild: A User-based Instruction Dataset},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/XueFuzhao/InstructionWild}},
}
```

## Licenses

Coati is licensed under the [Apache 2.0 License](LICENSE).