Unverified commit 6d41c3f2 authored by Wenhao Chen, committed by GitHub

[doc] update Coati README (#4405)

* style: apply formatter

* fix: add outdated warnings

* docs: add dataset format and polish

* docs: polish README

* fix: fix json format

* fix: fix typos

* revert: revert 7b example
parent d86ddd9b
@@ -4,7 +4,6 @@
  <span>ColossalChat</span>
</h1>

## Table of Contents

- [Table of Contents](#table-of-contents)

@@ -34,7 +33,9 @@
- [Authors](#authors)
- [Citations](#citations)
- [Licenses](#licenses)

---

## What is ColossalChat and Coati?
[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) is a project to implement an LLM with RLHF, powered by the [Colossal-AI](https://github.com/hpcaitech/ColossalAI) project.
@@ -42,6 +43,7 @@
Coati stands for `ColossalAI Talking Intelligence`. It is the name of the module implemented in this project and is also the name of the large language model developed by the ColossalChat project.

The Coati package provides a unified large language model framework that has implemented the following functions:

- Supports comprehensive large-model training acceleration capabilities for ColossalAI, without requiring knowledge of complex distributed training algorithms
- Supervised datasets collection
- Supervised instructions fine-tuning
@@ -56,17 +58,19 @@ The Coati package provides a unified large language model framework that has imp
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/chatgpt.png" width=700/>
</p>

Image source: https://openai.com/blog/chatgpt

</div>

**As Colossal-AI is undergoing some major updates, this project will be actively maintained to stay in line with the Colossal-AI project.**

More details can be found in the latest news.
- [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
- [2023/02] [Open Source Solution Replicates ChatGPT Training Process! Ready to go with only 1.6GB GPU Memory](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt)
## Online demo

<div align="center">
<a href="https://www.youtube.com/watch?v=HcTiHzApHm0">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20YouTube.png" width="700" />
@@ -83,13 +87,13 @@ More details can be found in the latest news.
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ColossalChat%20Speed.jpg" width=450/>
</p>
> DeepSpeedChat performance comes from its blog published on April 12, 2023. ColossalChat performance can be reproduced on an AWS p4d.24xlarge node with 8 A100-40G GPUs with the following command: `torchrun --standalone --nproc_per_node 8 benchmark_opt_lora_dummy.py --num_collect_steps 1 --use_kernels --strategy colossalai_zero2 --experience_batch_size 64 --train_batch_size 32`
## Install

### Install the environment

```bash
conda create -n coati
conda activate coati
git clone https://github.com/hpcaitech/ColossalAI.git
@@ -99,7 +103,7 @@ pip install .

### Install the Transformers

```bash
pip install transformers==4.30.2
```
@@ -107,10 +111,11 @@ pip install transformers==4.30.2

### Supervised datasets collection

We collected 104K bilingual datasets of Chinese and English, and you can find the datasets in this repo [InstructionWild](https://github.com/XueFuzhao/InstructionWild) and in this [file](https://github.com/XueFuzhao/InstructionWild/blob/main/data/README.md).

Here is how we collected the data:

<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/data-collect.png" width=500/>
</p>
@@ -122,6 +127,20 @@ Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned ea

You can run `examples/train_sft.sh` to start supervised instruction fine-tuning.

[[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg)
**Note**: the supervised dataset uses the following format:
```json
[
    {
        "instruction": "Provide a list of the top 10 most popular mobile games in Asia",
        "input": "",
        "output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved",
        "id": 0
    },
    ...
]
```
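Before launching `examples/train_sft.sh`, it can be worth sanity-checking the dataset file. A minimal validator sketch (the field names follow the sample above; the file path is a placeholder):

```python
import json

REQUIRED_FIELDS = ("instruction", "input", "output", "id")

def validate_sft_dataset(path):
    """Return the number of records, raising if any record is malformed."""
    with open(path) as f:
        records = json.load(f)
    for i, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        assert not missing, f"record {i} is missing fields: {missing}"
    return len(records)
```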
### RLHF Training Stage2 - Training reward model

Stage2 trains a reward model. Human annotators rank different outputs generated from the same prompt, and the resulting scores supervise the training of the reward model.
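This ranking supervision can be sketched as a pairwise loss: the reward model should score the human-preferred answer higher than the rejected one. A generic sketch of the standard pairwise formulation, not Coati's exact implementation:

```python
import math

def pairwise_ranking_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)); small when the preferred
    answer already receives the higher reward."""
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# Scoring the chosen answer higher yields a lower loss than the reverse.
```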
@@ -140,13 +159,46 @@ Stage3 uses reinforcement learning algorithm, which is the most complex part of

You can run `examples/train_prompts.sh` to start training PPO with human feedback.

[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g)
**Note**: the required datasets use the following formats:
- `pretrain dataset`

```json
[
    {
        "instruction": "Provide a list of the top 10 most popular mobile games in Asia",
        "input": "",
        "output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved",
        "id": 0
    },
    ...
]
```
- `prompt dataset`

```json
[
    {
        "instruction": "Edit this paragraph to make it more concise: \"Yesterday, I went to the store and bought some things. Then, I came home and put them away. After that, I went for a walk and met some friends.\"",
        "id": 0
    },
    {
        "instruction": "Write a descriptive paragraph about a memorable vacation you went on",
        "id": 1
    },
    ...
]
```
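The prompt dataset is essentially the `instruction` field of the supervised format without the outputs. If you already have SFT-style records, a throwaway conversion sketch (field names as in the samples above):

```python
def to_prompt_records(sft_records):
    """Keep only 'instruction' and 'id', as in the prompt dataset sample."""
    return [
        {"instruction": record["instruction"], "id": record["id"]}
        for record in sft_records
    ]

sft = [{"instruction": "Write a haiku", "input": "", "output": "...", "id": 0}]
prompts = to_prompt_records(sft)
print(prompts)  # [{'instruction': 'Write a haiku', 'id': 0}]
```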
For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples).
### Inference Quantization and Serving - After Training

We provide an online inference server and a benchmark. We aim to run inference on a single GPU, so quantization is essential when using large models.

We support 8-bit quantization (RTN), 4-bit quantization (GPTQ), and FP16 inference.

Online inference server scripts can help you deploy your own services.

For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference).
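As an illustration of what RTN (round-to-nearest) quantization does, here is a pure-Python sketch of symmetric int8 quantization. This is an illustration of the general technique only, not Coati's actual quantization kernel:

```python
def quantize_rtn_int8(values):
    """Symmetric round-to-nearest int8 quantization of a list of floats."""
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.9]
quantized, scale = quantize_rtn_int8(weights)
restored = dequantize(quantized, scale)
# Each restored weight stays within one quantization step of the original,
# which is why 8-bit storage loses little accuracy for well-scaled weights.
```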
@@ -158,6 +210,7 @@ For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tre

<details><summary><b>E-mail</b></summary>

![phd](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/Phd.png)

</details>

<details><summary><b>coding</b></summary>
@@ -191,6 +244,7 @@ For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tre

</details>

### Open QA

<details><summary><b>Game</b></summary>

![Game](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/game.png)
@@ -224,6 +278,7 @@ For more details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tre

You can find more examples in this [repo](https://github.com/XueFuzhao/InstructionWild/blob/main/comparison.md).

### Limitation

<details><summary><b>Limitation for LLaMA-finetuned models</b></summary>

- Both Alpaca and ColossalChat are based on LLaMA. It is hard to compensate for the missing knowledge in the pre-training stage.
- Lack of counting ability: Cannot count the number of items in a list.
@@ -247,7 +302,7 @@ You can find more examples in this [repo](https://github.com/XueFuzhao/Instructi

We have integrated the Transformers save and load pipeline, allowing users to freely call Hugging Face's language models and save them in the HF format.
```python
from coati.models.llama import LlamaLM
from coati.trainer import SFTTrainer

@@ -256,20 +311,20 @@ tokenizer = AutoTokenizer.from_pretrained(args.pretrain)

(model, optim) = strategy.prepare((model, optim))
trainer = SFTTrainer(model=model,
                     strategy=strategy,
                     optim=optim,
                     train_dataloader=train_dataloader,
                     eval_dataloader=eval_dataloader,
                     batch_size=args.batch_size,
                     max_epochs=args.max_epochs,
                     accumulation_steps=args.accumulation_steps
                     )
trainer.fit()
# this saves in pytorch format
strategy.save_model(model, args.save_path, only_rank0=True)
# this saves in HF format
strategy.save_pretrained(model, args.save_path, only_rank0=True, tokenizer=tokenizer)
```
@@ -280,12 +335,13 @@ strategy.save_pretrained(model, args.save_path, only_rank0=True, tokenizer=token

Here are some examples that can allow you to train a 7B model on a single or multiple consumer-grade GPUs.

If you only have a single 24G GPU, you can use the following script. `batch_size`, `lora_rank` and `grad_checkpoint` are the most important parameters to successfully train the model.
```bash
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy ddp \
    --log_interval 10 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 1 \
@@ -298,12 +354,12 @@ torchrun --standalone --nproc_per_node=1 train_sft.py \
```
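For context on why these flags matter, some back-of-envelope memory arithmetic (assumptions: fp16 weights, fp32 Adam with a master copy and two moment buffers; activations and gradients excluded):

```python
params = 7e9  # roughly the LLaMA-7B parameter count

fp16_weights_gb = params * 2 / 1024**3           # 2 bytes per parameter
adam_states_gb = params * (4 + 4 + 4) / 1024**3  # fp32 master copy + 2 moments

print(f"weights: {fp16_weights_gb:.1f} GB, optimizer states: {adam_states_gb:.1f} GB")
```

The weights alone roughly fill half of a 24 GB card, and full-precision optimizer states are far larger still, which is why LoRA (shrinking trainable state) or CPU-offloading strategies are needed on consumer GPUs.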
The `colossalai_gemini` strategy can enable a single 24G GPU to train the whole model without using LoRA if you have sufficient CPU memory. You can use the following script.
```bash
torchrun --standalone --nproc_per_node=1 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_gemini \
    --log_interval 10 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 1 \
@@ -315,12 +371,12 @@ torchrun --standalone --nproc_per_node=1 train_sft.py \
```
If you have 4x32 GB GPUs, you can even train the whole 7B model using our `colossalai_zero2_cpu` strategy! The script is given as follows.
```bash
torchrun --standalone --nproc_per_node=4 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2_cpu \
    --log_interval 10 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 1 \
@@ -330,8 +386,8 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \
    --max_epochs 1 \
    --grad_checkpoint
```
</details>
## The Plan

@@ -346,24 +402,26 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \
- [ ] support chain-of-thought by [langchain](https://github.com/hwchase17/langchain)

### Real-time progress
You can follow our progress on the GitHub [project board](https://github.com/orgs/hpcaitech/projects/17/views/1).
## Invitation to open-source contribution

Referring to the successful attempts of [BLOOM](https://bigscience.huggingface.co/) and [Stable Diffusion](https://en.wikipedia.org/wiki/Stable_Diffusion), any and all developers and partners with computing power, datasets, or models are welcome to join and build the Colossal-AI community, making efforts towards the era of big AI models from the starting point of replicating ChatGPT!

You may contact us or participate in the following ways:

1. [Leaving a Star ⭐](https://github.com/hpcaitech/ColossalAI/stargazers) to show your support. Thanks!
2. Posting an [issue](https://github.com/hpcaitech/ColossalAI/issues/new/choose), or submitting a PR on GitHub following the guidelines in [Contributing](https://github.com/hpcaitech/ColossalAI/blob/main/CONTRIBUTING.md).
3. Joining the Colossal-AI community on [Slack](https://join.slack.com/t/colossalaiworkspace/shared_invite/zt-z7b26eeb-CBp7jouvu~r0~lcFzX832w) and [WeChat(微信)](https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/WeChat.png "qrcode") to share your ideas.
4. Sending your official proposal to contact@hpcaitech.com.

Thanks so much to all of our amazing contributors!
## Quick Preview

<div align="center">
<a href="https://chat.colossalai.org/">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/Chat-demo.png" width="700" />
@@ -397,18 +455,22 @@ Thanks so much to all of our amazing contributors!

| Better Cases | 38 ⚔ **41** | **45** ⚔ 33 |
| Win Rate | 48% ⚔ **52%** | **58%** ⚔ 42% |
| Average Score | 7.06 ⚔ **7.13** | **7.31** ⚔ 6.82 |

- Our Coati-7B model performs better than Alpaca-7B when using GPT-4 to evaluate model performance. The Coati-7B model we evaluated is an old version we trained a few weeks ago, and the new version is around the corner.
## Authors

Coati is developed by the ColossalAI Team:

- [Fazzie](https://fazzie-key.cool/about/index.html)
- [FrankLeeeee](https://github.com/FrankLeeeee)
- [BlueRum](https://github.com/ht-zhou)
- [ver217](https://github.com/ver217)
- [ofey404](https://github.com/ofey404)
- [Wenhao Chen](https://github.com/CWHer)

PhD students from the [HPC-AI Lab](https://ai.comp.nus.edu.sg/) also contributed a lot to this project:

- [Zangwei Zheng](https://github.com/zhengzangw)
- [Xue Fuzhao](https://github.com/XueFuzhao)
@@ -27,9 +27,12 @@ We also provide various training strategies:

We currently only support launching with `torchrun`. E.g.
```bash
# run OPT-125M with no lora (lora_rank=0) on single-node single-GPU with min batch size
torchrun --standalone --nproc_per_node 1 benchmark_opt_lora_dummy.py \
    --model 125m --critic_model 125m --strategy ddp \
    --experience_batch_size 1 --train_batch_size 1 --lora_rank 0
# run Actor (OPT-1.3B) and Critic (OPT-350M) with lora_rank=4 on single-node 4-GPU
torchrun --standalone --nproc_per_node 4 benchmark_opt_lora_dummy.py \
    --model 1.3b --critic_model 350m --strategy colossalai_zero2 --lora_rank 4
```
:warning: **This content may be outdated since the major update of Colossal Chat. We will update this content soon.**
# Distributed PPO Training on Stage 3

## Detach Experience Makers and Trainers
@@ -26,124 +28,137 @@ See examples at `ColossalAI/application/Chat/examples/ray`

- define makers' environment variables:
```python
env_info_makers = [{
    'local_rank': '0',
    'rank': str(rank),
    'world_size': str(num_makers),
    'master_port': maker_port,
    'master_addr': master_addr
} for rank in range(num_makers)]
```
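For instance, with `num_makers = 2` the comprehension above expands to two configs that differ only in `rank` (a standalone illustration with placeholder address values):

```python
num_makers = 2
maker_port = "29500"       # placeholder
master_addr = "127.0.0.1"  # placeholder

env_info_makers = [{
    'local_rank': '0',
    'rank': str(rank),
    'world_size': str(num_makers),
    'master_port': maker_port,
    'master_addr': master_addr
} for rank in range(num_makers)]

print([env['rank'] for env in env_info_makers])  # ['0', '1']
```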
- define maker models:

```python
def model_fn():
    actor = get_actor_from_args(...)
    critic = get_critic_from_args(...)
    reward_model = get_reward_model_from_args(...)
    initial_model = get_actor_from_args(...)
    return actor, critic, reward_model, initial_model
```
- set experience_holder_refs:

```python
experience_holder_refs = [
    ExperienceMakerHolder.options(
        name=f"maker_{i}",
        num_gpus=1,
        max_concurrency=2
    ).remote(
        detached_trainer_name_list=[f"trainer_{x}" for x in target_trainers(...)],
        model_fn=model_fn,
        ...)
    for i, env_info_maker in enumerate(env_info_makers)
]
```
The names in the `detached_trainer_name_list` refer to the target trainers that the maker should send experience to.
We set a trainer's name the same as a maker, by `.options(name="str")`. See below.
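This name-based routing can be pictured with a plain-Python registry (an illustration only; in the real setup Ray resolves these names to detached actors via `.options(name=...)`):

```python
registry = {}

class Trainer:
    """Stand-in for a DetachedPPOTrainer registered under a name."""
    def __init__(self, name):
        self.received = []
        registry[name] = self

class Maker:
    """Stand-in for an ExperienceMakerHolder with target trainer names."""
    def __init__(self, detached_trainer_name_list):
        self.targets = detached_trainer_name_list

    def send_experience(self, experience):
        # Look up each target trainer by name and push experience to it.
        for name in self.targets:
            registry[name].received.append(experience)

trainers = [Trainer(f"trainer_{i}") for i in range(2)]
maker = Maker(["trainer_0", "trainer_1"])
maker.send_experience({"states": [], "rewards": []})
print([len(t.received) for t in trainers])  # [1, 1]
```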
### Setup Trainers

- define trainers' environment variables:

```python
env_info_trainers = [{
    'local_rank': '0',
    'rank': str(rank),
    'world_size': str(num_trainers),
    'master_port': trainer_port,
    'master_addr': master_addr
} for rank in range(num_trainers)]
```
- define trainer models:

```python
def trainer_model_fn():
    actor = get_actor_from_args(...)
    critic = get_critic_from_args(...)
    return actor, critic
```
- set trainer_refs:

```python
trainer_refs = [
    DetachedPPOTrainer.options(
        name=f"trainer_{i}",
        num_gpus=1,
        max_concurrency=2
    ).remote(
        experience_maker_holder_name_list=[f"maker_{x}" for x in target_makers(...)],
        model_fn=trainer_model_fn,  # pass the function itself, as with the makers
        ...)
    for i, env_info_trainer in enumerate(env_info_trainers)
]
```
The names in `experience_maker_holder_name_list` refer to the target makers that the trainer should send updated models to.
By setting `detached_trainer_name_list` and `experience_maker_holder_name_list`, we can customize the transmission graph.

### Launch Jobs

- define data_loader:
```python
def data_loader_fn():
    return torch.utils.data.DataLoader(dataset=dataset)
```
- launch makers:

```python
wait_tasks = []
for experience_holder_ref in experience_holder_refs:
    wait_tasks.append(
        experience_holder_ref.workingloop.remote(data_loader_fn(),
                                                 num_steps=experience_steps))
```
- launch trainers:

```python
for trainer_ref in trainer_refs:
    wait_tasks.append(trainer_ref.fit.remote(total_steps, update_steps, train_epochs))
```
- wait for done:

```python
ray.get(wait_tasks)
```
## Flexible Structure

We can deploy different strategies to makers and trainers. Here are some examples.

### 2 Makers 1 Trainer

<p align="center">
<img src="https://github.com/hpcaitech/public_assets/blob/main/applications/chat/2m1t.png?raw=true" width=600/>
</p>

### 2 Makers 2 Trainers

<p align="center">
<img src="https://github.com/hpcaitech/public_assets/blob/main/applications/chat/2m2t.png?raw=true" width=600/>
</p>

### Maker Inference Quantization

<p align="center">
<img src="https://github.com/hpcaitech/public_assets/blob/main/applications/chat/2m2t_quantize.png?raw=true" width=600/>
</p>
...@@ -17,7 +17,7 @@
- [Arg List](#arg-list-2)
- [Inference example - After Stage3](#inference-example---after-stage3)
- [Attention](#attention)
  - [data](#data)
- [Support Model](#support-model)
  - [GPT](#gpt)
  - [BLOOM](#bloom)
...@@ -28,8 +28,8 @@
  - [Reward model](#reward-model)
  - [Critic model](#critic-model)

---
## Install requirements

```shell
pip install -r requirements.txt
```
## Supervised datasets collection

We collected a bilingual dataset of 104K Chinese and English samples; you can find it in the [InstructionWild](https://github.com/XueFuzhao/InstructionWild) repo and in this [file](https://github.com/XueFuzhao/InstructionWild/blob/main/data/README.md).

The following picture shows how we collected the data.

<p align="center">
<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/data-collect.png" width=500/>
</p>
...@@ -52,38 +52,40 @@ In order to further improve the model's ability to handle multi-turn conversatio
A sample of the conversation dataset should have the following fields:
- `type` (str, optional): The type of the data sample.
- `language` (str, optional): The language of the data sample.
- `dataset` (str, optional): The dataset the data sample originates from.
- `conversations` (list, compulsory): Conversation content of the data sample.
- `id` (int, optional): The ID of the data sample.
A simple example:
```json
{
    "type": "instruction",
    "language": "English",
    "dataset": "Alpaca",
    "conversations": [
        {
            "from": "human",
            "value": "Give three tips for staying healthy."
        },
        {
            "from": "gpt",
            "value": "1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule."
        }
    ],
    "id": 1
}
```
> **NOTE:** Only the key `conversations` is compulsory for training; the other keys serve as metadata. The length of `conversations` varies.
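A minimal sketch of how such a record could be checked before training (the `validate` helper below is hypothetical, not part of Coati):

```python
import json

record = json.loads("""
{
    "type": "instruction",
    "conversations": [
        {"from": "human", "value": "Give three tips for staying healthy."},
        {"from": "gpt", "value": "1. Eat well. 2. Exercise. 3. Sleep enough."}
    ],
    "id": 1
}
""")

def validate(sample: dict) -> bool:
    # only "conversations" is compulsory; everything else is metadata
    conversations = sample.get("conversations")
    if not isinstance(conversations, list) or not conversations:
        return False
    # each turn needs a speaker and its text
    return all(turn.get("from") in ("human", "gpt") and "value" in turn
               for turn in conversations)

print(validate(record))            # True
print(validate({"type": "sft"}))   # False
```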
You can run `examples/generate_conversation_dataset.py` to generate a conversation dataset supported by ColossalChat, using the following command:
```bash
python generate_conversation_dataset.py \
    --dataset "All" \
    --save_path "/path/to/dataset"
```

...@@ -97,12 +100,12 @@ Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned ea
You can run `examples/train_sft.sh` to start supervised instruction fine-tuning.

You can also use the following command to start it with your own settings.
```bash
torchrun --standalone --nproc_per_node=4 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 4 \
    --max_epochs 1 \
    --grad_checkpoint
```
**Note**: the supervised dataset has the following format:
```json
[
{
"instruction": "Provide a list of the top 10 most popular mobile games in Asia",
"input": "",
"output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved",
"id": 0
},
...
]
```
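Records of this shape are conventionally rendered into a single training prompt before tokenization. The template below is the Alpaca-style convention and is an assumption for illustration; the exact template Coati uses may differ:

```python
def render_prompt(sample: dict) -> str:
    # Alpaca-style template (illustrative; Coati's actual template may differ)
    if sample.get("input"):
        return (f"Below is an instruction that describes a task, paired with an input.\n\n"
                f"### Instruction:\n{sample['instruction']}\n\n"
                f"### Input:\n{sample['input']}\n\n"
                f"### Response:\n{sample['output']}")
    return (f"Below is an instruction that describes a task.\n\n"
            f"### Instruction:\n{sample['instruction']}\n\n"
            f"### Response:\n{sample['output']}")

sample = {"instruction": "Name a primary color.", "input": "", "output": "Red."}
print(render_prompt(sample))
```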
### Arg List

- `--strategy`: the strategy to use for training, choices=['ddp', 'colossalai_gemini', 'colossalai_zero2'], default='colossalai_zero2'
- `--model`: model type, choices=['gpt2', 'bloom', 'opt', 'llama'], default='bloom'
- `--pretrain`: pretrained model path, type=str, default=None
- `--max_datasets_size`: the max size of the dataset, type=int, default=None
- `--save_path`: path to save the model, type=str, default='output'
- `--need_optim_ckpt`: whether to save the optimizer checkpoint, type=bool, default=False
- `--max_epochs`: max epochs for training, type=int, default=3
- `--batch_size`: batch size while training, type=int, default=4
- `--lora_rank`: low-rank adaptation matrices rank, type=int, default=0
- `--grad_checkpoint`: enable gradient checkpointing, type=bool, default=False
## Stage2 - Training reward model
...@@ -133,7 +151,8 @@ We train a reward model in stage 2, which obtains corresponding scores by manual
You can run `examples/train_rm.sh` to start reward model training.

You can also use the following command to start training a reward model.
```bash
torchrun --standalone --nproc_per_node=4 train_reward_model.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --loss_fn 'log_exp' \
    --save_path 'rmstatic.pt'
```
### Features and tricks in RM training

- We support [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) and [rm-static](https://huggingface.co/datasets/Dahoas/rm-static) datasets.
- We support 2 kinds of loss functions, named `log_sig` (used by OpenAI) and `log_exp` (used by Anthropic).
- We log `valid_acc` and `pair_dist` instead of the loss to monitor progress during training.
- We add a special token to the end of the sequence to get better results.
- We use a cosine-decay lr scheduler for RM training.
- We set value_head as a single linear layer and initialize its weight from the N(0, 1/(d_model + 1)) distribution.
- We train a BLOOM-560m reward model for 1 epoch and find the test accuracy of the model achieves the performance mentioned in [Anthropic's paper](https://arxiv.org/abs/2204.05862).
### Experiment result

Model performance in [Anthropic's paper](https://arxiv.org/abs/2204.05862):
<div align=middle> <img width="512" alt="image" src="https://user-images.githubusercontent.com/70618399/225263321-8d64c3a8-6877-4cc8-9b61-0e1c52d3d94f.png">
...@@ -162,20 +184,20 @@ Model performance in [Anthropic's paper](https://arxiv.org/abs/2204.05862):
<div align=left>We also train the reward model based on LLaMA-7B, which reaches an accuracy of 72.06% after 1 epoch, performing almost the same as Anthropic's best RM.
### Arg List

- `--strategy`: the strategy to use for training, choices=['ddp', 'colossalai_gemini', 'colossalai_zero2'], default='colossalai_zero2'
- `--model`: model type, choices=['gpt2', 'bloom', 'opt', 'llama'], default='bloom'
- `--pretrain`: pretrained model path, type=str, default=None
- `--model_path`: the path of the rm model (if continuing training), type=str, default=None
- `--save_path`: path to save the model, type=str, default='output'
- `--need_optim_ckpt`: whether to save the optimizer checkpoint, type=bool, default=False
- `--max_epochs`: max epochs for training, type=int, default=3
- `--dataset`: dataset name, type=str, choices=['Anthropic/hh-rlhf', 'Dahoas/rm-static']
- `--subset`: subset of the dataset, type=str, default=None
- `--batch_size`: batch size while training, type=int, default=4
- `--lora_rank`: low-rank adaptation matrices rank, type=int, default=0
- `--loss_func`: which kind of loss function, choices=['log_sig', 'log_exp']
- `--max_len`: max sentence length for generation, type=int, default=512
- `--test`: whether it is only testing; if true, the dataset will be small
## Stage3 - Training model using prompts with RL
...@@ -186,53 +208,89 @@ Stage3 uses reinforcement learning algorithm, which is the most complex part of
</p>
You can run `examples/train_prompts.sh` to start PPO training.

You can also use the following command to start PPO training.

[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g)
```bash
torchrun --standalone --nproc_per_node=4 train_prompts.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --prompt_dataset /path/to/your/prompt_dataset \
    --pretrain_dataset /path/to/your/pretrain_dataset \
    --rm_pretrain /your/pretrain/rm/definition \
    --rm_path /your/rm/model/path
```
Prompt dataset: the instruction dataset mentioned in the above figure, which includes the instructions; e.g. you can use the [script](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples/generate_prompt_dataset.py), which samples `instinwild_en.json` or `instinwild_ch.json` in [InstructionWild](https://github.com/XueFuzhao/InstructionWild/tree/main/data#instructwild-data), to generate the prompt dataset.

Pretrain dataset: the pretrain dataset including the instruction and corresponding response; e.g. you can use the [InstructWild Data](https://github.com/XueFuzhao/InstructionWild/tree/main/data) from stage 1 supervised instruction tuning.
**Note**: the required datasets have the following format:
- `pretrain dataset`
```json
[
{
"instruction": "Provide a list of the top 10 most popular mobile games in Asia",
"input": "",
"output": "The top 10 most popular mobile games in Asia are:\n1) PUBG Mobile\n2) Pokemon Go\n3) Candy Crush Saga\n4) Free Fire\n5) Clash of Clans\n6) Mario Kart Tour\n7) Arena of Valor\n8) Fantasy Westward Journey\n9) Subway Surfers\n10) ARK Survival Evolved",
"id": 0
},
...
]
```
- `prompt dataset`
```json
[
{
"instruction": "Edit this paragraph to make it more concise: \"Yesterday, I went to the store and bought some things. Then, I came home and put them away. After that, I went for a walk and met some friends.\"",
"id": 0
},
{
"instruction": "Write a descriptive paragraph about a memorable vacation you went on",
"id": 1
},
...
]
```
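Note that the two formats differ only in that prompt records drop the response. A hedged sketch of the conversion (the maintained path is `examples/generate_prompt_dataset.py`, whose exact behavior may differ):

```python
import json

pretrain_records = [
    {"instruction": "Provide a list of the top 10 most popular mobile games in Asia",
     "input": "", "output": "The top 10 most popular mobile games in Asia are: ...", "id": 0},
]

# keep only the instruction and id, dropping input/output, to get prompt records
prompt_records = [{"instruction": r["instruction"], "id": r["id"]}
                  for r in pretrain_records]

print(json.dumps(prompt_records, indent=4))
```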
### Arg List

- `--strategy`: the strategy to use for training, choices=['ddp', 'colossalai_gemini', 'colossalai_zero2'], default='colossalai_zero2'
- `--model`: model type of actor, choices=['gpt2', 'bloom', 'opt', 'llama'], default='bloom'
- `--pretrain`: pretrained model path, type=str, default=None
- `--rm_model`: reward model type, type=str, choices=['gpt2', 'bloom', 'opt', 'llama'], default=None
- `--rm_pretrain`: pretrained model for the reward model, type=str, default=None
- `--rm_path`: the path of the rm model, type=str, default=None
- `--save_path`: path to save the model, type=str, default='output'
- `--prompt_dataset`: path of the prompt dataset, type=str, default=None
- `--pretrain_dataset`: path of the ptx dataset, type=str, default=None
- `--need_optim_ckpt`: whether to save the optimizer checkpoint, type=bool, default=False
- `--num_episodes`: num of episodes for training, type=int, default=10
- `--num_update_steps`: number of steps to update policy per episode, type=int
- `--num_collect_steps`: number of steps to collect experience per episode, type=int
- `--train_batch_size`: batch size while training, type=int, default=8
- `--ptx_batch_size`: batch size to compute ptx loss, type=int, default=1
- `--experience_batch_size`: batch size to make experience, type=int, default=8
- `--lora_rank`: low-rank adaptation matrices rank, type=int, default=0
- `--kl_coef`: kl_coef used for computing reward, type=float, default=0.1
- `--ptx_coef`: ptx_coef used for computing policy loss, type=float, default=0.9
## Inference example - After Stage3

We support different inference options, including int8 and int4 quantization.
For details, see [`inference/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/inference).
## Attention

The examples are demos for the whole training process. You need to tune the hyper-parameters to reach good performance.

#### data
- [x] [rm-static](https://huggingface.co/datasets/Dahoas/rm-static)
- [x] [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [ ] [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
...@@ -242,14 +300,16 @@
## Support Model

### GPT

- [x] GPT2-S (s)
- [x] GPT2-M (m)
- [x] GPT2-L (l)
- [x] GPT2-XL (xl)
- [x] GPT2-4B (4b)
- [ ] GPT2-6B (6b)
### BLOOM

- [x] [BLOOM-560m](https://huggingface.co/bigscience/bloom-560m)
- [x] [BLOOM-1b1](https://huggingface.co/bigscience/bloom-1b1)
- [x] [BLOOM-3b](https://huggingface.co/bigscience/bloom-3b)
...@@ -257,6 +317,7 @@
- [ ] [BLOOM-175b](https://huggingface.co/bigscience/bloom)

### OPT

- [x] [OPT-125M](https://huggingface.co/facebook/opt-125m)
- [x] [OPT-350M](https://huggingface.co/facebook/opt-350m)
- [x] [OPT-1.3B](https://huggingface.co/facebook/opt-1.3b)
...@@ -266,10 +327,11 @@
- [ ] [OPT-30B](https://huggingface.co/facebook/opt-30b)

### [LLaMA](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)

- [x] LLaMA-7B
- [x] LLaMA-13B
- [ ] LLaMA-33B
- [ ] LLaMA-65B
## Add your own models
...@@ -282,12 +344,12 @@ if it is supported in huggingface [transformers](https://github.com/huggingface/
or you can build your own model by yourself.
### Actor model

```python
from typing import Optional

from ..base import Actor
from transformers.models.coati import CoatiModel

class CoatiActor(Actor):
    def __init__(self,
                 pretrained: Optional[str] = None,
                 checkpoint: bool = False) -> None:
        # the remaining arguments and body are elided in this diff view;
        # typically the backbone is loaded here and passed to the Actor base class
        model = CoatiModel.from_pretrained(pretrained)
        super().__init__(model)
```
### Reward model

```python
from ..base import RewardModel
from transformers.models.coati import CoatiModel

class CoatiRM(RewardModel):
    # body elided in this diff view; mirror the Actor pattern, loading the
    # backbone and passing it to the RewardModel base class
    ...
```
### Critic model

```python
from typing import Optional

from ..base import Critic
from transformers.models.coati import CoatiModel

class CoatiCritic(Critic):
    def __init__(self,
                 pretrained: Optional[str] = None,
                 checkpoint: bool = False) -> None:
        # the remaining body is elided in this diff view; typically the backbone
        # is loaded here and passed to the Critic base class
        model = CoatiModel.from_pretrained(pretrained)
        super().__init__(model)
```
:warning: **This content may be outdated since the major update of Colossal Chat. We will update this content soon.**
# Community Examples

---
We are thrilled to announce the latest updates to ColossalChat, an open-source solution for cloning ChatGPT with a complete RLHF (Reinforcement Learning with Human Feedback) pipeline.

As Colossal-AI undergoes major updates, we are actively maintaining ColossalChat to stay aligned with the project's progress. With the introduction of community-driven examples, we aim to create a collaborative platform for developers to contribute exotic features built on top of ColossalChat.
...@@ -14,11 +18,12 @@ For more information about community pipelines, please have a look at this [issu
Community examples consist of both inference and training examples that have been added by the community. Please have a look at the following table to get an overview of all community examples. Click on the Code Example to get a copy-and-paste ready code example that you can try out. If a community example doesn't work as expected, please open an issue and ping the author on it.
| Example | Description | Code Example | Colab | Author |
| :------------------- | :----------------------------------------------------- | :--------------------------------------------------------------------------------------------------------------- | :---- | -------------------------------------------------: |
| Peft | Adding Peft support for SFT and Prompts model training | [Huggingface Peft](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples/community/peft) | - | [YY Lin](https://github.com/yynil) |
| Train prompts on Ray | A Ray based implementation of Train prompts example | [Training On Ray](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples/community/ray) | - | [MisterLin1995](https://github.com/MisterLin1995) |
| ... | ... | ... | ... | ... |
### How to get involved

To join our community-driven initiative, please visit the [ColossalChat GitHub repository](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples), review the provided information, and explore the codebase. To contribute, create a new issue outlining your proposed feature or enhancement, and our team will review and provide feedback. We look forward to collaborating with you on this exciting project!
:warning: **This content may be outdated since the major update of Colossal Chat. We will update this content soon.**
# Add Peft support for SFT and Prompts model training
The original implementation just adopts loralib and merges the layers into the final model. The Hugging Face peft package is a better LoRA implementation and can easily be trained and distributed.
...@@ -5,7 +7,9 @@ The original implementation just adopts the loralib and merges the layers into t
Since the reward model is relatively small, I just keep it as the original one. I suggest training the full model to get a proper reward/critic model.
# Preliminary installation

Since the current PyPI peft package (0.2) has some bugs, please install the peft package from source.

```shell
git clone https://github.com/huggingface/peft
cd peft
pip install .
```
# Usage

For SFT training, just call `train_peft_sft.py`.
Its arguments are almost identical to `train_sft.py`, except that it adds a new `eval_dataset` argument if you have an eval dataset file. The data file is just a plain data file; please check the format in `easy_dataset.py`.
For stage-3 RLHF training, call train_peft_prompts.py.
Its arguments are almost identical to train_prompts.py. The only difference is that I use text files to indicate the prompt and pretrained data files. The models are included in easy_models.py. Currently only BLOOM models are tested, but technically GPT-2/OPT/LLaMA should be supported.
# Data format
Please refer to the formats in test_sft.txt, test_prompts.txt, and test_pretrained.txt.
:warning: **This content may be outdated since the major update of Colossal Chat. We will update this content soon.**
# ColossalAI on Ray
## Abstract
This is an experimental effort to run ColossalAI Chat training on Ray.
## How to use?
### 1. Set up a Ray cluster
Please follow the official [Ray cluster setup instructions](https://docs.ray.io/en/latest/cluster/getting-started.html) to set up a cluster with GPU support. Record the cluster's API server endpoint; it should look similar to http://your.head.node.address:8265
### 2. Clone the repo
Clone this project:
```shell
git clone https://github.com/hpcaitech/ColossalAI.git
```
### 3. Submit the Ray job
```shell
python applications/Chat/examples/community/ray/ray_job_script.py http://your.head.node.address:8265
```
### 4. View your job on the Ray Dashboard
Open your Ray cluster dashboard at http://your.head.node.address:8265 to view your submitted training job.
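As an alternative to the helper script above, the same submission can be expressed with Ray's official `ray job submit` CLI. This is a hedged sketch: the endpoint is a placeholder, and the entrypoint path is assumed to match this repo's layout. The command is printed rather than executed so you can review it before running it against a live cluster:

```shell
# Placeholder endpoint: replace with your cluster's API server address.
ENDPOINT="http://your.head.node.address:8265"
# Equivalent submission via Ray's `ray job submit` CLI; printed for review
# rather than executed (running it requires a live Ray cluster).
echo ray job submit --address "${ENDPOINT}" --working-dir . \
    -- python applications/Chat/examples/community/ray/ray_job_script.py "${ENDPOINT}"
```

Jobs submitted this way also appear on the same dashboard, and their logs can be followed with `ray job logs`.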
The data is from [LLaMA Int8 4bit ChatBot Guide v2](https://rentry.org/llama-tar).
### 8-bit
| Model     | Min GPU RAM | Recommended GPU RAM | Min RAM/Swap | Card examples                      |
| :-------: | :---------: | :-----------------: | :----------: | :--------------------------------: |
| LLaMA-7B  | 9.2GB       | 10GB                | 24GB         | 3060 12GB, RTX 3080 10GB, RTX 3090 |
| LLaMA-13B | 16.3GB      | 20GB                | 32GB         | RTX 3090 Ti, RTX 4090              |
| LLaMA-30B | 36GB        | 40GB                | 64GB         | A6000 48GB, A100 40GB              |
| LLaMA-65B | 74GB        | 80GB                | 128GB        | A100 80GB                          |
### 4-bit
| Model     | Min GPU RAM | Recommended GPU RAM | Min RAM/Swap | Card examples                                              |
| :-------: | :---------: | :-----------------: | :----------: | :--------------------------------------------------------: |
| LLaMA-7B  | 3.5GB       | 6GB                 | 16GB         | RTX 1660, 2060, AMD 5700xt, RTX 3050, 3060                 |
| LLaMA-13B | 6.5GB       | 10GB                | 32GB         | AMD 6900xt, RTX 2060 12GB, 3060 12GB, 3080, A2000          |
| LLaMA-30B | 15.8GB      | 20GB                | 64GB         | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100  |
| LLaMA-65B | 31.2GB      | 40GB                | 128GB        | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000, Titan Ada |
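The per-model numbers above roughly track the size of the quantized weights. As a back-of-the-envelope sketch (the parameter counts and the ~1.2x overhead factor below are assumptions for illustration, not measurements — real usage also depends on context length and batch size), the weight footprint can be estimated like this:

```python
# Rough GPU-RAM estimate for quantized LLaMA weights.
# Parameter counts (in billions) are assumed published sizes; the 1.2x
# overhead factor is a guess covering buffers and quantization metadata.
PARAMS_B = {"LLaMA-7B": 6.7, "LLaMA-13B": 13.0, "LLaMA-30B": 32.5, "LLaMA-65B": 65.2}

def weight_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """GiB needed for the weights at the given precision, plus a fudge factor."""
    bytes_total = params_billion * 1e9 * bits / 8
    return bytes_total * overhead / 1024**3

for name, p in PARAMS_B.items():
    print(f"{name}: 8-bit ~{weight_gb(p, 8):.1f}GB, 4-bit ~{weight_gb(p, 4):.1f}GB")
```

Halving the bit width halves the weight footprint, which is why the 4-bit minimums in the table sit near half of the 8-bit ones.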
## General setup