:warning: **This content may be outdated since the major update of Colossal Chat. We will update this content soon.**

# Add Peft support for SFT and Prompts model training

The original implementation uses loralib and merges the LoRA layers into the final model. Hugging Face PEFT is a better LoRA implementation, and its models are easy to train and distribute.
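
As a rough illustration of what the PEFT route looks like, here is a minimal sketch that attaches LoRA adapters to a causal language model. The helper names `lora_settings` and `wrap_with_lora`, the hyperparameter values, and the `bigscience/bloom-560m` checkpoint are assumptions chosen for the example; they are not taken from the training scripts in this directory.

```python
def lora_settings(rank=8, alpha=32, dropout=0.1):
    """Return LoRA hyperparameters as keyword arguments for peft.LoraConfig.

    The defaults are illustrative assumptions, not the values used by
    the training scripts in this directory.
    """
    return {"r": rank, "lora_alpha": alpha, "lora_dropout": dropout}


def wrap_with_lora(model_name="bigscience/bloom-560m", **overrides):
    """Load a causal LM and attach LoRA adapters through peft.

    `overrides` accepts the same keywords as lora_settings
    (rank, alpha, dropout).
    """
    # Imported lazily so this module can be imported even when the
    # heavy dependencies are not installed.
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(model_name)
    config = LoraConfig(task_type=TaskType.CAUSAL_LM, **lora_settings(**overrides))
    model = get_peft_model(base, config)
    # Prints how many parameters are trainable compared with the full model.
    model.print_trainable_parameters()
    return model
```

Calling `wrap_with_lora()` downloads the checkpoint on first use, so it needs network access and both `peft` and `transformers` installed.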

Since the reward model is relatively small, I kept its original implementation. I suggest training the full model to get a proper reward/critic model.

# Preliminary installation

The current PyPI release of peft (0.2) has some bugs, so please install peft from source.

```
git clone https://github.com/huggingface/peft
cd peft
pip install .
```
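
To confirm which peft build ended up in your environment, a small check like the one below may help. It is only a sketch: the helper names are made up for this example, and it assumes that a source build reports a version outside the 0.2.x line, which may not hold for every commit.

```python
from importlib import metadata


def installed_peft_version():
    """Return the installed peft version string, or None if peft is missing."""
    try:
        return metadata.version("peft")
    except metadata.PackageNotFoundError:
        return None


def is_known_buggy(version):
    """True for the 0.2 release line that this README warns about."""
    # Assumption: comparing the major and minor fields is enough to
    # tell the PyPI 0.2 release apart from a source build.
    return version is not None and version.split(".")[:2] == ["0", "2"]


if __name__ == "__main__":
    version = installed_peft_version()
    if version is None:
        print("peft is not installed")
    elif is_known_buggy(version):
        print(f"peft {version} is a 0.2 release; reinstall from source")
    else:
        print(f"peft {version} found")
```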

# Usage

For SFT training, run train_peft_sft.py.

Its arguments are almost identical to those of train_sft.py, except for a new eval_dataset argument that you can set if you have an evaluation dataset file. The data file is a plain text file; see easy_dataset.py for the format.

For stage-3 RLHF training, run train_peft_prompts.py.
Its arguments are almost identical to those of train_prompts.py. The only difference is that the prompt data and the pretraining data are given as text files. The models are defined in easy_models.py. Currently only BLOOM models have been tested, but GPT-2, OPT, and LLaMA should technically be supported as well.

# Data format

Please refer to test_sft.txt, test_prompts.txt, and test_pretrained.txt for the formats.