# Baichuan

## Papers



## Model Architecture
The Baichuan series comprises open-source large-scale pretrained models developed by Baichuan Intelligence, available in 7B, 13B, and other sizes. Baichuan-7B is a 7-billion-parameter model trained on roughly 1.2 trillion tokens; it supports both Chinese and English and has a context window of 4096 tokens. Baichuan-13B, developed after Baichuan-7B, contains 13 billion parameters and was trained on 1.4 trillion tokens of high-quality corpus data, 40% more than LLaMA-13B, making it the open-source 13B-scale model trained on the most data to date. Baichuan Intelligence has also released an aligned model, Baichuan-13B-Chat, with strong conversational ability.

Model parameters:

| Model | Hidden size | Layers | Heads | Vocab size | Total params | Training data (tokens) | Position encoding | Max length |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | RoPE | 4096 |
| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | ALiBi | 4096 |
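As a sanity check, the Baichuan-7B total in the table can be reproduced from LLaMA-style shapes. The FFN width (11008), SwiGLU projection count, untied output embedding, and RMSNorm weight sizes below are assumptions about the architecture, not values stated in the table:

```python
# Rough parameter count for Baichuan-7B, assuming LLaMA-style shapes:
# FFN dim 11008 (SwiGLU), untied input/output embeddings, one weight
# vector of size d per RMSNorm.
d, n_layers, vocab, ffn = 4096, 32, 64000, 11008

attn = 4 * d * d          # Wq, Wk, Wv, Wo projections
mlp = 3 * d * ffn         # gate, up, down projections (SwiGLU)
norms = 2 * d             # two RMSNorms per layer
per_layer = attn + mlp + norms

total = n_layers * per_layer + 2 * vocab * d + d  # + final norm
print(total)  # matches the 7,000,559,616 in the table
```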
<div align="center">
<img src="./assets/transformer.jpg" width="400" height="300">
</div>

## Algorithm
The Baichuan models are based on the standard Transformer architecture and follow the same design as LLaMA. Baichuan-7B uses rotary position embeddings (RoPE), the SwiGLU activation function, and RMSNorm-based pre-normalization. Baichuan-13B instead uses ALiBi linear attention biases, which are cheaper to compute than rotary embeddings and noticeably improve inference performance.
<div align="center">
<img src="./assets/transformer.png" width="450" height="300">
</div>
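As a rough illustration of ALiBi, each attention head gets a fixed slope and a linear distance penalty added to its attention scores. The sketch below follows the original paper's recipe for power-of-two head counts; Baichuan-13B's 40 heads use an interpolated variant not shown here:

```python
def alibi_slopes(n_heads: int):
    # geometric slopes 2^(-8/n), 2^(-16/n), ... (power-of-two head counts)
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(slope: float, seq_len: int):
    # additive causal bias per head: slope * -(query_pos - key_pos)
    return [[slope * (k - q) for k in range(q + 1)] for q in range(seq_len)]

print(alibi_slopes(8)[0])     # first head's slope: 0.5
print(alibi_bias(0.5, 3)[2])  # [-1.0, -0.5, 0.0]
```

Because the bias depends only on token distance, no position embedding is added to the inputs, which is where the inference savings come from.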

## Environment Setup


### Docker (Option 1)
Running in Docker is recommended. Pull the provided image:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -dit --network=host --name=baichuan -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 /bin/bash
docker exec -it baichuan /bin/bash
```
Install the dependencies not included in the image:
```
pip install -r requirements.txt  -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

```

### Dockerfile (Option 2)
```
docker build -t baichuan:latest .
docker run -dit --network=host -v /opt/hyhal:/opt/hyhal:ro --name=baichuan --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 baichuan:latest
docker exec -it baichuan /bin/bash
```

### Conda (Option 3)
1. Create a conda virtual environment:
```
conda create -n baichuan python=3.10
```

2. The toolkits and deep-learning libraries this project needs for DCU GPUs can be downloaded from the [光合](https://developer.hpccube.com/tool/) developer community:
- [DTK 24.04.1](https://cancon.hpccube.com:65024/1/main/DTK-24.04.1)
- [Pytorch 2.1.0](https://cancon.hpccube.com:65024/directlink/4/pytorch/DAS1.1/torch-2.1.0+das1.1.git3ac1bdd.abi1.dtk2404-cp310-cp310-manylinux_2_31_x86_64.whl)
- [Deepspeed 0.12.3](https://cancon.hpccube.com:65024/directlink/4/deepspeed/DAS1.1/deepspeed-0.12.3+gita724046.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl)

    Tips: the DTK driver, Python, DeepSpeed, and other tool versions above must correspond to each other exactly.

3. Install the remaining dependencies from requirements.txt:
```
pip install -r requirements.txt  -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com 
```

### Note
Because the DeepSpeed build available for DTK currently tops out at 0.9.2, a few version checks must be commented out inside the virtual environment:

```
# In the virtual environment's python/site-packages, comment out the
# following version checks.

# File site-packages/accelerate/accelerator.py:

 287             #if not is_deepspeed_available():
 288             #    raise ImportError("DeepSpeed is not installed => run `pip install deepspeed` or build it from source.")
 289             #if compare_versions("deepspeed", "<", "0.9.3"):
 290             #    raise ImportError("DeepSpeed version must be >= 0.9.3. Please update DeepSpeed.")

# File site-packages/transformers/utils/versions.py:

 43     #if not ops[op](version.parse(got_ver), version.parse(want_ver)):
 44     #    raise ImportError(
 45     #        f"{requirement} is required for a normal functioning of this module, but found {pkg}=={got_ver}.{hint}"
 46     #    )
```
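For context, the disabled guard does roughly the following: it parses the installed DeepSpeed version and raises if it is below 0.9.3, so a DTK build reporting 0.9.2 fails the check even when it works. A minimal sketch of that comparison (the real code uses `packaging.version`, not this hand-rolled helper):

```python
def version_lt(a: str, b: str) -> bool:
    # compare dotted versions numerically, not lexically
    return tuple(int(x) for x in a.split(".")) < tuple(int(x) for x in b.split("."))

print(version_lt("0.9.2", "0.9.3"))   # True  -> the >= 0.9.3 guard raises
print(version_lt("0.12.3", "0.9.3"))  # False -> numeric compare, 12 > 9
```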

## Dataset

Input data are JSON files placed in the project's [data](./data) directory, specified with the --dataset option (see the examples below); separate multiple input files with `,`. The example format and field descriptions are as follows:
```
[
    {
        "instruction": "What are the three primary colors?",
        "input": "",
        "output": "The three primary colors are red, blue, and yellow."
    },
    ....
]
```
The JSON file stores a list whose elements are samples. `instruction` is the user input; `input` is optional, and when both are given they are joined with `\n` to form the user input; `output` is the expected model response. The [data](./data) directory of this repository ships with several public datasets usable for instruction fine-tuning of Baichuan models:
```
./data/alpaca_gpt4_data_zh.json
./data/alpaca_gpt4_data_en.json
./data/alpaca_data_zh_51k.json
./data/alpaca_data_en_52k.json
./data/oaast_sft_zh.json
./data/oaast_sft.json
...
```
For how to use the datasets, see [data/README.md](data/README_zh.md).
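The field handling described above can be sketched as follows (`build_prompt` is a hypothetical helper for illustration, not a function from this repository):

```python
def build_prompt(sample: dict) -> str:
    # instruction plus optional input, joined with "\n" as described above
    if sample.get("input"):
        return sample["instruction"] + "\n" + sample["input"]
    return sample["instruction"]

sample = {
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow.",
}
print(build_prompt(sample))  # -> What are the three primary colors?
```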

Note: set the `dataset_dir` path at line 38 of [./src/llmtuner/hparams/data_args.py](src/llmtuner/hparams/data_args.py).

### Model Download
Hugging Face model download links:

[Baichuan-7B](https://huggingface.co/baichuan-inc/Baichuan-7B)

[Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base)

[Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)


## Training
### Full-Parameter Fine-Tuning

1. Single-node training
```
bash run-full.sh
```
You can change the batch size, model path, dataset, DeepSpeed config file, and so on as needed.
Note: the pretrained model loaded in the example above is an aligned (chat) model, so `--template` is set to `baichuan`; if you load a base model instead, set `--template default`.

2. Multi-node training
```
cd multi_node
``` 
On node 1, edit `hostfile` for your environment and make sure both nodes use identical file paths and configuration. In `run-13b-sft.sh`, change `enp97s0f1` in `--mca btl_tcp_if_include enp97s0f1` to the NIC name bound to each node's IP (as shown by `ip a`); NUMA binding can be adjusted to match the node topology. Fine-tuning command:
``` 
bash run-13b-sft.sh
``` 
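For reference, a two-node `hostfile` in the usual MPI/DeepSpeed style might look like the following (hostnames and slot counts are placeholders; adjust them to your cluster):

```
node1 slots=8
node2 slots=8
```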
### LoRA Fine-Tuning

1. Single-node training
```
bash run-lora.sh
```
You can change the batch size, model path, dataset, DeepSpeed config file, `lora_rank`, `lora_target`, and so on as needed. Run `python src/train_bash.py -h` to see all options.


2. Multi-node training
```
cd multi_node
``` 
On node 1, edit `hostfile` for your environment and make sure both nodes use identical file paths and configuration. In `run-13b-sft.sh`, change `enp97s0f1` in `--mca btl_tcp_if_include enp97s0f1` to the NIC name bound to each node's IP (as shown by `ip a`); NUMA binding can be adjusted to match the node topology. Fine-tuning command:
``` 
bash run-7b-sft-lora.sh
``` 



## Inference

### Command-Line Demo

```bash
python src/cli_demo.py \
    --model_name_or_path path_to_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint
```
Note: for all base models, `--template` can be either `default` or `baichuan`; for chat models, always use `baichuan`.


## Result

<div align="center">
<img src="./assets/baichuan-result.png" width="650" height="150">
</div>

## Accuracy
- Loss convergence of our full-parameter instruction fine-tuning test based on the baichuan-13b-base model:
<div align="center">
<img src="./assets/training_loss.png" width="300" height="250">
</div>

- Loss convergence of our LoRA instruction fine-tuning test based on the baichuan-7b-base model:
<div align="center">
<img src="./assets/training_loss_lora.png" width="300" height="250">
</div>

## Application Scenarios

### Algorithm Category

`Conversational Q&A`

### Key Application Industries

`Healthcare, Education, Scientific Research, Finance`

## Pretrained Weights
Fast download center for pretrained weights: [SCNet AIModels](http://113.200.138.88:18080/aimodels). The weights used in this project are available through the fast-download channel:
[Baichuan-7B](http://113.200.138.88:18080/aimodels/Baichuan-7B), [Baichuan-13B-Base](http://113.200.138.88:18080/aimodels/Baichuan-13B-Base), [Baichuan-13B-Chat](http://113.200.138.88:18080/aimodels/Baichuan-13B-Chat)

## Source Repository & Issue Reporting

- https://developer.hpccube.com/codes/modelzoo/baichuan-pytorch

## References

- [https://github.com/hiyouga/LLaMA-Efficient-Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning/)
- [https://github.com/baichuan-inc/Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B)