# llama3
## Paper
[llama3](https://llama.meta.com/llama3/)

## Model Architecture
Llama-3 uses a fairly standard decoder-only transformer architecture. Compared with Llama-2, it makes several key improvements:
- Trained on more than 15T tokens, over seven times the size of the Llama-2 dataset, which strengthens reasoning, code generation, and instruction following;
- Supports an 8K context (up from 4K), and an improved tokenizer with a 128K-token vocabulary encodes language more efficiently, substantially improving model performance;
- Adopts grouped query attention (GQA) and masking techniques to help developers get excellent performance at minimal energy cost;
- Trains on sequences of 8,192 tokens, using a mask to ensure self-attention does not cross document boundaries.
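The core idea of GQA, where several query heads share one key/value head, can be sketched in a few lines of numpy (illustrative only; the actual Llama-3 implementation adds RoPE, learned projections, and KV caching):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Causal grouped-query attention.
    q: (n_heads, seq_len, head_dim); k, v: (n_kv_heads, seq_len, head_dim),
    where n_heads must be a multiple of n_kv_heads."""
    n_heads, seq_len, head_dim = q.shape
    n_kv_heads = k.shape[0]
    group = n_heads // n_kv_heads
    # Broadcast each KV head to the `group` query heads that share it
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    # Causal mask: a position may not attend to later positions
    causal = np.triu(np.ones((seq_len, seq_len), dtype=bool), 1)
    scores = np.where(causal, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Sharing KV heads this way shrinks the KV cache by a factor of `n_heads / n_kv_heads`, which is the main reason GQA reduces memory and bandwidth cost at inference time.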

## Algorithm
<div align=center>
    <img src="./doc/method.png"/>
</div>


## Environment Setup
Adjust the `-v` paths, `docker_name`, and `imageID` below to match your setup.
**Note**: the bitsandbytes build is incomplete; quantization-related features are not yet supported.

### Docker (Option 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/llama3_pytorch
pip install -e .
```

### Dockerfile (Option 2)
```bash
cd docker
docker build --no-cache -t llama3:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/llama3_pytorch
pip install -e .
```

### Anaconda (Option 3)
The DCU-specific deep learning libraries required by this project can be downloaded from the [光合](https://developer.hpccube.com/tool/) developer community.
```bash
DTK driver: dtk24.04
python: python3.10
torch: 2.1.0
xtuner: 0.1.18
```
`Tips: the DTK driver, python, torch, and other DCU-related tool versions above must match each other exactly`

Other (non-deep-learning) dependencies can be installed as follows:
```bash
pip install -e .
```

## Dataset
```
├── llama3_pytorch
│   ├── datasets
│       ├── alpaca_data.json
│       └── multi_turn_dataset_2.json
```

## Training
### Fine-tuning with xtuner
1. Install the training libraries (outside the llama3_pytorch directory), paying attention to the required versions:
```bash
pip uninstall flash-attn # 2.0.4+82379d7.abi0.dtk2404.torch2.1
# If the docker environment already includes deepspeed, skip this install; just confirm the version matches
pip install deepspeed-0.12.3+das1.0+gita724046.abi0.dtk2404.torch2.1.0-cp310-cp310-manylinux2014_x86_64.whl
git clone -b v0.1.18 https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e '.[all]'
pip install mmengine==0.10.3
# Check the bitsandbytes version; if the installed one already matches, skip this step, otherwise reinstall
pip install bitsandbytes-0.37.0+das1.0+gitd3d888f.abi0.dtk2404.torch2.1-py3-none-any.whl
```

2. Download a pretrained model via [Pretrained Weights](#pretrained-weights); this example uses the [Meta-Llama-3-8B-Instruct](http://113.200.138.88:18080/aimodels/Meta-Llama-3-8B-Instruct) model;

3. In [llama3_8b_instruct_qlora_alpaca_e3_M.py](./llama3_8b_instruct_qlora_alpaca_e3_M.py), set `pretrained_model_name_or_path` and `data_path` to your local model and data paths;

4. Adjust `max_length`, `batch_size`, `accumulative_counts`, `max_epochs`, `lr`, `save_steps`, `evaluation_freq`, and the `r` and `lora_alpha` parameters in `model.lora` to suit your hardware and training needs; the default settings fit 4x 32G cards;

5. Set `${DCU_NUM}` to the number of DCU cards to use. For other datasets, adjust the `SYSTEM`, `evaluation_inputs`, `dataset_map_fn`, `train_dataloader.sampler`, and `train_cfg` settings in llama3_8b_instruct_qlora_alpaca_e3_M.py (see the code comments for details); the alpaca dataset is the default. **`--work-dir` sets the model save path**

6. Run:
```bash
bash finetune.sh
# or
NPROC_PER_NODE=${DCU_NUM} xtuner train ./llama3_8b_instruct_qlora_alpaca_e3_M.py --deepspeed deepspeed_zero2 --work-dir /path/of/saves
```
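As an illustration of step 4, the tunable fields typically appear near the top of the xtuner config. The values below are hypothetical placeholders, not the repository defaults; check the actual file for the exact names and structure:

```python
# Illustrative excerpt only -- names follow xtuner config conventions;
# the values are placeholders, not the defaults shipped in this repo.
max_length = 2048          # truncation length per sample
batch_size = 1             # per-DCU batch size
accumulative_counts = 16   # gradient accumulation steps
max_epochs = 3
lr = 2e-4
save_steps = 500           # checkpoint interval
evaluation_freq = 500      # how often to run the evaluation prompts

model = dict(
    lora=dict(
        r=64,              # LoRA rank; lower it to reduce memory use
        lora_alpha=16,     # LoRA scaling factor
    ),
)
```

Lowering `max_length`, `batch_size`, or the LoRA rank `r` is the usual first step when the defaults exceed available DCU memory.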

## Inference
Download a pretrained model as described in [Pretrained Weights](#pretrained-weights) below. Different models require different model-parallel (MP) values, as shown in the table:

|  Model | MP |
|--------|----|
| 8B     | 1  |
| 70B    | 8  |

All models support sequence lengths up to 8192 tokens, but the cache is pre-allocated according to the `max_seq_len` and `max_batch_size` values; set them to suit your hardware.
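As a rough sketch of why these values matter, the pre-allocated KV cache grows linearly with both. The layer and head counts below are the published Llama-3-8B settings; fp16 storage is assumed:

```python
def kv_cache_bytes(max_batch_size, max_seq_len,
                   n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # one key tensor and one value tensor per transformer layer
    return (2 * n_layers * max_batch_size * max_seq_len
            * n_kv_heads * head_dim * bytes_per_elem)

# Llama-3-8B: 32 layers, 8 KV heads (GQA), head dim 128; fp16 (2 bytes)
print(kv_cache_bytes(4, 8192, 32, 8, 128) / 1024**3)  # -> 4.0 (GiB)
```

Halving `max_seq_len` or `max_batch_size` halves the cache, which is often enough to fit a model that otherwise runs out of memory.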

**Tips:**
- `--nproc_per_node` must match the model's MP value (see the table above).
- `max_seq_len` and `max_batch_size` can be set as needed.

### Pretrained models
These models are not fine-tuned for chat or Q&A; see `example_text_completion.py` for a usage example.

- Meta-Llama-3-8B example; for Meta-Llama-3-70B, simply point `--nproc_per_node`, `--ckpt_dir`, and `--tokenizer_path` at the corresponding model.
```bash
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir Meta-Llama-3-8B/original/ \
    --tokenizer_path Meta-Llama-3-8B/original/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```

### Instruction-tuned models
The fine-tuned models are trained for dialogue applications. To get the expected behavior and performance, inputs must follow the specific format defined in [`ChatFormat`](llama/tokenizer.py#L202):
- The prompt starts with the special token `<|begin_of_text|>`, followed by one or more messages.
- Each message begins with the tag `<|start_header_id|>`, the role (`system`, `user`, or `assistant`), and the closing tag `<|end_header_id|>`.
- The message content follows after a double newline `\n\n`.
- The end of each message is marked by the `<|eot_id|>` token.
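The steps above can be assembled by hand; a minimal sketch follows (the reference implementation is `ChatFormat` in `llama/tokenizer.py`, and the `build_llama3_prompt` helper here is hypothetical):

```python
def build_llama3_prompt(messages):
    """Assemble a Llama-3 chat prompt from (role, content) pairs,
    where role is one of: system, user, assistant."""
    parts = ["<|begin_of_text|>"]
    for role, content in messages:
        # header tags, role, double newline, content, end-of-turn token
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
        )
    # an empty assistant header cues the model to generate the reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

print(build_llama3_prompt([("system", "Be brief."), ("user", "Hi")]))
```

Deviating from this token layout tends to degrade instruction-tuned model quality noticeably, so it is worth verifying the assembled string against the reference tokenizer.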

You can also deploy additional classifiers to filter inputs and outputs deemed unsafe. See the [llama-recipes repo](https://github.com/meta-llama/llama-recipes/blob/main/recipes/inference/local_inference/inference.py) for how to add a safety checker to the inference inputs and outputs.

- Meta-Llama-3-8B-Instruct example; for Meta-Llama-3-70B-Instruct, simply point `--nproc_per_node`, `--ckpt_dir`, and `--tokenizer_path` at the corresponding model.
```bash
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/original/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/original/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```

### Multi-turn dialogue
1. Make sure the environment is set up and the model is downloaded;
2. In [chat.sh](./chat.sh), set `--ckpt_dir` and `--tokenizer_path` to your local model path, and adjust `--max_seq_len` as needed: increasing it extends the model's multi-turn memory, but also increases compute time and memory requirements;
3. Run:
```bash
bash chat.sh
```

### Evaluation
1. Install `llama-recipes` and `lm-eval`:
```bash
# download llama-recipes
git clone http://developer.hpccube.com/codes/chenych/llama-recipes.git
cd llama-recipes
# move the exact_match.py file
mv exact_match.py ~/.cache/huggingface/evaluate/downloads/
cd ../
# download lm-eval
git clone http://developer.hpccube.com/codes/chenych/lm-evaluation-harness.git
cd ./lm-evaluation-harness/
pip install -e .
```

2. Set the **pretrained** argument to the path of the model under test, e.g. `/home/Meta-Llama-3-8B-Instruct`. Note that only the `hellaswag` dataset is currently supported for evaluation. Run:
```bash
cd /path_of/llama-recipes/recipes/evaluation
# the HF_ENDPOINT environment variable must be set
export HF_ENDPOINT=https://hf-mirror.com
# run
python eval.py --model hf --model_args pretrained=/home/llama3/Meta-Llama-3-8B-Instruct,dtype="float" --tasks hellaswag --device cuda --batch_size 8
```
<div align=center>
    <img src="./doc/evaluation.png"/>
</div>

## Results
- Meta-Llama-3-8B-Instruct
<div align=center>
    <img src="./doc/Meta-Llama-3-8B-Instruct.png"/>
</div>

- Meta-Llama-3-8B
<div align=center>
    <img src="./doc/Meta-Llama-3-8B.png"/>
</div>

### Accuracy
N/A

## Application Scenarios
### Algorithm Category
Dialogue Q&A

### Key Application Industries
Manufacturing, broadcast media, home furnishing, education

## Pretrained Weights
Download pretrained models from [SCNet AIModels](http://113.200.138.88:18080/aimodels):
- [Meta-Llama-3-8B](http://113.200.138.88:18080/aimodels/Meta-Llama-3-8B)
- [Meta-Llama-3-8B-Instruct](http://113.200.138.88:18080/aimodels/Meta-Llama-3-8B-Instruct)
- [Meta-Llama-3-70B](http://113.200.138.88:18080/aimodels/Meta-Llama-3-70B)
- [Meta-Llama-3-70B-Instruct](http://113.200.138.88:18080/aimodels/Meta-Llama-3-70B-Instruct)

The model directory structure is as follows:
```bash
├── model_save_path
│   ├── Meta-Llama-3-8B
│       ├── original
│           ├── consolidated.00.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── configuration.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── model-00001-of-00004.safetensors
│       ├── model-00002-of-00004.safetensors
│       ├── model-00003-of-00004.safetensors
│       ├── model-00004-of-00004.safetensors
│       ├── model.safetensors.index.json
│       ├── README.md
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       └── USE_POLICY.md
│   ├── Meta-Llama-3-8B-Instruct
│       ├── original
│           ├── consolidated.00.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── configuration.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── model-00001-of-00004.safetensors
│       ├── model-00002-of-00004.safetensors
│       ├── model-00003-of-00004.safetensors
│       ├── model-00004-of-00004.safetensors
│       ├── model.safetensors.index.json
│       ├── README.md
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       └── USE_POLICY.md
│   ├── Meta-Llama-3-70B
│       ├── original
│           ├── consolidated.00.pth
│           ...
│           ├── consolidated.07.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── README.md
│       ├── model-00001-of-00030.safetensors
│       ├── model-00002-of-00030.safetensors
│       ...
│       ├── model-00029-of-00030.safetensors
│       ├── model-00030-of-00030.safetensors
│       ├── model.safetensors.index.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       ├── special_tokens_map.json
│       └── USE_POLICY.md
│   └── Meta-Llama-3-70B-Instruct
│       ├── original
│           ├── consolidated.00.pth
│           ...
│           ├── consolidated.07.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── README.md
│       ├── model-00001-of-00030.safetensors
│       ├── model-00002-of-00030.safetensors
│       ...
│       ├── model-00029-of-00030.safetensors
│       ├── model-00030-of-00030.safetensors
│       ├── model.safetensors.index.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       ├── special_tokens_map.json
│       └── USE_POLICY.md
```

## Source Repository & Issue Feedback
- https://developer.hpccube.com/codes/modelzoo/llama3_pytorch

## References
- https://github.com/meta-llama/llama3
- https://github.com/InternLM/xtuner
- https://github.com/meta-llama/llama-recipes