# llama3

## Paper
[llama3](https://llama.meta.com/llama3/)

## Model Architecture
Llama-3 adopts a fairly standard decoder-only transformer architecture. Compared with Llama-2, it makes several key improvements:
- Trained on more than 15T tokens, over 7x the size of the Llama 2 dataset, strengthening reasoning, code generation, and instruction-following capabilities;
- Supports 8K context length (up from 4K); an improved tokenizer with a 128K-token vocabulary encodes language more efficiently, substantially improving model performance;
- Adopts grouped query attention (GQA) and masking techniques to help developers get excellent performance at minimal energy cost;
- Trains on sequences of 8,192 tokens, using a mask to ensure self-attention does not cross document boundaries.
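The document-boundary masking in the last bullet can be sketched as follows; this is a minimal, framework-free illustration (the function name and layout are ours, not from the Llama-3 code):

```python
# Minimal sketch of document-boundary masking: when several documents are
# packed into one 8,192-token training sequence, token i may attend only to
# earlier tokens belonging to the same document.

def doc_causal_mask(doc_ids):
    """doc_ids[i] is the document id of token i; returns mask[i][j] = True
    iff token i may attend to token j."""
    n = len(doc_ids)
    return [[j <= i and doc_ids[j] == doc_ids[i] for j in range(n)]
            for i in range(n)]

# two packed documents of lengths 2 and 3
mask = doc_causal_mask([0, 0, 1, 1, 1])
# mask[1][0] is True (same document, earlier token);
# mask[2][1] is False (crosses the document boundary)
```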

## Algorithm
<div align=center>
    <img src="./doc/method.png"/>
</div>

## Environment Setup
Modify the `-v` mount path, `docker_name`, and `imageID` below according to your actual setup.

**Note**: the bitsandbytes library is not fully functional; quantization-related features are not yet supported.

### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/llama3_pytorch
pip install -e .
```

### Dockerfile (Method 2)
```bash
cd docker
docker build --no-cache -t llama3:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/llama3_pytorch
pip install -e .
```

### Anaconda (Method 3)
The special deep-learning libraries this project needs for DCU GPUs can be downloaded from the [Guanghe](https://developer.sourcefind.cn/tool/) developer community.
```bash
DTK driver: dtk24.04
python: python3.10
torch: 2.1.0
xtuner: 0.1.18
llama-factory: 0.6.3
```
`Tips: the DTK driver, python, torch, and other DCU-related tool versions above must match each other exactly.`

Other (non-deep-learning) libraries can be installed as follows:
```bash
pip install -e .
```

## Datasets
```
├── llama3_pytorch
│   ├── datasets
│       ├── alpaca_data.json
│       └── multi_turn_dataset_2.json
```
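The dataset files' schema is not documented here; for orientation, `alpaca_data.json` follows the common Alpaca instruction format, a JSON array of `instruction`/`input`/`output` records. The record below is illustrative only, not taken from the actual file:

```python
import json

# Illustrative Alpaca-style record (hypothetical content, typical schema)
sample = [
    {
        "instruction": "Translate the sentence to French.",
        "input": "Hello, world.",
        "output": "Bonjour, le monde.",
    }
]
print(json.dumps(sample, ensure_ascii=False, indent=2))
```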

## Training

### Fine-tuning with xtuner

1. Install the training library (**outside the llama3_pytorch directory**); install version **v0.1.18**:
```bash
pip uninstall flash-attn # 2.0.4+82379d7.abi0.dtk2404.torch2.1
# If your docker environment already contains deepspeed, you can skip this install; just verify the version matches
pip install deepspeed-0.12.3+das1.0+gita724046.abi0.dtk2404.torch2.1.0-cp310-cp310-manylinux2014_x86_64.whl
git clone -b v0.1.18 https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e '.[all]'
pip install mmengine==0.10.3
# Check the bitsandbytes version; skip this if your environment already matches, otherwise reinstall
pip install bitsandbytes-0.37.0+das1.0+gitd3d888f.abi0.dtk2404.torch2.1-py3-none-any.whl
```

2. Download a pretrained model via [Pretrained Weights](#pretrained-weights); this example uses the [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model;

3. In [llama3_8b_instruct_qlora_alpaca_e3_M.py](./llama3_8b_instruct_qlora_alpaca_e3_M.py), change `pretrained_model_name_or_path` and `data_path` to your local model and data paths;

4. Adjust `max_length`, `batch_size`, `accumulative_counts`, `max_epochs`, `lr`, `save_steps`, `evaluation_freq`, and the `r` and `lora_alpha` parameters under model.lora to your hardware and training needs; the default parameters fit 4x32G;

5. Set `${DCU_NUM}` to the number of DCU cards to use. For other datasets, adjust the `SYSTEM`, `evaluation_inputs`, `dataset_map_fn`, `train_dataloader.sampler`, and `train_cfg` settings in llama3_8b_instruct_qlora_alpaca_e3_M.py (see the code comments for details); the alpaca dataset is the default. **`--work-dir` sets the model save path.**

6. Run:
```bash
bash finetune.sh
# or
NPROC_PER_NODE=${DCU_NUM} xtuner train ./llama3_8b_instruct_qlora_alpaca_e3_M.py --deepspeed deepspeed_zero2 --work-dir /path/of/saves
```

### Fine-tuning with Llama Factory (recommended)

1. Install the training library (**outside the llama3_pytorch directory**); install version **v0.6.3**. For detailed `Llama-Factory` installation steps, see that repository's README.
```bash
git clone -b v0.6.3 http://developer.sourcefind.cn/codes/OpenDAS/llama-factory.git
```

2. Download a pretrained model via [Pretrained Weights](#pretrained-weights); this example uses the [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model;

3. `llama3/3.1` training scripts are available [here](./llama-factory/examples/); note in particular that when launching with `single_node.sh`, the `num_processes` value in `single_config.yaml` must match the configured number of GPUs.


#### Full-parameter fine-tuning

```bash
cd /your_code_path/llama_factory/examples/full_multi_gpu
```

**Parameter changes**
- `--model_name_or_path`: path of the model to train, e.g. /data/Meta-llama3-models/Meta-Llama-3-8B-Instruct
- `--dataset`: name of the fine-tuning dataset; for the available datasets see /LLaMA-Factory-0.6.3/data/dataset_info.json
- `--template`: change `default` to `llama3`
- `--output_dir`: model save path
- `--fp16` or `--bf16`: enable mixed precision; use `--pure_bf16` for pure bf16
- Other parameters such as `--learning_rate` and `--save_steps` can be adjusted to your hardware and needs.

#### LoRA fine-tuning

```bash
cd /your_code_path/llama_factory/examples/lora_multi_gpu
```
Parameters are the same as in the full-parameter fine-tuning section above.

## Inference

Download the pretrained models via the [Pretrained Weights](#pretrained-weights) section below. Different models require different model-parallel (MP) values, as shown below:

|  Model | MP |
|--------|----|
| 8B     | 1  |
| 70B    | 8  |

All models support sequence lengths up to 8192 tokens, but the cache is pre-allocated according to the `max_seq_len` and `max_batch_size` values; set them to match your hardware.
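To see why those two values matter, note that the pre-allocated KV cache grows linearly in both. A back-of-the-envelope sketch, assuming the commonly cited Llama-3-8B configuration (32 layers, 8 KV heads via GQA, head dimension 128, fp16); adjust the constants for other models:

```python
# Rough KV-cache size for a pre-allocated cache: 2 (keys and values) x
# layers x KV heads x head dim x seq len x batch size x bytes per element.
# The defaults below assume the commonly cited Llama-3-8B configuration.

def kv_cache_bytes(max_batch_size, max_seq_len,
                   n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_el=2):
    return (2 * n_layers * n_kv_heads * head_dim
            * max_seq_len * max_batch_size * bytes_per_el)

print(kv_cache_bytes(4, 2048) / 2**30)  # 1.0 (GiB) for batch 4, seq len 2048
```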

**Tips:**
- `--nproc_per_node` must be set according to the model's MP value (see the table above).
- Set `max_seq_len` and `max_batch_size` as needed.

### Pretrained models

These models are not fine-tuned for chat or Q&A; see the example in `example_text_completion.py`.

- Meta-Llama-3-8B example; for Meta-Llama-3-70B, just point `--nproc_per_node`, `--ckpt_dir`, and `--tokenizer_path` at that model.
```bash
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir Meta-Llama-3-8B/original/ \
    --tokenizer_path Meta-Llama-3-8B/original/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```

### Instruction-tuned models

The fine-tuned models are trained for dialogue applications. To get their expected features and performance, the specific format defined in [`ChatFormat`](llama/tokenizer.py#L202) needs to be followed:

- The prompt begins with the special token `<|begin_of_text|>`, followed by one or more messages.
- Each message starts with the tag `<|start_header_id|>`, the role `system`, `user`, or `assistant`, and the tag `<|end_header_id|>`.
- After a double newline `\n\n`, the message content follows.
- The end of each message is marked by the `<|eot_id|>` token.

You can also deploy additional classifiers to filter out inputs and outputs deemed unsafe. See the [llama-recipes repo](https://github.com/meta-llama/llama-recipes/blob/main/recipes/inference/local_inference/inference.py) for an example of adding a safety checker to the inputs and outputs of your inference code.
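The layout above can be sketched as plain string assembly. This is a hypothetical helper for illustration only; the authoritative implementation is `ChatFormat` in `llama/tokenizer.py`:

```python
# Hypothetical illustration of the Llama-3 chat layout described above.

def format_llama3_prompt(messages):
    """Render a list of {role, content} messages into Llama-3 chat format."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n")
        parts.append(msg["content"])
        parts.append("<|eot_id|>")
    # open an assistant header so the model generates the reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
```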

- Meta-Llama-3-8B-Instruct example; for Meta-Llama-3-70B-Instruct, just point `--nproc_per_node`, `--ckpt_dir`, and `--tokenizer_path` at that model.
```bash
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/original/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/original/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```
### Inference with models in .safetensors format

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# model path
model_path = 'meta-llama/Meta-Llama-3-8B-Instruct'
prompt = '你好'

input_query = {"role": "user", "content": prompt}

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto")

input_ids = tokenizer.apply_chat_template(
    [input_query,], add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
)

response = outputs[0][input_ids.shape[-1]:]
generated_text = tokenizer.decode(response, skip_special_tokens=True)
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

### Multi-turn chat
1. Make sure the environment is installed and the model downloaded;
2. In [chat.sh](./chat.sh), change `--ckpt_dir` and `--tokenizer_path` to your local model path, and adjust `--max_seq_len` as needed: increasing it extends the chat model's memory across turns, but may also increase compute time and memory requirements;
3. Run:
```bash
bash chat.sh
```
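The trade-off in step 2 — a larger `--max_seq_len` means longer conversational memory at higher cost — can be illustrated with a hypothetical history-truncation helper (names and logic are ours; chat.sh may manage context differently):

```python
# Hypothetical sketch: keep only the most recent turns whose total token
# count fits within the max_seq_len budget.

def truncate_history(turns, token_counts, max_seq_len):
    kept, total = [], 0
    for turn, n in zip(reversed(turns), reversed(token_counts)):
        if total + n > max_seq_len:
            break
        kept.append(turn)
        total += n
    return list(reversed(kept))

# with a 512-token budget only the two newest turns fit
print(truncate_history(["t1", "t2", "t3"], [300, 200, 100], 512))  # ['t2', 't3']
```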

### Validation
1. Install `llama-recipes` and `lm-eval`:
```bash
# download llama-recipes
git clone http://developer.sourcefind.cn/codes/chenych/llama-recipes.git
cd llama-recipes
# move exact_match.py
mv exact_match.py ~/.cache/huggingface/evaluate/downloads/
cd ../
# download lm-eval
git clone http://developer.sourcefind.cn/codes/chenych/lm-evaluation-harness.git
cd ./lm-evaluation-harness/
pip install -e .
```


2. Change the **pretrained** argument to the local path of the model under test, e.g. `/home/Meta-Llama-3-8B-Instruct`; note that currently only the `hellaswag` dataset is supported for validation. Then run:
```bash
cd /path_of/llama-recipes/recipes/evaluation
# the HF_ENDPOINT environment variable must be set
export HF_ENDPOINT=https://hf-mirror.com
# run
python eval.py --model hf --model_args pretrained=/home/llama3/Meta-Llama-3-8B-Instruct,dtype="float" --tasks hellaswag --device cuda --batch_size 8
```
<div align=center>
    <img src="./doc/evaluation.png"/>
</div>
## Results
- Meta-Llama-3-8B-Instruct
<div align=center>
    <img src="./doc/Meta-Llama-3-8B-Instruct.png"/>
</div>

- Meta-Llama-3-8B
<div align=center>
    <img src="./doc/Meta-Llama-3-8B.png"/>
</div>

### Accuracy
Not available yet

## Application Scenarios
### Algorithm Category
Dialogue Q&A

### Key Application Industries
Manufacturing, broadcast media, home furnishing, education

## Pretrained Weights
Download the pretrained models from Hugging Face:
- [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
- [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- [Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B)
- [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
- [Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
- [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- [Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)
- [Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct)

Model directory structure:
```bash
├── model_save_path
│   ├── Meta-Llama-3-8B
│       ├── original
│           ├── consolidated.00.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── configuration.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── model-00001-of-00004.safetensors
│       ├── model-00002-of-00004.safetensors
│       ├── model-00003-of-00004.safetensors
│       ├── model-00004-of-00004.safetensors
│       ├── model.safetensors.index.json
│       ├── README.md
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       └── USE_POLICY.md
│   ├── Meta-Llama-3-8B-Instruct
│       ├── original
│           ├── consolidated.00.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── configuration.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── model-00001-of-00004.safetensors
│       ├── model-00002-of-00004.safetensors
│       ├── model-00003-of-00004.safetensors
│       ├── model-00004-of-00004.safetensors
│       ├── model.safetensors.index.json
│       ├── README.md
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       └── USE_POLICY.md
│   ├── Meta-Llama-3-70B
│       ├── original
│           ├── consolidated.00.pth
│           ...
│           ├── consolidated.07.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── README.md
│       ├── model-00001-of-00030.safetensors
│       ├── model-00002-of-00030.safetensors
│       ...
│       ├── model-00029-of-00030.safetensors
│       ├── model-00030-of-00030.safetensors
│       ├── model.safetensors.index.json
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       └── USE_POLICY.md
│   └── Meta-Llama-3-70B-Instruct
│       ├── original
│           ├── consolidated.00.pth
│           ...
│           ├── consolidated.07.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── README.md
│       ├── model-00001-of-00030.safetensors
│       ├── model-00002-of-00030.safetensors
│       ...
│       ├── model-00029-of-00030.safetensors
│       ├── model-00030-of-00030.safetensors
│       ├── model.safetensors.index.json
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       └── USE_POLICY.md
```


## Source Repository & Issue Reporting
- https://developer.sourcefind.cn/codes/modelzoo/llama3_pytorch

## References
- https://github.com/meta-llama/llama3
- https://github.com/InternLM/xtuner
- https://github.com/meta-llama/llama-recipes
- https://github.com/hiyouga/LLaMA-Factory/tree/v0.6.3