# llama3
## Paper
[llama3](https://llama.meta.com/llama3/)

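## Model Architecture
Llama 3 uses a fairly standard decoder-only transformer architecture. Compared with Llama 2, it makes several key improvements:
- It is trained on more than 15T tokens, over seven times the size of the Llama 2 dataset, which strengthens reasoning, code generation, and instruction following.
- It supports an 8K context length (up from 4K), and the improved tokenizer has a 128K-token vocabulary that encodes language more efficiently and substantially improves model performance.
- It adopts grouped query attention (GQA), masking, and related techniques to improve inference efficiency.
- The model is trained on sequences of 8,192 tokens, with a mask that ensures self-attention does not cross document boundaries.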
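
The two attention-related points above can be illustrated with a minimal sketch. It is not taken from the Llama 3 training code: the function names are hypothetical, and the head counts (32 query heads sharing 8 key/value heads) are assumed from the published 8B configuration and used only as an example. The first function shows how GQA maps query heads onto shared key/value heads; the second builds a causal mask that also blocks attention between different documents packed into one sequence.

```python
def kv_head_for_query_head(query_head, n_heads, n_kv_heads):
    """Index of the shared key/value head used by a given query head under GQA."""
    group_size = n_heads // n_kv_heads  # query heads per key/value head
    return query_head // group_size


def document_causal_mask(doc_ids):
    """mask[i][j] is True when token i may attend to token j.

    Token i sees token j only if j is not in the future (j <= i) and both
    tokens belong to the same document.
    """
    n = len(doc_ids)
    return [[j <= i and doc_ids[i] == doc_ids[j] for j in range(n)] for i in range(n)]


# 32 query heads sharing 8 key/value heads: query heads 4..7 all use key/value head 1.
print([kv_head_for_query_head(h, 32, 8) for h in range(4, 8)])

# Two documents packed into one sequence: tokens 0-2 and tokens 3-4.
for row in document_causal_mask([0, 0, 0, 1, 1]):
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed mask, the last two rows start with dots: tokens of the second document cannot attend to the first document even though it comes earlier in the sequence.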


## Environment Setup
Adjust the `-v` paths, `docker_name`, and `imageID` to match your setup.

**Note**: bitsandbytes support is incomplete, so quantization-related features are not supported yet.

### Docker (Method 1)
```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk24.04-py310
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/llama3_pytorch
pip install -e .
```

### Dockerfile (Method 2)
```bash
cd docker
docker build --no-cache -t llama3:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro --shm-size=80G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /your_code_path/llama3_pytorch
pip install -e .
```

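### Anaconda (Method 3)
The special deep learning libraries this project needs for DCU cards can be downloaded and installed from the [光合 developer community](https://developer.hpccube.com/tool/).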
```bash
DTK driver: dtk24.04
python: python3.10
torch: 2.1.0
xtuner: 0.1.18
```
`Tips: the versions of the DTK driver, python, torch, and the other DCU-related tools listed above must match each other exactly.`
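
Because the versions must match exactly, it can help to check the installed packages before training. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and it assumes that DCU builds may carry a local suffix after `+` (as the deepspeed wheel name used later in this README does), which is ignored when comparing.

```python
from importlib import metadata

# Versions listed in this README for the Anaconda setup.
REQUIRED = {"torch": "2.1.0", "xtuner": "0.1.18"}


def public_version(version):
    """Drop a local build suffix such as '+das1.0' before comparing."""
    return version.split("+", 1)[0]


def find_mismatches(required, installed):
    """Map each mismatching package to (wanted, found); found is None if missing."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found is None or public_version(found) != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# An empty dict means every listed package matches.
print(find_mismatches(REQUIRED, installed_versions(REQUIRED)))
```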

Install the remaining non-deep-learning libraries as follows:
```bash
pip install -e .
```

## Dataset
```
├── llama3_pytorch
│   ├── datasets
│       ├── alpaca_data.json
│       └── multi_turn_dataset_2.json
```
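
Before fine-tuning, it may be worth checking that the dataset records have the expected fields. The sketch below assumes that `alpaca_data.json` follows the usual Alpaca layout, a JSON list of objects with `instruction`, `input`, and `output` keys; this README does not state the schema, so treat the key names and the helper as assumptions.

```python
import json

# Assumed Alpaca-style schema; adjust if your dataset uses different keys.
REQUIRED_KEYS = {"instruction", "input", "output"}


def find_incomplete_records(records):
    """Return the indices of records that lack any of the assumed keys."""
    return [i for i, record in enumerate(records) if not REQUIRED_KEYS.issubset(record)]


# Small inline sample instead of the real file, so the sketch runs anywhere.
sample = json.loads(
    '[{"instruction": "Add the numbers.", "input": "1 2", "output": "3"},'
    ' {"instruction": "Say hi."}]'
)
print(find_incomplete_records(sample))  # [1]
```

To check the real file, load it with `json.load` from `datasets/alpaca_data.json` and pass the result to the same function.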

## Training
### Fine-tuning with xtuner
1. Install the training libraries, paying attention to the required versions:
```bash
pip uninstall flash-attn # 2.0.4+82379d7.abi0.dtk2404.torch2.1
# If the docker environment already includes deepspeed, skip this install; just check that the version matches
pip install deepspeed-0.12.3+das1.0+gita724046.abi0.dtk2404.torch2.1.0-cp310-cp310-manylinux2014_x86_64.whl
git clone -b v0.1.18 https://github.com/InternLM/xtuner.git
cd xtuner
pip install -e '.[all]'
pip install mmengine==0.10.3
```
2. Download the pretrained model. Edit `download_models.py` to choose which model to download:
```bash
cd /your_code_path/llama3_pytorch
pip install modelscope
python download_models.py
```
3. In [llama3_8b_instruct_qlora_alpaca_e3_M.py](./llama3_8b_instruct_qlora_alpaca_e3_M.py), set `pretrained_model_name_or_path` and `data_path` to your local model and dataset paths.
4. Adjust `max_length`, `batch_size`, `accumulative_counts`, `max_epochs`, `lr`, `save_steps`, `evaluation_freq`, and the `r` and `lora_alpha` parameters in `model.lora` to fit your hardware and training needs. The default parameters support 4 × 32 GB cards.
5. Set `${DCU_NUM}` to the number of DCU cards to use. For a different dataset, change the `SYSTEM`, `evaluation_inputs`, `dataset_map_fn`, `train_dataloader.sampler`, and `train_cfg` settings in llama3_8b_instruct_qlora_alpaca_e3_M.py; see the code comments for details. The default is the alpaca dataset. **`--work-dir` sets the path where the model is saved.**
6. Run:
```bash
bash finetune.sh
# or
NPROC_PER_NODE=${DCU_NUM} xtuner train ./llama3_8b_instruct_qlora_alpaca_e3_M.py --deepspeed deepspeed_zero2
```
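
When tuning `batch_size`, `accumulative_counts`, and the number of DCU cards, it helps to see how they combine. The sketch below is only arithmetic, under the assumption that `batch_size` is the per-device batch size and `accumulative_counts` is the number of gradient-accumulation steps; the function names and the sample count are hypothetical, and sequence packing (if enabled in the config) would change the number of samples.

```python
import math


def effective_batch_size(batch_size, accumulative_counts, dcu_num):
    """Samples consumed per optimizer step across all cards."""
    return batch_size * accumulative_counts * dcu_num


def optimizer_steps_per_epoch(num_samples, batch_size, accumulative_counts, dcu_num):
    """Optimizer steps needed to see every sample once (last step may be partial)."""
    return math.ceil(num_samples / effective_batch_size(batch_size, accumulative_counts, dcu_num))


# Illustrative values only: per-device batch 1, 16 accumulation steps, 4 cards.
print(effective_batch_size(1, 16, 4))               # 64
print(optimizer_steps_per_epoch(10_000, 1, 16, 4))  # 157
```

If memory is tight, lowering `batch_size` while raising `accumulative_counts` by the same factor keeps the effective batch size unchanged.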

## Inference
To download the pretrained models, see the [Pretrained Weights](#预训练权重) section below. Different models require different model-parallel (MP) values, as shown in the table:

|  Model | MP |
|--------|----|
| 8B     | 1  |
| 70B    | 8  |

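All models support sequence lengths of up to 8,192 tokens, but the cache is pre-allocated according to the `max_seq_len` and `max_batch_size` values, so set them according to your hardware.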
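
To get a feel for how much memory the pre-allocated cache takes, here is a rough estimate. It is a sketch under stated assumptions, not the project's own accounting: the layer count, key/value head count, and head dimension are taken from the published Llama 3 8B configuration (check `params.json` for your model), and 2 bytes per element assumes a 16-bit dtype.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, max_batch_size, max_seq_len, bytes_per_elem=2):
    """Size of the key and value caches summed over all layers."""
    # Factor 2: every layer keeps one tensor for keys and one for values.
    return 2 * n_layers * max_batch_size * max_seq_len * n_kv_heads * head_dim * bytes_per_elem


# Assumed 8B configuration: 32 layers, 8 key/value heads, head dimension 128.
small = kv_cache_bytes(32, 8, 128, max_batch_size=6, max_seq_len=512)
large = kv_cache_bytes(32, 8, 128, max_batch_size=6, max_seq_len=8192)
print(small / 2**20, "MiB")  # 384.0 MiB
print(large / 2**30, "GiB")  # 6.0 GiB
```

The cache grows linearly with both values, so doubling `max_seq_len` or `max_batch_size` doubles the memory reserved before any text is generated.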

**Tips:**
- Set `--nproc_per_node` to the model's MP value (see the table above).
- Set `max_seq_len` and `max_batch_size` as needed.
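
The relationship between the MP table and the launch command can be sketched as a small helper that assembles the `torchrun` invocation. The helper name and the lookup dictionary are hypothetical; the flags are the ones used in the examples in this README.

```python
import shlex

# MP values from the table above.
MP_BY_MODEL_SIZE = {"8B": 1, "70B": 8}


def torchrun_command(model_size, script, ckpt_dir, tokenizer_path, max_seq_len, max_batch_size):
    """Build the launch command, taking --nproc_per_node from the MP table."""
    args = [
        "torchrun", "--nproc_per_node", str(MP_BY_MODEL_SIZE[model_size]), script,
        "--ckpt_dir", ckpt_dir,
        "--tokenizer_path", tokenizer_path,
        "--max_seq_len", str(max_seq_len),
        "--max_batch_size", str(max_batch_size),
    ]
    return shlex.join(args)


print(torchrun_command(
    "70B", "example_text_completion.py",
    "Meta-Llama-3-70B/original/", "Meta-Llama-3-70B/original/tokenizer.model",
    max_seq_len=128, max_batch_size=4,
))
```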

### Pretrained models
These models are not fine-tuned for chat or Q&A. See `example_text_completion.py` for usage examples.

- Meta-Llama-3-8B example. For Meta-Llama-3-70B, just change `--nproc_per_node`, `--ckpt_dir`, and `--tokenizer_path` to the values for that model.
```bash
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir Meta-Llama-3-8B/original/ \
    --tokenizer_path Meta-Llama-3-8B/original/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```

### Instruction-tuned models
The fine-tuned models were trained for dialogue applications. To get the expected features and performance, follow the specific format defined in [`ChatFormat`](llama/tokenizer.py#L202):
- The prompt begins with the special token `<|begin_of_text|>`, followed by one or more messages.
- Each message starts with the tag `<|start_header_id|>`, followed by the role (`system`, `user`, or `assistant`), and then the tag `<|end_header_id|>`.
- The message content follows after a double newline `\n\n`.
- The end of each message is marked by the `<|eot_id|>` token.
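
The format rules above can be sketched at the string level as follows. This is an illustration only: the function names are hypothetical, and the real `ChatFormat` works on token IDs, where the special tokens must be encoded as special tokens and not as plain text. The sketch also assumes that the prompt ends with an open `assistant` header so that the model writes the next reply.

```python
def format_message(role, content):
    """One message: header with the role, a blank line, the content, then <|eot_id|>."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"


def build_prompt(messages):
    """Concatenate all messages and leave an assistant header open for generation."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += format_message(message["role"], message["content"])
    # Assumption: the model continues from an open assistant header.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


dialog = [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "What is GQA?"},
]
print(build_prompt(dialog))
```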

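You can also deploy additional classifiers to filter out inputs and outputs that are deemed unsafe. See the [llama-recipes repo](https://github.com/meta-llama/llama-recipes/blob/main/recipes/inference/local_inference/inference.py) for an example of how to add a safety checker to the inputs and outputs of your inference code.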

- Meta-Llama-3-8B-Instruct example. For Meta-Llama-3-70B-Instruct, just change `--nproc_per_node`, `--ckpt_dir`, and `--tokenizer_path` to the values for that model.
```bash
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/original/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/original/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```

## Multi-turn Chat
1. Make sure the environment is installed and the model is downloaded.
2. In [chat.sh](./chat.sh), set `--ckpt_dir` and `--tokenizer_path` to your local model paths, and adjust `--max_seq_len` as needed. Raising this value lets the model remember more of a multi-turn conversation, but it may also increase compute time and memory use.
3. Run:
```bash
bash chat.sh
```
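
The link between `--max_seq_len` and how much of the conversation the model can remember can be sketched as a history-trimming step. This is one possible strategy, not necessarily what `chat.sh` does: the function name, the `max_gen_len` reserve, and the whitespace-based token counter are all stand-ins chosen for illustration.

```python
def trim_history(messages, max_seq_len, max_gen_len, count_tokens):
    """Keep the most recent messages that fit into the prompt budget."""
    budget = max_seq_len - max_gen_len  # leave room for the generated reply
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


def count_words(text):
    """Stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


history = [
    {"role": "user", "content": "a b c"},
    {"role": "assistant", "content": "d e"},
    {"role": "user", "content": "f g h i"},
]
trimmed = trim_history(history, max_seq_len=10, max_gen_len=4, count_tokens=count_words)
print([m["content"] for m in trimmed])  # ['d e', 'f g h i']
```

With a larger `max_seq_len` the budget grows and older turns survive, which is the memory effect described in step 2.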

## Evaluation
1. Install `llama-recipes` and `lm-eval`:
```bash
# Download llama-recipes
git clone https://github.com/meta-llama/llama-recipes.git
cd ./llama-recipes/recipes/evaluation/
# Edit eval.py line 15: change `from lm_eval.utils import make_table` to
#   from lm_eval.evaluator import make_table
# Edit eval.py line 121: change the default value of the num_fewshot argument to 0
#   default=0
# Edit eval.py line 215: change `use_cache=args.use_cache` to
#   no_cache=args.use_cache

# Return to the home directory
cd ~

# Download lm-eval
git clone http://developer.hpccube.com/codes/chenych/lm-evaluation-harness.git
cd ./lm-evaluation-harness/
pip install -e .
cd ../
```

2. Set the **pretrained** argument to the path of the model under test, for example `/home/llama3/Meta-Llama-3-8B-Instruct`. Note that only the `hellaswag` dataset is currently supported for evaluation. Run the following commands:
```bash
cd /path_of/llama-recipes/recipes/evaluation
python eval.py --model hf --model_args pretrained=/home/llama3/Meta-Llama-3-8B-Instruct,dtype="float" --tasks hellaswag --device cuda --batch_size 8
```
<div align=center>
    <img src="./doc/evaluation.png"/>
</div>

## Results
- Meta-Llama-3-8B-Instruct
<div align=center>
    <img src="./doc/Meta-Llama-3-8B-Instruct.png"/>
</div>

- Meta-Llama-3-8B
<div align=center>
    <img src="./doc/Meta-Llama-3-8B.png"/>
</div>

### Accuracy
Not yet available.

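## Application Scenarios
### Algorithm Category
Conversational question answering

### Key Application Industries
Manufacturing, broadcast media, home furnishing, education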

## <a id="预训练权重"></a>Pretrained Weights
1. Set up the environment:
```bash
pip install -U huggingface_hub hf_transfer
export HF_ENDPOINT=https://hf-mirror.com
```

2. Download the pretrained models. Get the **token** argument from your Hugging Face account.

- Meta-Llama-3-8B model
```bash
mkdir Meta-Llama-3-8B
huggingface-cli download meta-llama/Meta-Llama-3-8B --include "original/*" --local-dir Meta-Llama-3-8B --token hf_*
```

- Meta-Llama-3-8B-Instruct model
```bash
mkdir Meta-Llama-3-8B-Instruct
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir Meta-Llama-3-8B-Instruct --token hf_*
```

- Meta-Llama-3-70B model
```bash
mkdir Meta-Llama-3-70B
huggingface-cli download meta-llama/Meta-Llama-3-70B --include "original/*" --local-dir Meta-Llama-3-70B --token hf_*
```

- Meta-Llama-3-70B-Instruct model
```bash
mkdir Meta-Llama-3-70B-Instruct
huggingface-cli download meta-llama/Meta-Llama-3-70B-Instruct --include "original/*" --local-dir Meta-Llama-3-70B-Instruct --token hf_*
```

The model directory structure is as follows:
```bash
├── model_save_path
│   ├── Meta-Llama-3-8B
│       ├── original
│           ├── consolidated.00.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── configuration.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── model-00001-of-00004.safetensors
│       ├── model-00002-of-00004.safetensors
│       ├── model-00003-of-00004.safetensors
│       ├── model-00004-of-00004.safetensors
│       ├── model.safetensors.index.json
│       ├── README.md
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       └── USE_POLICY.md
│   ├── Meta-Llama-3-8B-Instruct
│       ├── original
│           ├── consolidated.00.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── configuration.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── model-00001-of-00004.safetensors
│       ├── model-00002-of-00004.safetensors
│       ├── model-00003-of-00004.safetensors
│       ├── model-00004-of-00004.safetensors
│       ├── model.safetensors.index.json
│       ├── README.md
│       ├── special_tokens_map.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       └── USE_POLICY.md
│   ├── Meta-Llama-3-70B
│       ├── original
│           ├── consolidated.00.pth
│           ...
│           ├── consolidated.07.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── README.md
│       ├── model-00001-of-00030.safetensors
│       ├── model-00002-of-00030.safetensors
│       ...
│       ├── model-00029-of-00030.safetensors
│       ├── model-00030-of-00030.safetensors
│       ├── model.safetensors.index.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       ├── special_tokens_map.json
│       └── USE_POLICY.md
│   └── Meta-Llama-3-70B-Instruct
│       ├── original
│           ├── consolidated.00.pth
│           ...
│           ├── consolidated.07.pth
│           ├── params.json
│           └── tokenizer.model
│       ├── config.json
│       ├── generation_config.json
│       ├── LICENSE
│       ├── README.md
│       ├── model-00001-of-00030.safetensors
│       ├── model-00002-of-00030.safetensors
│       ...
│       ├── model-00029-of-00030.safetensors
│       ├── model-00030-of-00030.safetensors
│       ├── model.safetensors.index.json
│       ├── tokenizer_config.json
│       ├── tokenizer.json
│       ├── special_tokens_map.json
│       └── USE_POLICY.md
```
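
A quick way to confirm that a download is complete is to check the `original` directory against this layout. The sketch below is a hypothetical helper, not part of the project. It assumes that the number of `consolidated.*.pth` shards should equal the model's MP value (1 for 8B, 8 for 70B), which matches the tree above, and it uses a temporary directory with empty files so that it runs without real weights.

```python
import tempfile
from pathlib import Path


def check_original_dir(original_dir, expected_shards):
    """Return a list of problems found in an `original` checkpoint directory."""
    original_dir = Path(original_dir)
    problems = []
    for name in ("params.json", "tokenizer.model"):
        if not (original_dir / name).is_file():
            problems.append(f"missing {name}")
    shards = sorted(original_dir.glob("consolidated.*.pth"))
    if len(shards) != expected_shards:
        problems.append(f"expected {expected_shards} shard(s), found {len(shards)}")
    return problems


# Demo on a fake 8B layout built from empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "Meta-Llama-3-8B" / "original"
    original.mkdir(parents=True)
    for name in ("consolidated.00.pth", "params.json", "tokenizer.model"):
        (original / name).touch()
    print(check_original_dir(original, expected_shards=1))  # []
    print(check_original_dir(original, expected_shards=8))  # ['expected 8 shard(s), found 1']
```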

## Source Repository and Issue Feedback
- https://developer.hpccube.com/codes/modelzoo/llama3_pytorch

## References
- https://github.com/meta-llama/llama3
- https://github.com/InternLM/xtuner
- https://github.com/meta-llama/llama-recipes