<!--
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2023-09-06 18:04:07
 * @LastEditTime: 2023-09-08 09:00:01
-->
# LLama

## Paper
- [https://arxiv.org/pdf/2302.13971.pdf](https://arxiv.org/pdf/2302.13971.pdf)

## Model Architecture
The LLaMA network is based on the Transformer architecture and incorporates improvements proposed in various models such as PaLM. The main differences from the original architecture are:
- **Pre-normalization.** To improve training stability, the input of each Transformer sub-layer is normalized instead of the output, using the RMSNorm normalization function.
- **SwiGLU activation [PaLM].** The ReLU non-linearity is replaced with the SwiGLU activation function to improve performance, using a hidden dimension of 2/3 · 4d instead of the 4d used in PaLM.
- **Rotary embeddings.** Absolute positional embeddings are removed; rotary position embeddings (RoPE) are added at every layer of the network.
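The pre-normalization and activation changes above can be sketched as follows (a minimal numpy illustration, not the actual FasterTransformer implementation; toy tensor sizes and weight names are assumptions):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale by the root-mean-square of the features; no mean subtraction
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: SiLU(x @ W_gate) * (x @ W_up), projected back down.
    # The hidden dimension is 2/3 * 4d rather than 4d (11008 for d=4096 in LLama-7B).
    silu = lambda t: t / (1.0 + np.exp(-t))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

d, hidden = 8, 16  # toy sizes; LLama-7B uses d=4096, hidden=11008
x = np.random.randn(2, d)
y = rms_norm(x, np.ones(d))
out = swiglu_ffn(y, np.random.randn(d, hidden),
                 np.random.randn(d, hidden), np.random.randn(hidden, d))
print(out.shape)  # → (2, 8)
```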

![img](./docs/images/llama.png)

## Algorithm Overview
LLaMA is a collection of foundation language models ranging from 7B to 65B parameters, trained on trillions of tokens. It demonstrates that state-of-the-art models can be trained exclusively on publicly available datasets, without relying on proprietary or inaccessible data.

![img](./docs/images/llama_1.png)

## Environment Setup
Pull the inference Docker image from [SourceFind](https://www.sourcefind.cn/#/service-details):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:fastertransformer-dtk23.04-latest

# <Image ID>: replace with the ID of the image pulled above
# <Host Path>: path on the host
# <Container Path>: mount path inside the container
docker run -it --name llama --shm-size=32G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

Image version dependencies:
* DTK driver: dtk23.04
* PyTorch: 1.10
* Python: 3.8

Activate the environment inside the image:
`source /opt/dtk-23.04/env.sh`

## Dataset


## Inference
### Build

```
mkdir build
cd build
cmake -DSM=70 -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON -DCMAKE_CXX_COMPILER=nvcc ..
make -j12
```

### Model Download

Pretrained model weights can be downloaded from the quick download center [AIModels](http://113.200.138.88:18080/aimodels):

[LLama-7B](http://113.200.138.88:18080/aimodels/llama-7b-hf)

[LLama-13B](http://113.200.138.88:18080/aimodels/llama-13b-hf)

[LLama-33B](http://113.200.138.88:18080/aimodels/llama-30b-hf)

[LLama-65B](http://113.200.138.88:18080/aimodels/llama-65b-hf)

[LLama2-7B](http://113.200.138.88:18080/aimodels/Llama-2-7b-hf)

[LLama2-13B](http://113.200.138.88:18080/aimodels/Llama-2-13b-hf)

Supported models: LLama-7B, LLama-13B, LLama-30B, LLama-65B, LLama2-7B, LLama2-13B.

### Model Conversion

```bash
python ../examples/cpp/llama/huggingface_llama_convert.py \
-saved_dir=/data/models/llama-7b-infer/ \
-in_file=/data/models/llama-7b-hf/ \
-infer_gpu_num=1 -weight_data_type=fp16 -model_name=llama_7b
```

For example, to convert llama-7b: `-in_file` is the input model path, `-saved_dir` is the output path, `-infer_gpu_num` is the tensor-parallel (TP) size used for inference, `-weight_data_type` is the inference data type, and `-model_name` is the model name. For other models, adjust the paths and `-model_name` accordingly.

### Run LLama-7B

1. Generate the `gemm_config.in` file

data_type = 0 (FP32) or 1 (FP16)

```bash
./bin/gpt_gemm 1 1 7 32 128 11008 32000 1 1
```

The arguments correspond to:

```bash 
./bin/gpt_gemm <batch_size> <beam_width> <max_input_len> <head_number> <size_per_head> <inter_size> <vocab_size> <data_type> <tensor_para_size> 
```
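The mapping from a model's hyperparameters to the `gpt_gemm` arguments can be checked with a small script (a sketch; the values below are the standard LLama-7B hyperparameters):

```python
# Standard LLama-7B hyperparameters (from its config.json)
config = {
    "num_attention_heads": 32,
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "vocab_size": 32000,
}

def gpt_gemm_args(config, batch_size=1, beam_width=1, max_input_len=7,
                  data_type=1, tensor_para_size=1):
    # Assemble the positional argument list expected by ./bin/gpt_gemm
    head_number = config["num_attention_heads"]
    size_per_head = config["hidden_size"] // head_number
    return [batch_size, beam_width, max_input_len, head_number, size_per_head,
            config["intermediate_size"], config["vocab_size"],
            data_type, tensor_para_size]

print("./bin/gpt_gemm " + " ".join(map(str, gpt_gemm_args(config))))
# → ./bin/gpt_gemm 1 1 7 32 128 11008 32000 1 1
```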

2. Configure `../examples/cpp/llama/llama_config.ini`

When data_type = 1, inference runs in fp16; when data_type = 0, fp32. tensor_para_size must match the TP size set during model conversion. Set model_name=llama_7B and model_dir to the path of the converted weights. request_batch_size is the inference batch size and request_output_len is the output length. The starting input IDs can be modified in `../examples/cpp/llama/start_ids.csv`.
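A hedged sketch of the relevant entries (only the keys mentioned above; the section headers and other fields of the shipped `llama_config.ini` are omitted, and the `1-gpu` suffix on `model_dir` is an assumption based on a TP size of 1):

```ini
data_type=fp16
tensor_para_size=1
model_name=llama_7B
model_dir=/data/models/llama-7b-infer/1-gpu/
request_batch_size=1
request_output_len=256
```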

3. Run

```bash
./bin/llama_example
```
The program reads the IDs in `../examples/cpp/llama/start_ids.csv` as input tokens; the generated result is saved to `.out`.
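The input file appears to hold one comma-separated list of token IDs per line; this sketch writes the documented test prompt IDs (from the result section below) under that assumed format:

```python
import csv

# Token IDs for "I believe the meaning of life is" (as listed in the result section)
prompt_ids = [306, 4658, 278, 6593, 310, 2834, 338]

with open("start_ids.csv", "w", newline="") as f:
    csv.writer(f).writerow(prompt_ids)

with open("start_ids.csv") as f:
    print(f.read().strip())  # → 306,4658,278,6593,310,2834,338
```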


### Run LLama-13B

```bash
./bin/gpt_gemm 1 1 7 40 128 13824 32000 1 1
./bin/llama_example
```

### Run LLama-33B

```bash
./bin/gpt_gemm 1 1 7 52 128 17920 32000 1 2
mpirun --allow-run-as-root -np 2 ./bin/llama_example
```

### Run LLama-65B

```bash
./bin/gpt_gemm 1 1 7 64 128 22016 32000 1 8
mpirun --allow-run-as-root -np 8 ./bin/llama_example
```


### Parameter Configuration Notes

Note: LLama2-7B and LLama2-13B are run the same way as LLama-7B and LLama-13B.
The LLama-33B model requires 2 cards (32G each) for fp16 inference; the LLama-65B model requires 8 cards (32G each).
For a LLaMA model downloaded from Hugging Face, the parameters can be read from its config.json file. Below, the left side is the FasterTransformer parameter and the right side is the corresponding value in config.json:

```bash
head_num=num_attention_heads
size_per_head=hidden_size / num_attention_heads
inter_size=intermediate_size
num_layer=num_hidden_layers
rotary_embedding=size_per_head
layernorm_eps=rms_norm_eps
vocab_size=vocab_size
```
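As a check, the mapping above can be applied to a config.json dictionary (a sketch; the values below are the standard LLama-7B ones):

```python
# Standard LLama-7B config.json values
hf_config = {
    "num_attention_heads": 32,
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_hidden_layers": 32,
    "rms_norm_eps": 1e-6,
    "vocab_size": 32000,
}

def ft_params(cfg):
    # Derive the FasterTransformer parameters from a Hugging Face config.json
    size_per_head = cfg["hidden_size"] // cfg["num_attention_heads"]
    return {
        "head_num": cfg["num_attention_heads"],
        "size_per_head": size_per_head,
        "inter_size": cfg["intermediate_size"],
        "num_layer": cfg["num_hidden_layers"],
        "rotary_embedding": size_per_head,
        "layernorm_eps": cfg["rms_norm_eps"],
        "vocab_size": cfg["vocab_size"],
    }

print(ft_params(hf_config))
```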

## Result
```
build/
    out
```
Run the following commands to decode the `out` results:
```bash
pip install sentencepiece
python ../examples/cpp/llama/llama_tokenizer.py
```
The `tokenizer` argument is the path to the original model.
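To inspect the raw IDs without the tokenizer script, the `out` file can also be parsed directly (a sketch, assuming it holds whitespace-separated token IDs as shown in the result below):

```python
def parse_out(text):
    # Parse whitespace-separated token IDs, one generated sequence per line
    return [[int(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]

sample = "306 4658 278 6593 310 2834 338"
print(parse_out(sample))  # → [[306, 4658, 278, 6593, 310, 2834, 338]]
```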

Test input: "I believe the meaning of life is" (token IDs: 306, 4658, 278, 6593, 310, 2834, 338), run on one DCU-Z100L-32G accelerator.
| data type | batch size | temperature | input len | output len |
| :------: | :------: | :------: | :------: |:------: |
| fp16 | 1 | 0 | 7 | 256 |

The resulting token IDs:
```
306 4658 278 6593 310 2834 338 304 5735 372 304 278 2989 342 29889 306 4658 393 591 526 599 1244 363 263 2769 322 393 591 526 599 1244 304 1371 1269 916 29889 306 4658 393 591 526 599 1244 304 5110 322 6548 322 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244 304 1371 1269 916 5110 322 6548 29889 306 4658 393 591 526 599 1244
```

The decoded output:
```
I believe the meaning of life is to live it to the fullest. I believe that we are all here for a reason and that we are all here to help each other. I believe that we are all here to learn and grow and that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here
```


## Accuracy



## Application Scenarios

### Algorithm Category
Dialogue and question answering

### Key Application Industries
Finance, scientific research, education


## Pretrained Weights

Quick download center for pretrained weights: [SCNet AIModels](http://113.200.138.88:18080/aimodels)

The pretrained weights used in this project can be downloaded via the quick download channel:

[llama-7b-hf](http://113.200.138.88:18080/aimodels/llama-7b-hf)

[llama-13b-hf](http://113.200.138.88:18080/aimodels/llama-13b-hf)

[LLama-33B](http://113.200.138.88:18080/aimodels/llama-30b-hf)

[LLama-65B](http://113.200.138.88:18080/aimodels/llama-65b-hf)

[LLama2-7B](http://113.200.138.88:18080/aimodels/Llama-2-7b-hf)

[LLama2-13B](http://113.200.138.88:18080/aimodels/Llama-2-13b-hf)

## Source Repository and Issue Feedback
* https://developer.hpccube.com/codes/modelzoo/llama_fastertransformer

## References
* [https://github.com/NVIDIA/FasterTransformer](https://github.com/NVIDIA/FasterTransformer)