# LLama_FT

## Paper
- [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/pdf/2302.13971.pdf)

## Model Architecture
The LLaMA network is based on the Transformer architecture and incorporates several improvements that were proposed later and used in models such as PaLM. The main differences from the original architecture are:

- Pre-normalization: to improve training stability, the input of each Transformer sub-layer is normalized instead of the output, using the RMSNorm normalization function.
- SwiGLU activation [PaLM]: the ReLU non-linearity is replaced with the SwiGLU activation function to improve performance, with a hidden dimension of 2/3 · 4d rather than the 4d used in PaLM. For d = 4096 (the 7B model), 2/3 · 4 · 4096 ≈ 10923, rounded up to 11008 (the next multiple of 256), which matches the `inter_size` values used below.
- Rotary embeddings: absolute positional embeddings are removed, and rotary position embeddings (RoPE) are added at every layer of the network.
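
For reference, RMSNorm normalizes by the root mean square alone, dropping LayerNorm's mean-centering and bias (standard definition, with a learned gain g and a small ε for numerical stability):

```latex
\mathrm{RMSNorm}(x)_i = g_i \cdot \frac{x_i}{\sqrt{\tfrac{1}{d}\sum_{j=1}^{d} x_j^{2} + \epsilon}}
```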

## Model Overview
LLaMA is a collection of foundation language models ranging from 7B to 65B parameters, trained on trillions of tokens. It demonstrates that state-of-the-art models can be trained exclusively on publicly available datasets, without relying on proprietary or inaccessible data.

## Environment Setup

Pull the inference Docker image from [光源 (SourceFind)](https://www.sourcefind.cn/#/service-details):

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:fastertransformer-dtk23.04-latest
docker run -it --name llama --shm-size=32G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video image.sourcefind.cn:5000/dcu/admin/base/custom:fastertransformer-dtk23.04-latest /bin/bash
```

Image version dependencies:
* DTK driver: dtk23.04
* PyTorch: 1.10
* Python: 3.8

Activate the environment inside the image:
`source /opt/dtk-23.04/env.sh`

## Datasets
Training data: CCNet [67%], C4 [15%], GitHub [4.5%], Wikipedia [4.5%], Books [4.5%], ArXiv [2.5%], Stack Exchange [2%]. The Wikipedia and Books subsets cover the following languages: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk. Evaluation datasets: BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC, OpenBookQA, NaturalQuestions, TriviaQA, RACE, MMLU, BIG-bench hard, GSM8k, RealToxicityPrompts, WinoGender, CrowS-Pairs.

## Inference

### Compile

```
mkdir build
cd build
cmake -DSM=70 -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON -DCMAKE_CXX_COMPILER=nvcc ..
make -j12
```

### Model Download

[llama 7B](https://huggingface.co/decapoda-research/llama-7b-hf)

[llama 13B](https://huggingface.co/decapoda-research/llama-13b-hf)

[llama 30B](https://huggingface.co/decapoda-research/llama-30b-hf)

[llama 65B](https://huggingface.co/decapoda-research/llama-65b-hf)


### Model Conversion

```bash
python ../examples/cpp/llama/huggingface_llama_convert.py \
-saved_dir=/data/models/llama-7b-infer/ \
-in_file=/data/models/llama-7b-hf/ \
-infer_gpu_num=1 -weight_data_type=fp16 -model_name=llama_7b
```

The command above shows the llama-7b conversion: `-in_file` is the path to the input (Hugging Face) model, `-saved_dir` the output path, `-infer_gpu_num` the tensor-parallel (TP) size for inference, `-weight_data_type` the inference weight type, and `-model_name` the model name. For other models, change the paths and `-model_name` accordingly, as in the sketch below.
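
By analogy, converting the 13B checkpoint for single-card fp16 inference would look like this (paths and `-model_name` are illustrative, following the 7B example):

```bash
# Illustrative 13B conversion; adjust paths to your environment.
python ../examples/cpp/llama/huggingface_llama_convert.py \
    -saved_dir=/data/models/llama-13b-infer/ \
    -in_file=/data/models/llama-13b-hf/ \
    -infer_gpu_num=1 -weight_data_type=fp16 -model_name=llama_13b
```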

### Running LLama-7b

1. Generate the `gemm_config.in` file

`data_type` = 0 (FP32) or 1 (FP16)

```bash
./bin/gpt_gemm 1 1 20 32 128 11008 32000 1 1
```

The parameters above correspond to:

```bash 
./bin/gpt_gemm <batch_size> <beam_width> <max_input_len> <head_number> <size_per_head> <inter_size> <vocab_size> <data_type> <tensor_para_size> 
```
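
For the 7B command in step 1, the values map as follows (head count and intermediate size taken from the standard 7B config.json; treat this as a sanity check rather than gospel):

```bash
# batch_size=1  beam_width=1  max_input_len=20
# head_number=32        <- num_attention_heads
# size_per_head=128     <- hidden_size 4096 / 32 heads
# inter_size=11008      <- intermediate_size
# vocab_size=32000, data_type=1 (fp16), tensor_para_size=1
./bin/gpt_gemm 1 1 20 32 128 11008 32000 1 1
```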

2. Configure `../examples/cpp/llama/llama_config.ini`

When `data_type` was 1 in `gpt_gemm`, set `data_type = fp16`; when it was 0, set `data_type = fp32`. `tensor_para_size` must match the TP size used during model conversion. Set `model_name = llama_7B` and point `model_dir` at the converted weights. `request_batch_size` is the inference batch size, and `request_output_len` the number of tokens to generate. The input start ids can be changed in `../examples/cpp/llama/start_ids.csv`. An illustrative configuration follows.
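
A minimal sketch of the relevant entries for the 7B setup, assuming the key names match the description above (the converter typically writes weights to a `<saved_dir>/<infer_gpu_num>-gpu/` subdirectory under the usual FasterTransformer layout, so the path here is illustrative):

```
data_type = fp16
tensor_para_size = 1
model_name = llama_7B
model_dir = /data/models/llama-7b-infer/1-gpu/
request_batch_size = 1
request_output_len = 256
```

Keep the section layout of the shipped `llama_config.ini`; only change the values.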

3. Run

```bash
./bin/llama_example
```
The program reads the ids in `../examples/cpp/llama/start_ids.csv` as the input tokens; the generated results are saved to the `out` file.
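
`start_ids.csv` holds one input sequence of comma-separated token ids per row (the usual FasterTransformer format; verify against the shipped file). For example, to use the prompt from the Accuracy section below as the single input:

```bash
# Token ids for "I believe the meaning of life is"
echo "306, 4658, 278, 6593, 310, 2834, 338" > ../examples/cpp/llama/start_ids.csv
```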


### Running LLama-13b

As with the 7B model, first convert the 13B checkpoint (see the conversion sketch above) and update `llama_config.ini` accordingly, then:

```bash
./bin/gpt_gemm 1 1 20 40 128 13824 32000 1 1
./bin/llama_example
```

### Running LLama-33b

```bash
./bin/gpt_gemm 1 1 20 52 128 17920 32000 1 2
mpirun --allow-run-as-root -np 2 ./bin/llama_example
```
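
`-np 2` must match both `tensor_para_size` in `llama_config.ini` and the `-infer_gpu_num` used when converting the weights. A matching 33B conversion might look like this (paths and `-model_name` are illustrative, following the 7B example):

```bash
python ../examples/cpp/llama/huggingface_llama_convert.py \
    -saved_dir=/data/models/llama-30b-infer/ \
    -in_file=/data/models/llama-30b-hf/ \
    -infer_gpu_num=2 -weight_data_type=fp16 -model_name=llama_30b
```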

### Running LLama-65b

```bash
./bin/gpt_gemm 1 1 20 64 128 22016 32000 1 8
mpirun --allow-run-as-root -np 8 ./bin/llama_example 
```

### Parameter Configuration Notes

The llama-33b model requires 2 cards (32 GB each) for fp16 inference, and the llama-65b model requires 8 cards (32 GB each).
After downloading a llama model from Hugging Face, check its config.json: in the mapping below, the left side is the FasterTransformer parameter and the right side is the corresponding config.json value.

```bash
head_num=num_attention_heads
size_per_head=hidden_size / num_attention_heads
inter_size=intermediate_size
num_layer=num_hidden_layers
rotary_embedding=size_per_head
layernorm_eps=rms_norm_eps
vocab_size=vocab_size
```
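
As a concrete example, plugging in the 7B checkpoint's config.json values (as published in the decapoda-research release; verify against your download) gives:

```bash
head_num=32            # num_attention_heads
size_per_head=128      # hidden_size 4096 / 32
inter_size=11008       # intermediate_size
num_layer=32           # num_hidden_layers
rotary_embedding=128   # = size_per_head
layernorm_eps=1e-6     # rms_norm_eps
vocab_size=32000
```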

## Result

After a run, the generated ids are written to the `out` file in the build directory:
```
build/
    out
```

## Accuracy
Test prompt: "I believe the meaning of life is" (token ids: 306, 4658, 278, 6593, 310, 2834, 338). Accelerator: 1× DCU-Z100L-32G.
Test configuration:
| Data type | batch size | temperature | input len | output len |
| :------: | :------: | :------: | :------: | :------: |
| fp16 | 1 | 0 | 7 | 256 |

Output:
```
I believe the meaning of life is to live it to the fullest. I believe that we are all here for a reason and that we are all here to help each other. I believe that we are all here to learn and grow and that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here to help each other learn and grow. I believe that we are all here
```

## Application Scenarios

### Algorithm Category
NLP

### Key Application Industries
Finance, scientific research, education

## Source Repository and Issue Feedback
* https://developer.hpccube.com/codes/modelzoo/llama_ft

## References
* [https://github.com/NVIDIA/FasterTransformer](https://github.com/NVIDIA/FasterTransformer)