<!--
 * @Author: zhuww
 * @email: zhuww@sugon.com
 * @Date: 2023-09-08 11:08:07
 * @LastEditTime: 2023-11-08 17:47:01
-->
# BLOOM

## Paper
`BLOOM: A 176B-Parameter Open-Access Multilingual Language Model`

- [https://arxiv.org/abs/2211.05100](https://arxiv.org/abs/2211.05100)

## Model Architecture
BLOOM is an open-source large language model with 176B parameters that supports up to 59 languages. It was trained from a modified version of Megatron-LM GPT2 and mainly uses a decoder-only architecture, layer normalization applied to the word-embedding layer, the GeLU activation function, and ALiBi (Attention with Linear Biases) positional encoding. Its training corpus covers 45 natural languages and 12 programming languages; 1.5 TB of preprocessed text was converted into 350B unique tokens. The BLOOM models released by BigScience on Hugging Face come in multiple parameter sizes and versions.

![img](./docs/bloom.png)
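The ALiBi positional encoding mentioned above adds a per-head linear bias to the attention logits instead of using learned position embeddings. A minimal sketch of the bias computation, assuming a power-of-two head count (illustrative only, not the FasterTransformer implementation):

```python
def alibi_slopes(n_heads):
    """Geometric head slopes 2^(-8(i+1)/n) from the ALiBi paper (power-of-two head counts)."""
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(n_heads, seq_len):
    """Per-head bias added to attention logits: slope * (j - i) for past positions j,
    zero on the diagonal (future positions are handled by the causal mask)."""
    return [[[m * min(j - i, 0) for j in range(seq_len)]
             for i in range(seq_len)]
            for m in alibi_slopes(n_heads)]
```

Because the bias depends only on relative distance, ALiBi extrapolates to sequence lengths longer than those seen in training.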

## Algorithm
BLOOM is an autoregressive large language model (LLM) trained with industrial-scale compute to continue text from a prompt over vast amounts of text data. As a result, it can output coherent text in 46 languages and 13 programming languages that is hard to distinguish from text written by humans. BLOOM can also be instructed to perform text tasks it was not explicitly trained for by casting them as text-generation tasks.

![img](./docs/bloom_1.png)

## Environment Setup

Pull the Docker image for inference from [光源](https://www.sourcefind.cn/#/image/dcu/custom):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:fastertransformer-dtk23.10

# <Image ID>: replace with the ID of the image pulled above
# <Host Path>: path on the host
# <Container Path>: mapped path inside the container
docker run -it --name BLOOM_fastertransformer --privileged --shm-size=32G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> <Image ID> /bin/bash
```

Image dependencies:

* DTK driver: dtk23.10
* PyTorch: 1.13
* Python: 3.8

## Dataset

## Inference

### Build
```bash
# Enter the project directory
cd bloom_fastertransformer

mkdir -p fastertransformer/build
cd fastertransformer/build
git submodule init && git submodule update
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON -DCMAKE_CXX_COMPILER=nvcc ..
export C_INCLUDE_PATH=$PWD/_deps/googletest-src/googletest/include${C_INCLUDE_PATH:+:${C_INCLUDE_PATH}}
export CPLUS_INCLUDE_PATH=$PWD/_deps/googletest-src/googletest/include${CPLUS_INCLUDE_PATH:+:${CPLUS_INCLUDE_PATH}}
make -j12

# Before running:
export LD_LIBRARY_PATH=$PWD/src/fastertransformer/utils/gemm_test/CMakeFiles/gpt_gemm_func.dir:$LD_LIBRARY_PATH
```

### Model Download

[bloom 7B](https://huggingface.co/bigscience/bloomz-7b1-mt)


Model conversion:

```bash
python ../examples/pytorch/gpt/utils/huggingface_bloom_convert.py \
--input-dir=/data/models/bloom-7b-hf/ \
--output-dir=/data/models/bloom-7b-infer/ \
-tp 1 --data-type fp16 -p 8 -v
```

Here, `--input-dir` is the input model path, `--output-dir` is the output model path, `-tp` is the tensor-parallel size for inference, `--data-type` is the inference data type, `-p` is the number of parallel threads used for conversion, and `-v` enables verbose logging.

### Run BLOOM-7b

1. Generate the `gemm_config.in` file

data_type = 0 (FP32) or 1 (FP16)

```bash
./bin/gpt_gemm 1 1 20 32 128 16384 250880 1 1
```

The arguments above correspond to:

```bash 
./bin/gpt_gemm <batch_size> <beam_width> <max_input_len> <head_number> <size_per_head> <inter_size> <vocab_size> <data_type> <tensor_para_size> 
```

2. Configure `../examples/cpp/multi_gpu_gpt/gpt_config.ini`

Here: when the gemm `data_type` is 1, set `data_type=fp16`; `tensor_para_size` must match the tp value used for model conversion; `model_name=bloom_7b`; `model_dir` is the path to the converted model weights; `request_batch_size` is the inference batch size; `max_seq_len=2048`; `request_output_len` is the output length. The starting input ids can be modified in `../examples/cpp/multi_gpu_gpt/start_ids.csv`.

3. Run

```bash
./bin/multi_gpu_gpt_example
```
The program reads the ids in `../examples/cpp/multi_gpu_gpt/start_ids.csv` as input tokens; the generated results are saved to `.out`.
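The `start_ids.csv` format is one comma-separated row of input token ids per request. A small sketch of producing such a file (`write_start_ids` is a hypothetical helper, not part of the repository), using the example prompt's token ids from the results section:

```python
def write_start_ids(path, requests):
    """Write one comma-separated row of input token ids per request."""
    with open(path, "w") as f:
        for ids in requests:
            f.write(", ".join(str(t) for t in ids) + "\n")

# Token ids for "Translate to English: Je t’aime."
write_start_ids("start_ids.csv", [[153772, 427, 9522, 6395, 76721, 68258, 17]])
```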

### Parameter Mapping

Download the BLOOM model from Hugging Face and inspect its `config.json`. Below, the left-hand side is the FasterTransformer parameter and the right-hand side is the corresponding value from `config.json`.

```bash
head_num=num_attention_heads
size_per_head=n_embed // num_attention_heads
vocab_size=vocab_size
decoder_layers=n_layer
inter_size=4*n_embed
```
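As a sketch, the mapping can be computed directly from the config values (`ft_params` is a hypothetical helper; key names follow the mapping above, though some BLOOM configs name them `hidden_size`/`n_head` instead):

```python
def ft_params(cfg):
    """Map Hugging Face BLOOM config values to FasterTransformer GPT parameters."""
    return {
        "head_num": cfg["num_attention_heads"],
        "size_per_head": cfg["n_embed"] // cfg["num_attention_heads"],
        "vocab_size": cfg["vocab_size"],
        "decoder_layers": cfg["n_layer"],
        "inter_size": 4 * cfg["n_embed"],
    }

# BLOOM-7b-like values:
params = ft_params({"num_attention_heads": 32, "n_embed": 4096,
                    "vocab_size": 250880, "n_layer": 30})
```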

## Result
```
build/
    out
```
Run the following command to parse the `out` result:
```bash
python ../examples/cpp/multi_gpu_gpt/bloom_tokenizer.py
```

Here, `tokenizer` is the path to the original model.
Test data: "Translate to English: Je t’aime." (token ids: 153772, 427, 9522, 6395, 76721, 68258, 17). Accelerator used: 1× DCU-Z100L-32G.
Accuracy data:
| Data type | Batch size | Temperature | Input len | Output len |
| :------: | :------: | :------: | :------: |:------: |
| fp16 | 1 | 0 | 7 | 128 |

The resulting token ids are:
```
153772 427 9522 6395 76721 68258 17 473 19134 1152 17 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
```

The decoded output is:
```
Translate to English: Je t’aime. I love you.</s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s></s>
```
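The trailing `</s>` tokens are end-of-sequence padding up to the requested output length. A small post-processing helper to strip them (hypothetical, not part of the repository):

```python
def strip_eos(text, eos="</s>"):
    """Remove trailing end-of-sequence padding from decoded output."""
    while text.endswith(eos):
        text = text[: -len(eos)]
    return text.rstrip()
```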


## Accuracy


## Application Scenarios

### Algorithm Category
Dialogue and question answering

### Key Application Industries
Finance, scientific research, education

## Source Repository and Issue Feedback
* [https://developer.sourcefind.cn/codes/modelzoo/bloom_fastertransformer](https://developer.sourcefind.cn/codes/modelzoo/bloom_fastertransformer)

## References
* [https://github.com/NVIDIA/FasterTransformer](https://github.com/NVIDIA/FasterTransformer)