# Baichuan2

## Paper
- https://arxiv.org/abs/2309.10305

## Model Architecture
Model parameters:

| Model | Hidden Size | Layers | Heads | Vocab Size | Positional Encoding | Max Length |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Baichuan2-7B | 4,096 | 32 | 32 | 125,696 | RoPE | 4,096 |
| Baichuan2-13B | 5,120 | 40 | 40 | 125,696 | ALiBi | 4,096 |

<div align="center">
<img src="./docs/transformer.jpg" width="400" height="300">
</div>

## Algorithm
The overall Baichuan model is based on the standard Transformer architecture and follows the same model design as LLaMA. Baichuan-7B uses the Rotary Embedding (RoPE) positional-encoding scheme, the SwiGLU activation function, and RMSNorm-based pre-normalization. Baichuan-13B uses ALiBi linear biases instead, which require less computation than Rotary Embedding and give a noticeable boost to inference performance.
<div align="center">
<img src="./docs/transformer.png" width="450" height="300">
</div>
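As a rough sketch of the difference (notation is ours, not taken from the Baichuan code): with ALiBi, the query/key vectors are left untouched and a fixed, head-specific linear penalty is simply added to the attention logits, so no per-position rotation has to be computed at inference time:

```math
\mathrm{score}(i, j) = \frac{q_i k_j^{\top}}{\sqrt{d}} - m_h\,(i - j), \qquad j \le i
```

where `m_h` is a constant slope assigned to attention head `h` and `d` is the head dimension.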

## Environment Setup

### Docker (Option 1)

```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.10

docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```
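In the `docker run` command above, `imageID` is the ID of the pulled image (and `docker_name` is a container name of your choice). The image ID can be looked up locally, for example:

```bash
docker images | grep pytorch   # copy the value from the IMAGE ID column
```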
1. Install Rust

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
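Once the installer finishes, the toolchain can be activated in the current shell and checked (an optional sanity step using standard rustup/cargo commands):

```shell
source $HOME/.cargo/env   # make cargo/rustc available in this shell
cargo --version           # confirm the toolchain is installed
```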

2. Install Protoc

```shell
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```
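Optionally confirm that `protoc` is on the PATH:

```shell
protoc --version   # should report the 21.12 release
```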
3. Install the TGI service

```bash
cd baichuan2_tgi
cd text-generation-inference
# Install exllama
cd server
make install-exllama     # install the exllama kernels
make install-exllamav2   # install the exllamav2 kernels
cd ..                    # back to the project root
source $HOME/.cargo/env
BUILD_EXTENSIONS=True make install   # install the text-generation service
```
4. Install the benchmark tool

```bash
cd text-generation-inference
make install-benchmark
```
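A quick way to exercise the benchmark tool is sketched below. This assumes a TGI server for the model is already running (see the deployment section) and reuses the model path that appears later in this README; flag names follow upstream text-generation-inference and may differ slightly between versions:

```bash
text-generation-benchmark --tokenizer-name /models/baichuan2/Baichuan2-7B-Chat
```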

Note: if installation is slow, you can switch the default pip index to a faster mirror with the following command.

```bash
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```

Likewise, if `cargo install` is slow, it can be sped up by adding a registry mirror in `~/.cargo/config`, for example as sketched below.
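One possible `~/.cargo/config` entry (the TUNA mirror is used here only as an example; any crates.io mirror works):

```bash
cat >> ~/.cargo/config <<'EOF'
[source.crates-io]
replace-with = 'tuna'

[source.tuna]
registry = "https://mirrors.tuna.tsinghua.edu.cn/git/crates.io-index.git"
EOF
```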

5. Check the installed version

```bash
text-generation-launcher -V  # the version number tracks the upstream release
```


## Dataset


## Inference

### Model Download

[Baichuan2-7B-Base](http://113.200.138.88:18080/aimodels/Baichuan2-7B-Base)

[Baichuan2-7B-Chat](http://113.200.138.88:18080/aimodels/Baichuan2-7B-Chat)


### Deploy TGI

#### Before Use

Disable PyTorch TunableOp before starting the service:

```bash
export PYTORCH_TUNABLEOP_ENABLED=0
```

#### 1. Start the TGI Service
```
HIP_VISIBLE_DEVICES=2 text-generation-launcher --dtype=float16 --model-id /models/baichuan2/Baichuan2-7B-Chat --trust-remote-code --port 3001
```
Additional launcher options can be listed with:
```
text-generation-launcher --help
```
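As an illustration only, request and batch limits can also be set explicitly at launch. The flags below exist in upstream text-generation-inference but their names can vary between versions, and the values are placeholders rather than tuned settings:
```
HIP_VISIBLE_DEVICES=2 text-generation-launcher --dtype=float16 \
    --model-id /models/baichuan2/Baichuan2-7B-Chat --trust-remote-code --port 3001 \
    --max-input-length 2048 --max-total-tokens 4096 --max-concurrent-requests 128
```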
#### 2. Send Requests

Using curl:
```
curl 127.0.0.1:3001/generate \
    -X POST \
    -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":100,"temperature":0.7}}' \
    -H 'Content-Type: application/json'
```
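Token-by-token streaming is also available through TGI's `generate_stream` endpoint (same request body; the response comes back as a server-sent event stream):
```
curl 127.0.0.1:3001/generate_stream \
    -X POST \
    -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":100}}' \
    -H 'Content-Type: application/json'
```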
Using Python:
```
import requests

headers = {
    "Content-Type": "application/json",
}

data = {
    'inputs': 'What is Deep Learning?',
    'parameters': {
        'max_new_tokens': 20,
    },
}

response = requests.post('http://127.0.0.1:3001/generate', headers=headers, json=data)
print(response.json())
# {'generated_text': ' Deep Learning is a subset of machine learning where neural networks are trained deep within a hierarchy of layers instead'}
```
For more on the API, see [https://huggingface.github.io/text-generation-inference](https://huggingface.github.io/text-generation-inference)


## Results
![img1](./readme_images/img1.png)

## Accuracy


## Application Scenarios

### Algorithm Category
Conversational question answering

### Key Application Industries
Finance, scientific research, education

## Source Repository and Issue Feedback
* [https://developer.hpccube.com/codes/modelzoo/baichuan2_tgi](https://developer.hpccube.com/codes/modelzoo/baichuan2_tgi)

## References
* [https://github.com/huggingface/text-generation-inference](https://github.com/huggingface/text-generation-inference)