# Baichuan_lmdeploy
## Paper

## Model Architecture
The Baichuan series of open-source large-scale pretrained models is developed by Baichuan Intelligence (百川智能) and includes 7B and 13B variants. Baichuan-7B is a 7-billion-parameter model trained on roughly 1.2 trillion tokens; it supports both Chinese and English and has a context window of 4,096 tokens. Baichuan-13B, released after Baichuan-7B, contains 13 billion parameters and was trained on 1.4 trillion tokens of high-quality corpora (40% more than LLaMA-13B), making it the open-source 13B-scale model trained on the most data at the time of release. Baichuan Intelligence has also released an aligned model, Baichuan-13B-Chat, with strong conversational ability.

Model parameters:

| Model | Hidden size | Layers | Heads | Vocab size | Total parameters | Training data (tokens) | Positional encoding | Max length |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | RoPE | 4,096 |
| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | ALiBi | 4,096 |
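The total parameter count in the table can be reproduced from the architecture. Below is a quick sanity check for Baichuan-7B, assuming the LLaMA-style layout the model follows; the MLP intermediate size of 11008 and the untied output head are assumptions based on that design, not stated in this README:

```python
# Parameter-count sanity check for Baichuan-7B (LLaMA-style architecture).
# hidden, layers, vocab come from the table above; inter (MLP intermediate
# size) and the untied lm_head are assumed from the LLaMA-7B design.
hidden, layers, vocab, inter = 4096, 32, 64000, 11008

embed = vocab * hidden          # token embedding matrix
head = vocab * hidden           # output (lm_head) projection, assumed untied
attn = 4 * hidden * hidden      # q, k, v, o projections
mlp = 3 * hidden * inter        # gate, up, down (SwiGLU uses three matrices)
norms = 2 * hidden              # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

total = embed + layers * per_layer + head + hidden  # + final RMSNorm
print(total)  # 7000559616, matching the table
```

The exact match with the table's 7,000,559,616 suggests these assumptions hold for the released checkpoint.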

![img](./docs/baichuan.jpg)

## Algorithm Principles
The Baichuan models are based on the standard Transformer architecture and follow the same design as LLaMA. Baichuan-7B uses Rotary Embedding (RoPE) positional encoding, the SwiGLU activation function, and RMSNorm-based pre-normalization. Baichuan-13B instead uses ALiBi (Attention with Linear Biases), which is cheaper to compute than Rotary Embedding and significantly improves inference performance.
![img](./docs/baichuan.png)
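As a sketch of the ALiBi mechanism mentioned above: instead of rotating query/key embeddings, each attention head adds a head-specific linear penalty proportional to the query-key distance. The closed-form slopes below assume the head count is a power of two; other counts (such as Baichuan-13B's 40 heads) use ALiBi's interpolation rule, which this minimal illustration omits:

```python
def alibi_slopes(n_heads: int) -> list:
    """Head-specific slopes 2^(-8/n), 2^(-16/n), ...: a geometric sequence.

    Closed form valid when n_heads is a power of two.
    """
    return [2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(seq_len: int, slope: float) -> list:
    """Additive attention bias for one head: -slope * (query-key distance),
    applied only to keys at or before the query position (causal attention)."""
    return [[-slope * (q - k) if k <= q else 0.0
             for k in range(seq_len)]
            for q in range(seq_len)]
```

Because the bias depends only on position differences, no per-token trigonometric rotation is needed at inference time, which is the source of the compute savings relative to RoPE.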


## Environment Setup

A docker image for inference can be pulled from [SourceFind (光源)](https://www.sourcefind.cn/#/service-details):
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:lmdeploy0.0.13_dtk23.04_torch1.13_py38
# <Image ID>: replace with the ID of the image pulled above
# <Host Path>: path on the host machine
# <Container Path>: mount path inside the container
docker run -it --name baichuan --shm-size=1024G --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v <Host Path>:<Container Path> <Image ID> /bin/bash
```
Image version dependencies:
* DTK driver: dtk23.04
* PyTorch: 1.13
* Python: 3.8

### Build and Install from Source
```
# If you use the SourceFind image, you can skip this step: lmdeploy is already installed in the image.
git clone http://developer.hpccube.com/codes/modelzoo/llama_lmdeploy.git
cd llama_lmdeploy
git submodule init && git submodule update
mkdir build && cd build
sh ../generate.sh
make -j 32
make install
cd .. && python3 setup.py install
```
### Model Download

[baichuan-7b](https://huggingface.co/baichuan-inc/Baichuan-7B)

[baichuan2-7b-chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)

### Run baichuan-7b
```
# Convert the model
# <model_name>  model name ('llama', 'internlm', 'vicuna', 'internlm-chat-7b', 'internlm-chat', 'internlm-chat-7b-8k', 'internlm-chat-20b', 'internlm-20b', 'baichuan-7b', 'baichuan2-7b', 'llama2', 'qwen-7b', 'qwen-14b')
# <model_path>  path to the model
# <model_format>  model format ('llama', 'hf', 'qwen')
# <tokenizer_path>  path to the tokenizer model (default None; the tokenizer is then looked up inside model_path)
# <dst_path>  destination path for the converted output (default ./workspace)
# <tp>  number of GPUs for tensor parallelism; must be a power of two (2^n)

lmdeploy convert --model_name baichuan-7b --model_path /path/to/model --model_format hf --tokenizer_path None --dst_path ./workspace_baichuan7b --tp 1

# Run in the terminal
lmdeploy chat turbomind --model_path ./workspace_baichuan7b --tp 1     # press Enter twice after typing a question to start inference

# Serve a web UI
#
# Run in the terminal:
# <model_path_or_server>  path of the deployed model, a tritonserver URL, or a RESTful API URL.
#                         A local path launches the service directly with gradio; a URL runs
#                         against tritonserver by default. If the URL points to a RESTful API,
#                         also enable the restful_api flag.
# <server_name>  IP address of the gradio server
# <server_port>  port of the gradio server
# <batch_size>  batch size when running Turbomind directly (default 32)
# <tp>  number of GPUs for tensor parallelism; must be a power of two (2^n) and match the value used for conversion
# <restful_api>  flag describing model_path_or_server (default False)

lmdeploy serve gradio --model_path_or_server ./workspace_baichuan7b --server_name {ip} --server_port {port} --batch_size 32 --tp 1 --restful_api False

# Open {ip}:{port} in a browser to start chatting
```
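As noted in the parameter descriptions above, `--tp` must be a power of two and must match between `lmdeploy convert` and the serving commands. A small helper capturing that constraint (the function name is hypothetical, not part of lmdeploy):

```python
def is_valid_tp(tp: int) -> bool:
    """True when tp is a positive power of two (1, 2, 4, 8, ...).

    A power of two has exactly one bit set, so tp & (tp - 1) clears it to 0.
    """
    return tp >= 1 and (tp & (tp - 1)) == 0
```

For example, `--tp 3` would violate this constraint, while `--tp 1`, `--tp 2`, or `--tp 4` are acceptable (subject to available GPUs).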

### Run baichuan2-7b
```
# Convert the model
lmdeploy convert --model_name baichuan2-7b --model_path /path/to/model --model_format hf --tokenizer_path None --dst_path ./workspace_baichuan2-7b --tp 1

# Run in the terminal
lmdeploy chat turbomind --model_path ./workspace_baichuan2-7b --tp 1

# Serve a web UI

# Run in the terminal:
lmdeploy serve gradio --model_path_or_server ./workspace_baichuan2-7b --server_name {ip} --server_port {port} --batch_size 32 --tp 1 --restful_api False

# Open {ip}:{port} in a browser to start chatting
```

## Result
![baichuan](docs/baichuan.gif)

### Accuracy



## Application Scenarios

### Algorithm Category

`Conversational QA`


### Key Application Industries

`Healthcare, Education, Research, Finance`


## Source Repository and Issue Feedback

https://developer.hpccube.com/codes/modelzoo/baichuan_lmdeploy

## References
https://github.com/InternLM/LMDeploy