
# Baichuan-13B_CPP

## Paper

`Baichuan 2: Open Large-scale Language Models`

https://arxiv.org/abs/2309.10305

## Model Architecture

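The Baichuan series are open-source, large-scale pretrained models developed by Baichuan Intelligent Technology (百川智能), released at sizes including 7B and 13B. Baichuan-7B is a 7-billion-parameter model trained on about 1.2 trillion tokens. It supports Chinese and English and has a 4096-token context window. Baichuan-13B, released after Baichuan-7B, has 13 billion parameters and was trained on 1.4 trillion tokens of high-quality data. That is 40% more than LLaMA-13B, which made it the open-source 13B model with the most training data at the time of its release. Baichuan also released an aligned model, Baichuan-13B-Chat, with strong conversational ability.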

Model hyperparameters:

| Model | Hidden size | Layers | Attention heads | Vocabulary size | Total parameters | Position encoding | Max sequence length |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | ALiBi | 4,096 |
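The total parameter count can be reproduced from the other columns. The sketch below assumes a LLaMA-style layout: QKV and output projections in attention, a gated MLP, RMSNorm, no bias terms, and untied input and output embeddings. It also assumes an FFN intermediate size of 13,696. That value is not in the table. It comes from the published Hugging Face config, so treat it as an assumption:

```python
def baichuan_13b_param_count(
    hidden: int = 5120,
    layers: int = 40,
    vocab: int = 64000,
    intermediate: int = 13696,  # assumption: FFN size from the Hugging Face config, not from the table
) -> int:
    # Attention: Q, K, V projections (3 * h * h) plus the output projection (h * h); no biases.
    attn = 3 * hidden * hidden + hidden * hidden
    # Gated MLP: gate, up and down projections, each h * intermediate.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer (before attention and before the MLP).
    norms = 2 * hidden
    per_layer = attn + mlp + norms
    # Input embedding and untied LM head, each vocab * h.
    embeddings = 2 * vocab * hidden
    # The final RMSNorm adds one more h-sized weight vector.
    return layers * per_layer + embeddings + hidden


print(f"{baichuan_13b_param_count():,}")  # 13,264,901,120
```

With these assumptions, the sum matches the table exactly. About 95% of the weights are in the 40 transformer layers, and the embedding and LM head account for the rest.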

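## Algorithm
Baichuan uses a standard Transformer architecture with the same model design as LLaMA. LLaMA uses rotary position embeddings (RoPE), but Baichuan-13B uses ALiBi (Attention with Linear Biases) instead. ALiBi needs less computation than RoPE, and according to Baichuan this noticeably improves inference performance.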
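ALiBi does not use position embeddings. Instead, it adds a fixed, head-specific linear penalty to the attention scores: for head h, the score of query i attending to key j gets m_h · (j − i). The sketch below computes the slopes m_h with the geometric sequence from the ALiBi paper. It includes the paper's rule for head counts that are not powers of two, which covers Baichuan-13B's 40 heads. This only illustrates the technique and is not the code this project runs. Baichuan's own implementation may add the bias in an equivalent but differently shaped form.

```python
import math
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Head-specific ALiBi slopes: a geometric sequence starting at 2**(-8/n)."""

    def power_of_two_slopes(n: int) -> List[float]:
        start = 2 ** (-8 / n)
        return [start ** (k + 1) for k in range(n)]

    if math.log2(num_heads).is_integer():
        return power_of_two_slopes(num_heads)
    # Non-power-of-two head counts (e.g. 40): use the slopes for the nearest lower
    # power of two, then fill the remainder with every other slope from the
    # sequence for twice that many heads.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = power_of_two_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_two_slopes(closest) + extra


def alibi_bias(num_heads: int, seq_len: int) -> List[List[List[float]]]:
    """bias[h][i][j] = m_h * (j - i) for keys j <= i; future keys are masked with -inf."""
    bias = []
    for m in alibi_slopes(num_heads):
        bias.append(
            [
                [m * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
                for i in range(seq_len)
            ]
        )
    return bias
```

This bias is added to the query-key scores before the softmax. Because it depends only on the distance i − j, it costs one addition per score and no rotation of Q and K.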

## Model Download

[Original model: baichuan-inc/Baichuan-13B-Chat on Hugging Face](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)

## Environment Setup

### Preparing the Environment
Pull the Docker image for inference from the SourceFind (光源) registry:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04-py38-latest
```
### Starting the Container

A reference command for starting the inference container is shown below. Adjust it as needed:

```
# <container_name>: a container name of your choice
# <project_path>: path to this project
docker run -it --name=<container_name> -v <project_path>:/work -w /work --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --shm-size=16G --group-add 39 image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.13.1-centos7.6-dtk-23.04-py38-latest /bin/bash
```

### Loading the Environment

After entering the container, run the following command to load the runtime environment variables:

```
source /opt/dtk-23.04/cuda/env.sh
```

### Installation

```
# From the root of this project:
cd package
python setup.py install
```

### Model Conversion

```
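# To keep the image small, the SourceFind image omits the dependencies that the original Baichuan-13B-Chat
# model needs to run, because they are not required for inference.
# If you already have an environment that runs the original Baichuan-13B-Chat, you can move the conversion script
# baichuan2flm.py into that environment. Otherwise, install the conversion dependencies with pip install -r requirements.txt.
# If you use a model you have already downloaded, or your own fine-tuned model, edit the model path used to create
# the tokenizer and model in baichuan2flm.py.
# Run:
python3 baichuan2flm.py baichuan-13b-fp16.bin float16 # export an fp16 model; arguments: output model path, precision

# If your DCU has 16 GB of memory, use an int8 model instead:
python3 baichuan2flm.py baichuan-13b-int8.bin int8 # export an int8 model; arguments: output model path, precision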
```
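The 16 GB threshold follows from weight size alone. Each parameter takes 2 bytes in float16 and about 1 byte in int8. The rough estimate below counts weights only. It ignores the KV cache, activations, and any quantization scales the int8 format may store, so actual memory use is higher:

```python
PARAMS = 13_264_901_120  # total parameter count from the model table


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Approximate weight storage only; excludes KV cache, activations and quantization scales."""
    return num_params * bytes_per_param


for name, width in [("float16", 2), ("int8", 1)]:
    gib = weight_bytes(PARAMS, width) / 2**30
    print(f"{name}: {gib:.1f} GiB")
# float16: 24.7 GiB
# int8: 12.4 GiB
```

The float16 weights alone exceed 16 GB. The int8 weights fit, leaving a few GB for the KV cache and runtime buffers.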


### Inference

```
# Command-line chat program that demonstrates model creation and streaming chat
python cli_demo.py -p baichuan-13b-fp16.bin

# Simple web UI; install streamlit-chat first
streamlit run web_demo.py baichuan-13b-fp16.bin 
```

### Inference Benchmark

Use the benchmark program to measure inference speed. Run ./benchmark -h to see the available options. Speed varies with the configuration and the input.

```
# Enter the directory that contains benchmark
cd benchmark

# Make benchmark executable
chmod +x benchmark

# Examples
./benchmark -p ../baichuan-13b-fp16.bin -f prompts/beijing.txt 
./benchmark -p ../baichuan-13b-fp16.bin -f prompts/hello.txt -b 512 -l 18
```
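To compare several prompts under the same settings, a small Python wrapper can run the benchmark once per prompt file. This is a minimal sketch. It assumes you run it from the benchmark directory and that the prompts are the *.txt files in prompts/. It passes only the -p and -f flags shown above. The function names are hypothetical:

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Union


def build_benchmark_commands(
    model_path: str, prompt_files: Iterable[Union[str, Path]]
) -> List[List[str]]:
    """One ./benchmark invocation per prompt file, using only the -p and -f flags."""
    return [["./benchmark", "-p", model_path, "-f", str(f)] for f in prompt_files]


def run_all(model_path: str = "../baichuan-13b-fp16.bin", prompt_dir: str = "prompts") -> None:
    """Run the benchmark on every *.txt prompt in prompt_dir, in sorted order."""
    files = sorted(Path(prompt_dir).glob("*.txt"))
    for cmd in build_benchmark_commands(model_path, files):
        print("$", " ".join(cmd), flush=True)
        subprocess.run(cmd, check=True)  # stop at the first failing run
```

Calling `run_all()` prints each command before running it, so every benchmark result in the output can be traced to its prompt file. To add options such as `-b` or `-l`, append them to each command list.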

## Demo

![Baichuan inference](baichuan-13b.gif)

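## Application Scenarios

### Algorithm Category

`Natural Language Processing`

### Key Application Industries

`NLP, intelligent chat assistants, scientific research`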

## Source Repository and Issue Feedback

- https://developer.hpccube.com/codes/modelzoo/baichuan-13b_cpp

## References

- [https://github.com/baichuan-inc/Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B)