README.md

# Baichuan-13B

## 论文

`Baichuan 2: Open Large-scale Language Models`

https://arxiv.org/abs/2309.10305

## 模型结构

Baichuan-13B是由百川智能继Baichuan-7B之后开发的包含130亿参数模型，它在高质量的语料上训练了1.4万亿tokens，超过LLaMA-13B 40%。

Baichuan 2 是百川智能推出的新一代开源大语言模型，采用 2.6 万亿Tokens 的高质量语料训练。

![baichuan](doc/transformer.jpg)

模型具体参数：

| 模型名称 | 隐含层维度 | 层数 | 头数 | 词表大小 | 位置编码 | 最大序列长度 |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Baichuan-13B | 5,120 | 40 | 	40 | 64000 | ALiBi | 4096 |
| Baichuan2-13B | 5,120 | 40 | 40 | 125696 | ALiBi | 4096 |

## 算法原理
Baichuan整体模型基于标准的Transformer结构，采用了和LLaMA一样的模型设计。其中，Baichuan-7B在结构上采用Rotary Embedding位置编码方案、SwiGLU激活函数、基于RMSNorm的Pre-Normalization。Baichuan-13B使用了ALiBi线性偏置技术，相对于Rotary Embedding计算量更小，对推理性能有显著提升.

![baichuan](doc/transformer.png)

## 环境配置

### 环境准备

在光源可拉取推理的docker镜像，拉取方式如下：

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10.1-py38
```

### 容器启动

模型推理容器启动命令参考如下，用户根据需要修改：

```
# <container_name> 自定义容器名
# <project_path> 当前工程所在路径
docker run -it --name=<container_name> -v <project_path>:/work -w /work --privileged -v /opt/hyhal:/opt/hyhal --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --network host --shm-size=16G --group-add video image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10.1-py38 /bin/bash
```

### 加载环境

进入容器后执行如下命令，加载运行环境变量

```
source /opt/dtk/cuda/env.sh
```

### 安装方法

```
#进入本工程目录
cd package
python setup.py install
```

## 数据集

无

## 推理

### 原版模型下载

[baichuan-inc/Baichuan-13B-Chat · Hugging Face](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)

[baichuan-inc/Baichuan2-13B-Chat · Hugging Face](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat)


### 模型转换

```
# 为了精简镜像，光源镜像中未包含模型推理时不需要的原版Baichuan-13B-chat模型运行所需要的依赖，
# 如果有现成的原版Baichuan-13B-chat的运行环境中，可以将模型转换脚本baichuan2flm.py移动到原版模型的运行环境中，
# 也可以通过执行pip install -r requirements.txt安装模型转换所需依赖；
# 对于已经下载完成的模型或者自己finetune的模型需要修改baichuan2flm.py文件中创建tokenizer, model时的模型存放路径
# 在本工程目录下执行：
python3 baichuan2flm.py baichuan-13b-fp16.bin float16 # 导出fp16模型，参数为导出的模型路径

# 如果使用的dcu显存为16G，则需要用int8精度模型：
python3 baichuan2flm.py baichuan-13b-int8.bin int8 # 导出fp16模型，参数为导出的模型路径
```


### 模型推理

```
# 命令行聊天程序，使用了模型创建以及流式对话效果
python cli_demo.py -p baichuan-13b-fp16.bin

# 简易webui，需要先安装streamlit-chat，并且需要在容器启动时映射streamlit的端口到外部网络
streamlit run web_demo.py baichuan-13b-fp16.bin 

# 按照openai接口实现的api_server的实例:
# 需要先进入api_server_demo，安装所需依赖：
cd api_server_demo
pip install -r requirements.txt
# 运行api_server服务，使用-p指定转换后的模型文件，客户端代码可以参考openai-client.py实现：
python fastllm-openai.py -p ../baichuan-13b-fp16.bin 
# 如果需要测试服务的并发性能，可以使用openai-client.py，修改其中的prompt和concurrencys变量值后执行：
python openai-client.py
```

### 推理性能测试

可以使用benchmark程序进行测速，根据./benchmark -h描述进行配置和测试，不同配置、不同输入，推理速度也会有一些差别

```
# 进入benchmark所在目录
cd benchmark

# 添加benchmark可执行权限
chmod +x benchmark

# 测试示例
./benchmark -p ../baichuan-13b-fp16.bin -f prompts/beijing.txt -b 1
./benchmark -p ../baichuan-13b-fp16.bin -f prompts/beijing.txt -b 16
```

## result

![baochuan推理](doc/baichuan-13b.gif)

### 精度

无

## 应用场景

### 算法类别

`对话问答`

### 热点应用行业

`医疗,科研,金融,教育`

## 预训练权重


## 源码仓库及问题反馈

- https://developer.sourcefind.cn/codes/modelzoo/baichuan-13b_fastllm

## 参考资料

- https://github.com/baichuan-inc/Baichuan-13B
- https://github.com/baichuan-inc/Baichuan2