README.md


# Baichuan2

## 论文
- https://arxiv.org/abs/2309.10305

模型具体参数：

| 模型名称 | 隐含层维度 | 层数 | 头数 | 词表大小 |  位置编码 | 最大长 |
| -------- | -------- | -------- | -------- |   -------- | -------- | -------- |
| Baichuan 2-7B | 4,096 | 32 | 32 | 125,696 |  RoPE | 4096 |
| Baichuan 2-13B | 5,120 | 40 | 	40 | 125,696 |   ALiBi | 4096 |
<div align="center">
<img src="./docs/transformer.jpg" width="400" height="300">
</div>

## 算法原理
Baichuan整体模型基于标准的Transformer结构，采用了和LLaMA一样的模型设计。其中，Baichuan-7B在结构上采用Rotary Embedding位置编码方案、SwiGLU激活函数、基于RMSNorm的Pre-Normalization。Baichuan-13B使用了ALiBi线性偏置技术，相对于Rotary Embedding计算量更小，对推理性能有显著提升。
<div align="center">
<img src="./docs/transformer.png" width="450" height="300">
</div>

## 环境配置

### Docker（方法一）
TODO

### Dockerfile（方法二）

```
cd ./text-generation-inference
docker build -f Dockerfile_dcu -t tgi:latest --ulimit nofile=2048:2048 .
# <Host Path>主机端路径
# <Container Path>容器映射路径
docker run -it --name llama_tgi --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> tgi:latest /bin/bash
```

## 数据集
无

## 推理
### 源码编译安装
参考源码里的[README](./text-generation-inference/README.md)源码编译部分。
本项目源码编译需要的工具包、深度学习库等均可从[光合](https://developer.hpccube.com/tool/)开发者社区下载安装。
- [DTK 24.04](https://cancon.hpccube.com:65024/1/main/DTK-24.04)
- [Pytorch 2.1.0](https://cancon.hpccube.com:65024/4/main/pytorch/DAS1.0)
- [Flash_attn 2.0.4](https://cancon.hpccube.com:65024/4/main/flash_attn/DAS1.0)
- [Triton 2.1.0](https://cancon.hpccube.com:65024/4/main/triton/DAS1.0)


### 模型下载

[Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base)

[Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)

[Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat)

[Baichuan2-13B-Base](https://huggingface.co/baichuan-inc/Baichuan2-13B-Base)


### 部署TGI
#### 1. 启动TGI服务
```
HIP_VISIBLE_DEVICES=2 text-generation-launcher --dtype=float16 --model-id /models/baichuan2/Baichuan2-7B-Chat --trust-remote-code --port 3001
```
更多参数可使用如下方式查看
```
text-generation-launcher --help
```
#### 2. 请求服务

curl命令方式:
```
curl 127.0.0.1:3001/generate \
    -X POST \
    -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":100,"temperature":0.7}}' \
    -H 'Content-Type: application/json'
```
通过python调用的方式：
```
import requests

headers = {
    "Content-Type": "application/json",
}

data = {
    'inputs': 'What is Deep Learning?',
    'parameters': {
        'max_new_tokens': 20,
    },
}

response = requests.post('http://127.0.0.1:3001/generate', headers=headers, json=data)
print(response.json())
# {'generated_text': ' Deep Learning is a subset of machine learning where neural networks are trained deep within a hierarchy of layers instead'}
```
更多API查看，请参考 [https://huggingface.github.io/text-generation-inference](https://huggingface.github.io/text-generation-inference)


### 精度
无

## 应用场景

### 算法类别
对话问答

### 热点应用行业
金融,科研,教育

## 源码仓库及问题反馈
* [https://developer.hpccube.com/codes/modelzoo/baichuan2_tgi](https://developer.hpccube.com/codes/modelzoo/baichuan2_tgi)

## 参考资料
* [https://github.com/huggingface/text-generation-inference](https://github.com/huggingface/text-generation-inference)