Add Baichuan2 model TGI inference tutorial

Commit 42a04162 authored Jun 12, 2024 by huangwb
5 changed files
--- a/.gitmodules
+++ b/.gitmodules
+[submodule "text-generation-inference"]
+	path = text-generation-inference
+	url = http://developer.hpccube.com/codes/OpenDAS/text-generation-inference.git
--- a/README.md
+++ b/README.md
+
+# Baichuan2
+
+## Paper
+- https://arxiv.org/abs/2309.10305
+
+Model parameters:
+
+| Model | Hidden size | Layers | Attention heads | Vocabulary size | Positional encoding | Max sequence length |
+| -------- | -------- | -------- | -------- | -------- | -------- | -------- |
+| Baichuan 2-7B | 4,096 | 32 | 32 | 125,696 | RoPE | 4,096 |
+| Baichuan 2-13B | 5,120 | 40 | 40 | 125,696 | ALiBi | 4,096 |
+<div align="center">
+<img src="./docs/transformer.jpg" width="400" height="300">
+</div>
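
As a quick check on the parameter table, the per-head dimension is the hidden size divided by the number of attention heads. The sketch below is illustrative only: the dictionary name and layout are assumptions, and only the numbers come from the table.

```python
# Values copied from the parameter table; the dict layout is illustrative.
MODEL_PARAMS = {
    "Baichuan 2-7B": {"hidden_size": 4096, "layers": 32, "heads": 32},
    "Baichuan 2-13B": {"hidden_size": 5120, "layers": 40, "heads": 40},
}


def head_dim(hidden_size: int, heads: int) -> int:
    """Per-head dimension; the hidden size must split evenly across heads."""
    if hidden_size % heads != 0:
        raise ValueError("hidden size is not divisible by the number of heads")
    return hidden_size // heads


for name, params in MODEL_PARAMS.items():
    print(name, head_dim(params["hidden_size"], params["heads"]))
```

Both configurations work out to 128 dimensions per head.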
+
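+## Algorithm
+Baichuan is built on the standard Transformer architecture and follows the same model design as LLaMA. Baichuan-7B uses Rotary Embedding positional encoding, the SwiGLU activation function, and RMSNorm-based Pre-Normalization. Baichuan-13B uses ALiBi linear biases, which require less computation than Rotary Embedding and significantly improve inference performance.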
+<div align="center">
+<img src="./docs/transformer.png" width="450" height="300">
+</div>
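
Two of the building blocks named in the algorithm section, RMSNorm and ALiBi, can be sketched in plain Python. This is a simplified illustration rather than the Baichuan or TGI implementation: the function names and the `eps` default are assumptions, and the slope helper only covers power-of-two head counts, so it does not handle the 40-head 13B configuration, which needs the extended scheme from the ALiBi reference code.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a per-element learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def alibi_slopes(num_heads):
    """Geometric slopes 2^(-8/n), 2^(-16/n), ... for a power-of-two head count n."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    base = 2.0 ** (-8.0 / num_heads)
    return [base ** (i + 1) for i in range(num_heads)]


def alibi_bias(slope, seq_len):
    """Causal bias for one head: bias[i][j] = slope * (j - i) for key j <= query i."""
    return [[slope * (j - i) for j in range(i + 1)] for i in range(seq_len)]


print(alibi_slopes(8)[:3])
print(alibi_bias(0.5, 3))
```

The bias is added to the attention scores before the softmax, so more distant keys receive a larger penalty and no position-dependent rotation of queries and keys is needed.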
+
+## Environment setup
+
+### Docker (Option 1)
+TODO
+
+### Dockerfile (Option 2)
+
+```
+cd ./text-generation-inference
+docker build -f Dockerfile_dcu -t tgi:latest --ulimit nofile=2048:2048 .
+# <Host Path>: path on the host
+# <Container Path>: mount path inside the container
+docker run -it --name llama_tgi --privileged --shm-size=64G  --device=/dev/kfd --device=/dev/dri/ --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ulimit memlock=-1:-1 --ipc=host --network host --group-add video -v /opt/hyhal:/opt/hyhal:ro -v <Host Path>:<Container Path> tgi:latest /bin/bash
+```
+
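+## Dataset
+None
+
+## Inference
+### Build and install from source
+See the build-from-source section of the [README](./text-generation-inference/README.md) in the source tree.
+The toolkits and deep learning libraries needed to build this project from source can all be downloaded from the [光合](https://developer.hpccube.com/tool/) developer community.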
+- [DTK 24.04](https://cancon.hpccube.com:65024/1/main/DTK-24.04)
+- [Pytorch 2.1.0](https://cancon.hpccube.com:65024/4/main/pytorch/DAS1.0)
+- [Flash_attn 2.0.4](https://cancon.hpccube.com:65024/4/main/flash_attn/DAS1.0)
+- [Triton 2.1.0](https://cancon.hpccube.com:65024/4/main/triton/DAS1.0)
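
After installation, the Python-side dependencies in the list can be checked against the expected versions. The sketch below is a hedged example: the distribution names (`torch`, `flash_attn`, `triton`) are assumptions about how the packages are registered, and prefix matching is used on the assumption that vendor builds may append a local version suffix. DTK is not a Python package, so it is not covered.

```python
from importlib import metadata

# Version prefixes from the dependency list; the distribution names are assumptions.
EXPECTED = {"torch": "2.1.0", "flash_attn": "2.0.4", "triton": "2.1.0"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Map each package name to (found_version, matches_expected_prefix)."""
    report = {}
    for name, prefix in expected.items():
        found = lookup(name)
        report[name] = (found, found is not None and found.startswith(prefix))
    return report


for name, (found, ok) in check_versions(EXPECTED).items():
    print(f"{name}: found={found} ok={ok}")
```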
+
+
+### Model download
+
+[Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base)
+
+[Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)
+
+[Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat)
+
+[Baichuan2-13B-Base](https://huggingface.co/baichuan-inc/Baichuan2-13B-Base)
+
+
+### Deploy TGI
+#### 1. Start the TGI service
+```
+HIP_VISIBLE_DEVICES=2 text-generation-launcher --dtype=float16 --model-id /models/baichuan2/Baichuan2-7B-Chat --trust-remote-code --port 3001
+```
+To see more options, run:
+```
+text-generation-launcher --help
+```
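
Loading the model can take a while, so a client may want to wait until the service is ready before sending requests. The sketch below polls TGI's `/health` route on the port used in the launch command. The function names, timeout, and polling interval are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def health_probe(base_url="http://127.0.0.1:3001"):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all mean "not ready".
        return False


def wait_until_ready(probe=health_probe, timeout_s=600.0, interval_s=5.0, sleep=time.sleep):
    """Poll probe() until it returns True or timeout_s elapses; return the final state."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` right after starting `text-generation-launcher` returns `True` once the server answers, or `False` if it never comes up within the timeout.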
+#### 2. Send requests
+
+Using curl:
+```
+curl 127.0.0.1:3001/generate \
+    -X POST \
+    -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":100,"temperature":0.7}}' \
+    -H 'Content-Type: application/json'
+```
+Using Python:
+```
+import requests
+
+headers = {
+    "Content-Type": "application/json",
+}
+
+data = {
+    'inputs': 'What is Deep Learning?',
+    'parameters': {
+        'max_new_tokens': 20,
+    },
+}
+
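+response = requests.post('http://127.0.0.1:3001/generate', headers=headers, json=data, timeout=60)
+response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
+print(response.json())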
+# {'generated_text': ' Deep Learning is a subset of machine learning where neural networks are trained deep within a hierarchy of layers instead'}
+```
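
For token-by-token output, TGI also exposes a `/generate_stream` route that returns server-sent events. The sketch below reuses the request payload shape from the `/generate` call. The helper names are hypothetical, and the event fields (`token`, `text`) follow the upstream TGI documentation, so they should be verified against the deployed version.

```python
import json


def parse_sse_line(line):
    """Decode one 'data:{...}' server-sent-event line; return None for other lines."""
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):])


def stream_generate(prompt, max_new_tokens=20, base_url="http://127.0.0.1:3001"):
    """Yield token texts from /generate_stream as they arrive."""
    import requests  # third-party; the same dependency as the /generate example

    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    with requests.post(f"{base_url}/generate_stream", json=payload,
                       stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines():
            event = parse_sse_line(raw)
            if event is not None and "token" in event:
                yield event["token"]["text"]
```

A caller can print the pieces as they arrive, for example `for piece in stream_generate("What is Deep Learning?"): print(piece, end="", flush=True)`.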
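+For more API details, see [https://huggingface.github.io/text-generation-inference](https://huggingface.github.io/text-generation-inference)
+
+
+### Accuracy
+None
+
+## Application scenarios
+
+### Algorithm category
+Conversational question answering
+
+### Key application industries
+Finance, scientific research, education
+
+## Source repository and issue feedback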
+* [https://developer.hpccube.com/codes/modelzoo/baichuan2_tgi](https://developer.hpccube.com/codes/modelzoo/baichuan2_tgi)
+
+## References
+* [https://github.com/huggingface/text-generation-inference](https://github.com/huggingface/text-generation-inference)
--- a/docs/transformer.jpg
+++ b/docs/transformer.jpg
--- a/docs/transformer.png
+++ b/docs/transformer.png
--- a/text-generation-inference @ 6e6d3c1a
+++ b/text-generation-inference @ 6e6d3c1a
+Subproject commit 6e6d3c1afe567bf03a33e2ee9653a40322c9f385