 <div align="center"><strong>Text Generation Inference </strong></div>

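## Introduction
Text Generation Inference (TGI) is a framework written in Rust and Python for deploying and serving LLM inference. TGI provides high-performance inference for many large models, such as LLaMA, Falcon, BLOOM, Baichuan, and Qwen.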

## Supported Model Architectures
|     Model     | Model Parallelism | FP16 |
| :-----------: | :---------------: | :--: |
|     LLaMA     |        Yes        | Yes  |
|    LLaMA-2    |        Yes        | Yes  |
| LLaMA-2-GPTQ  |        Yes        | Yes  |
|    LLaMA-3    |        Yes        | Yes  |
|   CodeLlama   |        Yes        | Yes  |
|     Qwen2     |        Yes        | Yes  |
|  Qwen2-GPTQ   |        Yes        | Yes  |
|  Baichuan-7B  |        Yes        | Yes  |
| Baichuan2-7B  |        Yes        | Yes  |
| Baichuan2-13B |        Yes        | Yes  |


## Python Support
+ Python 3.9
+ Python 3.10
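
Before installing, it can help to confirm that the interpreter on your `PATH` is one of the versions listed above. The following is a minimal sketch; the helper name `python_supported` is hypothetical and not part of TGI.

```bash
#!/usr/bin/env bash
# Return 0 if the given Python version string (e.g. "3.10.12")
# belongs to the 3.9 or 3.10 series, and 1 otherwise.
python_supported() {
  case "$1" in
    3.9|3.9.*|3.10|3.10.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: check the python3 found on PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
  ver="$(python3 -c 'import platform; print(platform.python_version())')"
  if python_supported "$ver"; then
    echo "Python $ver is supported"
  else
    echo "Python $ver is not listed as supported (expected 3.9 or 3.10)" >&2
  fi
fi
```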

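### Installing from Source

#### Preparing the Build Environment

There are two ways to prepare the environment.
##### Option 1 (recommended):
Use the pytorch2.1.0 base image from 光源 (sourcefind.cn). Images are available at [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch); download the image version that matches your pytorch2.1.0, Python, DTK, and operating system. The pytorch2.1.0 image already has triton and flash-attn installed.

##### Option 2:
Install the pytorch, triton, and flash-attn packages yourself in an existing Python environment:

**Install pytorch**

Install pytorch2.1.0. The pytorch whl packages are available at [https://cancon.hpccube.com:65024/4/main/pytorch](https://cancon.hpccube.com:65024/4/main/pytorch); download the pytorch2.1.0 whl package that matches your Python and DTK versions. Install it with the following commands: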
```bash
pip install torch*.whl  # the downloaded torch whl package
pip install setuptools wheel
```

**Install triton**

The triton whl packages are available at [https://cancon.hpccube.com:65024/4/main/triton](https://cancon.hpccube.com:65024/4/main/triton); download the triton 2.1 whl package that matches your Python and DTK versions.
```bash
pip install triton*.whl  # the downloaded triton whl package
```

**Install flash-attn**

The flash_attn whl packages are available at [https://cancon.hpccube.com:65024/4/main/flash_attn](https://cancon.hpccube.com:65024/4/main/flash_attn); download the flash_attn 2.0.4 whl package that matches your Python and DTK versions.
```bash
pip install flash_attn*.whl  # the downloaded flash_attn whl package
```
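
Each of the three packages above has to match your Python version, so a small helper that picks the wheel by its CPython tag may reduce mistakes. This is a sketch only: the function name `select_wheel` and the `./wheels` download directory are hypothetical, and it assumes the downloaded files follow the standard wheel naming scheme (`name-version-pythontag-abitag-platform.whl`). It does not check the DTK version, which you still need to match by hand.

```bash
#!/usr/bin/env bash
# Print the first wheel in directory $1 whose file name starts with the
# package name $2 and carries the CPython tag $3 (e.g. cp39 or cp310).
# Returns 1 if no file matches.
select_wheel() {
  local dir="$1" pkg="$2" tag="$3" f
  for f in "$dir"/"$pkg"-*-"$tag"-*.whl; do
    # When nothing matches, the glob stays literal and -e is false.
    if [ -e "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

# Usage (commented out; ./wheels is a hypothetical download directory):
# tag="$(python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])')"
# pip install "$(select_wheel ./wheels torch "$tag")"
# pip install "$(select_wheel ./wheels triton "$tag")"
# pip install "$(select_wheel ./wheels flash_attn "$tag")"
```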

#### Building and Installing from Source

1. Install Rust
```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

2. Install Protoc
```shell
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```

3. Install the TGI service
```bash
git clone http://developer.hpccube.com/codes/OpenDAS/text-generation-inference.git  # then switch to the branch you need
cd text-generation-inference
# Install exllama
cd server
make install-exllama    # install the exllama kernels
make install-exllamav2  # install the exllamav2 kernels
cd ..                   # return to the project root
source $HOME/.cargo/env
BUILD_EXTENSIONS=True make install  # install the text-generation service
```

4. Install the benchmark tool
```bash
cd text-generation-inference  # skip this if you are already in the project root
make install-benchmark
```

Note: if installation is too slow, you can speed it up by changing the default package index with the following command.
```bash
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```
In addition, if `cargo install` is too slow, you can speed it up by adding a mirror source in `~/.cargo/config`.
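
As an illustration of such a Cargo source replacement, the sketch below appends a crates.io mirror entry to a config file. The mirror name `tuna` and its registry URL are assumptions chosen to parallel the pip index above and are not specified by this README; the `sparse+` registry form also assumes a reasonably recent Cargo. The helper name `write_cargo_mirror` is hypothetical.

```bash
#!/usr/bin/env bash
# Append a crates.io source replacement to the Cargo config file given as $1.
# Assumption: the "tuna" mirror and its URL are illustrative, not from this README.
write_cargo_mirror() {
  local cfg="$1"
  mkdir -p "$(dirname "$cfg")"
  cat >> "$cfg" <<'EOF'
[source.crates-io]
replace-with = "tuna"

[source.tuna]
registry = "sparse+https://mirrors.tuna.tsinghua.edu.cn/crates.io-index/"
EOF
}

# Usage (commented out so that running this file changes nothing):
# write_cargo_mirror "$HOME/.cargo/config"
```

Because the function appends, run it only once per file; recent Cargo releases also read `~/.cargo/config.toml`.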

## Checking the Installed Version
```bash
text-generation-launcher -V  # the version number tracks the official release
```

## Before Use

```bash
export PYTORCH_TUNABLEOP_ENABLED=0
```
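
If the service is started from a wrapper script, a small guard can confirm that the variable is actually set in that shell before launching. This is a minimal sketch; the function name `tunableop_disabled` is hypothetical.

```bash
#!/usr/bin/env bash
# Return 0 only when PYTORCH_TUNABLEOP_ENABLED is set to exactly "0".
tunableop_disabled() {
  [ "${PYTORCH_TUNABLEOP_ENABLED:-}" = "0" ]
}

# Usage: export the variable, then verify it before starting the service.
export PYTORCH_TUNABLEOP_ENABLED=0
if tunableop_disabled; then
  echo "PYTORCH_TUNABLEOP_ENABLED=0 is set"
else
  echo "PYTORCH_TUNABLEOP_ENABLED is not 0" >&2
fi
```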

## Known Issue

-

## References
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/huggingface/text-generation-inference](https://github.com/huggingface/text-generation-inference)