 <div align="center"><strong>Text Generation Inference</strong></div>

## Introduction
Text Generation Inference (TGI) is a framework written in Rust and Python for deploying and serving LLM inference. TGI provides high-performance inference services for many large models, such as Llama, Falcon, BLOOM, Baichuan, and Qwen.

## 支持模型列表

- [Deepseek V2](https://huggingface.co/deepseek-ai/DeepSeek-V2)
- [Idefics 2](https://huggingface.co/HuggingFaceM4/idefics2-8b) (Multimodal)
- [Llava Next (1.6)](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf) (Multimodal)
- [Llama](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
- [Phi 3](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- [Granite](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct)
- [Gemma](https://huggingface.co/google/gemma-7b)
- [PaliGemma](https://huggingface.co/google/paligemma-3b-pt-224)
- [Gemma2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)
- [Cohere](https://huggingface.co/CohereForAI/c4ai-command-r-plus)
- [Dbrx](https://huggingface.co/databricks/dbrx-instruct)
- [Mamba](https://huggingface.co/state-spaces/mamba-2.8b-slimpj)
- [Mistral](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)
- [Mixtral](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1)
- [Gpt Bigcode](https://huggingface.co/bigcode/gpt_bigcode-santacoder)
- [Phi](https://huggingface.co/microsoft/phi-1_5)
- [PhiMoe](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)
- [Baichuan](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)
- [Falcon](https://huggingface.co/tiiuae/falcon-7b-instruct)
- [StarCoder 2](https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1)
- [Qwen 2](https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f)
- [Qwen 2 VL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d)
- [Opt](https://huggingface.co/facebook/opt-6.7b)
- [T5](https://huggingface.co/google/flan-t5-xxl)
- [Galactica](https://huggingface.co/facebook/galactica-120b)
- [SantaCoder](https://huggingface.co/bigcode/santacoder)
- [Bloom](https://huggingface.co/bigscience/bloom-560m)
- [Mpt](https://huggingface.co/mosaicml/mpt-7b-instruct)
- [Gpt2](https://huggingface.co/openai-community/gpt2)
- [Gpt Neox](https://huggingface.co/EleutherAI/gpt-neox-20b)
- [Gptj](https://huggingface.co/EleutherAI/gpt-j-6b)
- [Idefics](https://huggingface.co/HuggingFaceM4/idefics-9b) (Multimodal)
- [Mllama](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) (Multimodal)


## Requirements
+ Python 3.10
+ DTK 24.04.3
+ torch 2.1.0

### Installing from Source

#### Preparing the Build Environment

There are two ways to prepare the environment.
##### Option 1:

### **TODO**

##### Option 2:

Use the SourceFind (光源) pytorch2.1.0 base image. Image download page: [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch); choose the image version matching pytorch2.1.0, your Python, DTK, and OS. The pytorch2.1.0 image already has Triton and flash-attn installed.

1. Install Rust
```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

2. Install Protoc
```shell
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```
3. Install the TGI service
```bash
git clone http://developer.hpccube.com/codes/OpenDAS/text-generation-inference.git  # add -b v3.0.0 to check out that branch
cd text-generation-inference
# install the exllamav2 kernels
cd server/exllamav2_kernels
python setup.py install
cd ../..  # back to the project root
source $HOME/.cargo/env
BUILD_EXTENSIONS=True make install  # install the text-generation service
```
4. Install the benchmark tool
```bash
cd text-generation-inference
make install-benchmark
```

Note: if installation is slow, you can switch the default pip index to speed it up:
```bash
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```
If `cargo install` is also slow, you can speed it up by adding a mirror source in `~/.cargo/config`.
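As a minimal sketch, a source-replacement entry in `~/.cargo/config` might look like the following (the TUNA mirror URL here is one example of a crates.io mirror; substitute whichever mirror is reachable from your network):

```toml
# Replace the default crates.io registry with a mirror.
[source.crates-io]
replace-with = 'mirror'

[source.mirror]
registry = "sparse+https://mirrors.tuna.tsinghua.edu.cn/crates.io-index/"
```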

## Checking the Installed Version
```bash
text-generation-launcher -V  # the version number tracks the upstream release
```

## Before Use

Set the following environment variables:
```bash
export PYTORCH_TUNABLEOP_ENABLED=0
export ATTENTION=paged
```

## Usage

Start the TGI service and send it a request:
```bash
# start the tgi service
HIP_VISIBLE_DEVICES=2 text-generation-launcher --dtype=float16 --model-id /path/to/model --trust-remote-code --port 3001
# query the service
curl 127.0.0.1:3001/generate \
    -X POST \
    -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":100,"temperature":0.7}}' \
    -H 'Content-Type: application/json'
```

Calling the service from Python:
```python
import requests

headers = {
    "Content-Type": "application/json",
}

data = {
    'inputs': 'What is Deep Learning?',
    'parameters': {
        'max_new_tokens': 20,
    },
}

response = requests.post('http://127.0.0.1:3001/generate', headers=headers, json=data)
print(response.json())
# {'generated_text': ' Deep Learning is a subset of machine learning where neural networks are trained deep within a hierarchy of layers instead'}
```
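TGI also exposes a streaming endpoint, `/generate_stream`, which returns tokens incrementally as server-sent events (`data: {json}` lines). As a minimal sketch, assuming that standard SSE line format and a server already running on the port used above, the stream can be consumed like this:

```python
import json
import requests


def parse_sse_line(line: bytes):
    """Parse one server-sent-event line; return its JSON payload.

    Lines that are not `data:` events (comments, keep-alives, blanks)
    yield None.
    """
    text = line.decode("utf-8").strip()
    if not text.startswith("data:"):
        return None
    return json.loads(text[len("data:"):].strip())


def stream_generate(url: str, prompt: str, max_new_tokens: int = 20):
    """Yield token payloads from a running TGI /generate_stream endpoint."""
    data = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    with requests.post(url, json=data, stream=True) as resp:
        for line in resp.iter_lines():
            event = parse_sse_line(line) if line else None
            if event is not None:
                yield event


# usage (requires a running service):
# for event in stream_generate("http://127.0.0.1:3001/generate_stream",
#                              "What is deep learning?"):
#     print(event["token"]["text"], end="", flush=True)
```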


## Known Issue

-

## References
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/huggingface/text-generation-inference](https://github.com/huggingface/text-generation-inference)