# <div align="center"><strong>vLLM</strong></div>
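## Introduction
vLLM is a fast, easy-to-use library for LLM inference and serving. It uses PagedAttention to manage KV-cache memory efficiently, applies continuous batching to incoming requests, and supports many Hugging Face models, such as LLaMA & LLaMA-2, Qwen, and ChatGLM2 & ChatGLM3.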
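
As a rough illustration of the batching described above, the sketch below submits several prompts to vLLM's offline `LLM.generate` API in a single call. The helper name `generate_texts` and the sampling values are illustrative choices, the model path is a placeholder you supply, and the code assumes vLLM is already installed on supported hardware.

```python
from typing import List


def generate_texts(model_path: str, prompts: List[str], max_tokens: int = 32) -> List[str]:
    """Generate one completion per prompt with vLLM's offline engine.

    `generate_texts` is a hypothetical helper written for this README,
    not part of vLLM itself.
    """
    # Imported lazily so this file can be loaded on machines without vLLM.
    from vllm import LLM, SamplingParams

    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=max_tokens)
    llm = LLM(model=model_path)
    # All prompts are submitted together; the engine schedules them as a batch.
    outputs = llm.generate(prompts, sampling)
    return [out.outputs[0].text for out in outputs]
```

Calling `generate_texts("/path/to/model", ["Hello, my name is", "The capital of France is"])` should return two completions, one per prompt.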

## Official features not yet supported
- **Quantized inference**: Marlin weight quantization and KV-cache FP8 inference are not yet supported.
- **Module support**: sliding window attention is not yet supported.


## Supported model architectures

| Architecture | Models | FP16/BF16 | AWQ | GPTQ | Supported version | Optimized |
| :------: | :------: | :------: | :------: |:------: | :------: |:------: |
| LlamaForCausalLM | Llama 3.2, Llama 3.1, Llama 3, Llama 2, Llama, Yi, Codellama, DeepSeek-R1-Distill-Llama | Yes | Yes | Yes | v0.5.0, Llama 3.2>=v0.6.2 | Yes |
| Llama4ForConditionalGeneration | Llama 4 | No/Yes | - | - | v0.8.5.post1 | No |
| QWenLMHeadModel | QWen, Qwen-VL | Yes | Yes | Yes | v0.5.0, Qwen-VL>=v0.6.2 | Yes |
| Qwen2ForCausalLM | QWen2, QWen1.5, CodeQwen1.5, DeepSeek-R1-Distill-Qwen, gte_Qwen2-1.5B-instruct | Yes | Yes | Yes | v0.5.0, gte>=v0.7.2 | Yes |
| Qwen3ForCausalLM | QWen3, Qwen3-Embedding, Qwen3-Reranker | Yes | - | - | v0.8.4 | Yes |
| Qwen3MoeForCausalLM | QWen3MoE | Yes | - | - | v0.8.4 | Yes |
| ChatGLMModel | glm-4v-9b, chatglm3, chatglm2 | Yes | No | Yes | v0.5.0 | Yes |
| Glm4ForCausalLM | GLM-4-0414 | No/Yes | - | - | v0.8.5.post1 | Yes |
| DeepseekForCausalLM | Deepseek | Yes | No | - | v0.5.0 | Yes |
| DeepseekV2ForCausalLM | DeepSeek-V2 | Yes | No | - | v0.6.2 | Yes |
| DeepseekVLV2ForCausalLM | DeepSeek-VL2 | Yes | No | - | v0.7.2 | Yes |
| DeepseekV3ForCausalLM | DeepSeek-V3 | Yes | Yes | - | v0.7.2 | Yes |
| BaiChuanForCausalLM | Baichuan2, Baichuan | Yes | Yes | - | v0.5.0 | Yes |
| BloomForCausalLM | BLOOM | Yes | No | Yes | v0.5.0 | Yes |
| InternLMForCausalLM | InternLM | Yes | No | - | v0.5.0 | Yes |
| InternLM2ForCausalLM | InternLM2 | Yes | No | - | v0.5.0 | Yes |
| FalconForCausalLM | falcon | Yes | No | Yes | v0.5.0 | Yes |
| TeleChat2ForCausalLM | TeleChat2 | Yes | No | - | v0.7.2 | Yes |
| MiniCPMForCausalLM | MiniCPM | Yes | No | - | v0.5.0 | Yes |
| MiniCPM3ForCausalLM | MiniCPM3 | Yes | No | - | v0.6.2 | Yes |
| MixtralForCausalLM | Mixtral-8x7B, Mixtral-8x7B-Instruct | Yes | No | - | v0.5.0 | Yes |
| Qwen2MoeForCausalLM | Qwen2-57B-A14B, Qwen2-57B-A14B-Instruct | Yes | No | - | v0.5.0 | No |
| LlavaForConditionalGeneration | LLaMA, LLaMA-2, LLaMA-3 | Yes | No | - | v0.6.2 | No |
| Qwen2VLForConditionalGeneration | Qwen2-VL | Yes | No | Yes | v0.6.2 | No |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | Yes | No | Yes | v0.7.2 | No |
| Mistral3ForConditionalGeneration | Mistral3 | Yes | No | - | v0.8.5.post1 | No |
| Gemma3ForConditionalGeneration | Gemma 3 | Yes | - | - | v0.8.5.post1 | No |
| MiniCPMV | MiniCPM-V | Yes | No | - | v0.6.2 | No |
| Phi3VForCausalLM | Phi-3.5-vision | Yes | No | - | v0.6.2 | No |
| BertModel | bge-large-zh-v1.5 | Yes | No | - | v0.7.2 | No |
| XLMRobertaModel | bge-m3 | Yes | No | - | v0.7.2 | No |
| XLMRobertaForSequenceClassification | bge-reranker-v2-m3 | Yes | No | - | v0.7.2 | No |
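
To check whether an installed release is new enough for a given model, the version column can be compared programmatically. The sketch below assumes that column lists the minimum release that supports each model, which is how the `>=` entries read. The helper names `parse_version` and `is_supported` are illustrative, and `MIN_VERSION` copies only a few rows from the table.

```python
import re

# A few minimum versions copied from the table above (not the full list).
MIN_VERSION = {
    "Llama 3.2": "v0.6.2",
    "DeepSeek-V3": "v0.7.2",
    "QWen3": "v0.8.4",
    "Llama 4": "v0.8.5.post1",
}


def parse_version(tag: str) -> tuple:
    """Turn 'v0.8.5.post1' into (0, 8, 5, 1); a missing post-release counts as 0.

    Anything after the matched prefix (for example a local build suffix) is ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)(?:\.post(\d+))?", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {tag!r}")
    major, minor, patch, post = match.groups()
    return (int(major), int(minor), int(patch), int(post or 0))


def is_supported(model: str, installed: str) -> bool:
    """Return True if the installed release is at least the model's minimum."""
    return parse_version(installed) >= parse_version(MIN_VERSION[model])


print(is_supported("Llama 3.2", "0.5.0"))        # older than v0.6.2
print(is_supported("Llama 4", "0.8.5.post1"))    # exactly the minimum
```

The parser covers only the `vX.Y.Z` and `vX.Y.Z.postN` forms that appear in the table.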


## Installation
vLLM supports the following Python versions:
+ Python 3.9
+ Python 3.10
+ Python 3.11
+ Python 3.12
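
Before building, it can help to confirm that the active interpreter is one of the versions listed above. This is a minimal sketch; the helper name `is_supported_python` is illustrative.

```python
import sys

# Python versions listed as supported in this README.
SUPPORTED_PYTHONS = {(3, 9), (3, 10), (3, 11), (3, 12)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHONS


current = f"{sys.version_info[0]}.{sys.version_info[1]}"
if is_supported_python():
    print(f"Python {current} is supported")
else:
    print(f"Python {current} is not in the supported list")
```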

### Install by building from source

#### Prepare the build environment
There are two ways to prepare the environment:

1. Use the 光源 PyTorch 2.4.1 base image: download the image version that matches your PyTorch 2.4.1, Python, DTK, and OS versions.

2. Use an existing Python environment: install PyTorch 2.4.1. The PyTorch whl packages are available at [https://cancon.hpccube.com:65024/4/main/pytorch](https://cancon.hpccube.com:65024/4/main/pytorch); download the PyTorch 2.4.1 whl package that matches your Python and DTK versions, then install it with:
```shell
pip install torch*.whl  # the torch whl package you downloaded
pip install setuptools wheel
```

#### Build and install from source
```shell
git clone http://developer.hpccube.com/codes/OpenDAS/vllm.git  # then switch to the branch you need
```
Install the dependencies:
```shell
pip install -r requirements/rocm.txt
```
- There are two ways to build from source (run them from inside the vllm directory):
```shell
# 1. Build a whl package and install it
python setup.py bdist_wheel
cd dist
pip install vllm*

# 2. Build and install directly from source
python3 setup.py install  # for debugging, use: python3 setup.py develop
```
To add the git revision number to the build, set the environment variable: `export ADD_GIT_VERSION=1`

#### Prepare the runtime environment
1. Use the 光源 PyTorch 2.4.1 base image described above.

2. Alternatively, download the dependency packages that match your PyTorch 2.4.1, Python, DTK, and OS versions:
- triton: [https://cancon.hpccube.com:65024/4/main/triton](https://cancon.hpccube.com:65024/4/main/triton/)
- flash_attn: [https://cancon.hpccube.com:65024/4/main/flash_attn](https://cancon.hpccube.com:65024/4/main/flash_attn)
- lmslim: [https://cancon.hpccube.com:65024/4/main/lmslim](https://cancon.hpccube.com:65024/4/main/lmslim)
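
After installing these packages, a quick check can show which ones Python can actually find. The sketch below assumes the importable module names match the package names listed above (`triton`, `flash_attn`, `lmslim`), which may not hold for every build. The helper name `missing_modules` is illustrative.

```python
import importlib.util

# Assumed import names for the runtime dependencies listed above, plus torch.
RUNTIME_MODULES = ["torch", "triton", "flash_attn", "lmslim"]


def missing_modules(names):
    """Return the top-level module names that cannot be located.

    find_spec only searches for the module; it does not import it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(RUNTIME_MODULES)
if missing:
    print("Missing runtime modules:", ", ".join(missing))
else:
    print("All runtime modules were found")
```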

#### Notes
+ If `pip install` downloads too slowly, add a mirror index: `-i https://pypi.tuna.tsinghua.edu.cn/simple/`

## Verification
- Run `python -c "import vllm; print(vllm.__version__)"` to print the installed version, for example 0.8.5.post1. The version number tracks the official vLLM release.
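
The same check can be scripted without importing the package, which is handy in CI. This is a minimal sketch using the standard library's `importlib.metadata`. The helper names are illustrative, and the prefix comparison assumes a local build may append a suffix to the official version number.

```python
from importlib import metadata


def installed_version(package: str = "vllm"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_release(version, expected: str) -> bool:
    """True if the installed version starts with the expected official release."""
    return version is not None and version.startswith(expected)


found = installed_version()
print("vllm version:", found if found is not None else "not installed")
print("matches 0.8.5.post1:", matches_release(found, "0.8.5.post1"))
```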

## Known Issue
-

## References
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)