# <div align="center"><strong>vLLM</strong></div>
## Introduction
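vLLM is a fast and easy-to-use library for LLM inference and serving. It manages KV-cache memory efficiently with PagedAttention, applies continuous batching to incoming requests, and supports many Hugging Face models, such as LLaMA and LLaMA-2, Qwen, and ChatGLM2 and ChatGLM3.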

## Official features not yet supported
- **Quantized inference**: Marlin weight quantization and FP8 KV-cache inference are not currently supported.
- **Module support**: Sliding window attention is not currently supported.


## Supported model architectures

| Architecture | Models | FP16/BF16 | AWQ | GPTQ | Supported version | Optimized |
| :------: | :------: | :------: | :------: | :------: | :------: | :------: |
| LlamaForCausalLM | Llama 3.2, Llama 3.1, Llama 3, Llama 2, Llama, Yi, Codellama, DeepSeek-R1-Distill-Llama | Yes | Yes | Yes | v0.5.0, Llama 3.2>=v0.6.2 | Yes |
| Llama4ForConditionalGeneration | Llama 4 | No/Yes | - | - | v0.8.5.post1 | No |
| QWenLMHeadModel | QWen, Qwen-VL | Yes | Yes | Yes | v0.5.0, Qwen-VL>=v0.6.2 | Yes |
| Qwen2ForCausalLM | QWen2, QWen1.5, CodeQwen1.5, DeepSeek-R1-Distill-Qwen, gte_Qwen2-1.5B-instruct | Yes | Yes | Yes | v0.5.0, gte>=v0.7.2 | Yes |
| Qwen3ForCausalLM | QWen3 | Yes | - | - | v0.8.4 | Yes |
| Qwen3MoeForCausalLM | QWen3MoE | Yes | - | - | v0.8.4 | Yes |
| ChatGLMModel | glm-4v-9b, chatglm3, chatglm2 | Yes | No | Yes | v0.5.0 | Yes |
| Glm4ForCausalLM | GLM-4-0414 | No/Yes | - | - | v0.8.5.post1 | Yes |
| DeepseekForCausalLM | Deepseek | Yes | No | - | v0.5.0 | Yes |
| DeepseekV2ForCausalLM | DeepSeek-V2 | Yes | No | - | v0.6.2 | Yes |
| DeepseekVLV2ForCausalLM | DeepSeek-VL2 | Yes | No | - | v0.7.2 | Yes |
| DeepseekV3ForCausalLM | DeepSeek-V3 | Yes | Yes | - | v0.7.2 | Yes |
| BaiChuanForCausalLM | Baichuan2, Baichuan | Yes | Yes | - | v0.5.0 | Yes |
| BloomForCausalLM | BLOOM | Yes | No | Yes | v0.5.0 | Yes |
| InternLMForCausalLM | InternLM | Yes | No | - | v0.5.0 | Yes |
| InternLM2ForCausalLM | InternLM2 | Yes | No | - | v0.5.0 | Yes |
| FalconForCausalLM | falcon | Yes | No | Yes | v0.5.0 | Yes |
| TeleChat2ForCausalLM | TeleChat2 | Yes | No | - | v0.7.2 | Yes |
| MiniCPMForCausalLM | MiniCPM | Yes | No | - | v0.5.0 | Yes |
| MiniCPM3ForCausalLM | MiniCPM3 | Yes | No | - | v0.6.2 | Yes |
| MixtralForCausalLM | Mixtral-8x7B, Mixtral-8x7B-Instruct | Yes | No | - | v0.5.0 | Yes |
| Qwen2MoeForCausalLM | Qwen2-57B-A14B, Qwen2-57B-A14B-Instruct | Yes | No | - | v0.5.0 | No |
| LlavaForConditionalGeneration | LLaMA, LLaMA-2, LLaMA-3 | Yes | No | - | v0.6.2 | No |
| Qwen2VLForConditionalGeneration | Qwen2-VL | Yes | No | Yes | v0.6.2 | No |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | Yes | No | Yes | v0.7.2 | No |
| Gemma3ForConditionalGeneration | Gemma 3 | Yes | - | - | v0.8.5.post1 | No |
| MiniCPMV | MiniCPM-V | Yes | No | - | v0.6.2 | No |
| Phi3VForCausalLM | Phi-3.5-vision | Yes | No | - | v0.6.2 | No |
| BertModel | bge-large-zh-v1.5 | Yes | No | - | v0.7.2 | No |
| XLMRobertaModel | bge-m3 | Yes | No | - | v0.7.2 | No |
| XLMRobertaForSequenceClassification | bge-reranker-v2-m3 | Yes | No | - | v0.7.2 | No |
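
The names in the first column of the table are the class names that Hugging Face checkpoints usually declare in the `architectures` field of `config.json`. As a rough sketch, a checkpoint can be compared against the table before loading it. The `SUPPORTED` dictionary below copies only a few rows, and `check_architecture` and `check_model_dir` are hypothetical helpers written for this illustration; they are not part of vLLM.

```python
import json
from pathlib import Path

# A few rows copied from the table above: architecture -> listed support version.
# Illustrative subset only; extend it from the table as needed.
SUPPORTED = {
    "LlamaForCausalLM": "v0.5.0",
    "Qwen2ForCausalLM": "v0.5.0",
    "Qwen3ForCausalLM": "v0.8.4",
    "DeepseekV3ForCausalLM": "v0.7.2",
}


def check_architecture(config: dict) -> dict:
    """Map each architecture declared in a config to its listed version (or None)."""
    return {arch: SUPPORTED.get(arch) for arch in config.get("architectures", [])}


def check_model_dir(model_dir: str) -> dict:
    """Read <model_dir>/config.json and check the architectures it declares."""
    config_path = Path(model_dir, "config.json")
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return check_architecture(config)
```

A value of `None` only means the architecture is missing from this illustrative subset; the full table remains the reference.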


## Installation
vLLM supports:
+ Python 3.9.
+ Python 3.10.
+ Python 3.11.
+ Python 3.12.
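
Before building, it may help to confirm that the active interpreter is one of the versions listed above. The following is a minimal sketch using only the standard library; `is_supported_python` is a hypothetical helper name, not something vLLM provides.

```python
import sys

# Minor versions of Python 3 listed as supported in this README.
SUPPORTED_MINORS = (9, 10, 11, 12)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version is one of the listed ones."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS


if __name__ == "__main__":
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}")
```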

### Installing by building from source

#### Preparing the build environment
Two ways of preparing the environment are provided:

1. Use the 光源 PyTorch 2.4.1 base image: download the image version that matches PyTorch 2.4.1 and your Python, DTK, and operating system versions.

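2. Use an existing Python environment: install PyTorch 2.4.1. The PyTorch whl packages are available at [https://cancon.hpccube.com:65024/4/main/pytorch](https://cancon.hpccube.com:65024/4/main/pytorch); download the PyTorch 2.4.1 whl package that matches your Python and DTK versions, then install it with the following commands: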
```shell
pip install torch*  # the torch whl package you downloaded
pip install setuptools wheel
```
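
When choosing which whl package to download, the Python tag embedded in the file name has to match the interpreter. The sketch below assumes the standard wheel naming scheme, where a name ends in `-{python tag}-{abi tag}-{platform tag}.whl`, and handles CPython tags such as `cp310` only. The helper names, and the file name used in the usage note, are made up for illustration.

```python
import sys


def wheel_python_tag(filename: str) -> str:
    """Extract the Python tag (e.g. 'cp310') from a wheel file name.

    Wheel names end in '-{python tag}-{abi tag}-{platform tag}.whl',
    so the Python tag is the third field from the end.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    return filename[: -len(".whl")].split("-")[-3]


def matches_interpreter(filename: str, version_info=sys.version_info) -> bool:
    """Check whether a CPython wheel targets the given interpreter version."""
    expected = f"cp{version_info[0]}{version_info[1]}"
    return wheel_python_tag(filename) == expected
```

For example, `matches_interpreter("torch-2.4.1-cp310-cp310-linux_x86_64.whl")` is true only when run under Python 3.10. The DTK version still has to be matched by hand against the download directory.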

#### Building and installing from source
```shell
git clone http://developer.hpccube.com/codes/OpenDAS/vllm.git  # then switch to the branch you need
```
Install the dependencies:
```shell
pip install -r requirements/rocm.txt
```
- Two ways of building from source are provided (run them from inside the vllm directory):
```shell
# 1. Build a whl package and install it
python setup.py bdist_wheel
cd dist
pip install vllm*

# 2. Build and install directly from source
python3 setup.py install  # for debugging, use python3 setup.py develop instead
```
If you need the git revision number added, set the environment variable: `export ADD_GIT_VERSION=1`

#### Preparing the base runtime environment
1. Use the 光源 PyTorch 2.4.1 base image environment described above.

2. Download the dependency packages that match PyTorch 2.4.1 and your Python, DTK, and operating system versions:
- triton: [https://cancon.hpccube.com:65024/4/main/triton](https://cancon.hpccube.com:65024/4/main/triton/)
- flash_attn: [https://cancon.hpccube.com:65024/4/main/flash_attn](https://cancon.hpccube.com:65024/4/main/flash_attn)
- lmslim: [https://cancon.hpccube.com:65024/4/main/lmslim](https://cancon.hpccube.com:65024/4/main/lmslim)
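
After installing these packages, a quick check can show which of them the current environment can actually locate. This is a minimal sketch using only the standard library. The import names in `RUNTIME_DEPENDENCIES` are assumed to match the package names listed above, and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for torch and the dependency packages listed above.
RUNTIME_DEPENDENCIES = ("torch", "triton", "flash_attn", "lmslim")


def find_missing(modules=RUNTIME_DEPENDENCIES) -> list:
    """Return the names of modules that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module without importing it, so this check does not confirm that a package loads correctly against the installed DTK.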

#### Notes
+ If `pip install` downloads too slowly, add a mirror index: `-i https://pypi.tuna.tsinghua.edu.cn/simple/`

## Verification
- Run `python -c "import vllm; print(vllm.__version__)"` to query the installed version. The version number is kept in sync with the official release, for example 0.8.5.post1.
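
The same check can be scripted without importing the package, which is useful when an import failure should not abort the script. The sketch below reads the installed package metadata through the standard library. It assumes the distribution is registered under the name `vllm`, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "vllm"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"vllm {found}" if found else "vllm is not installed")
```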

## Known Issue
-

## References
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)