# <div align="center"><strong>vLLM</strong></div>
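## Introduction

vLLM is a fast and easy-to-use library for LLM inference and serving. It uses PagedAttention to manage KV cache memory efficiently, applies continuous batching to incoming requests, and supports many Hugging Face models, such as LLaMA & LLaMA-2, Qwen, and ChatGLM2 & ChatGLM3.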


## Supported Model Architectures

| Architecture                        | Models | FP16/BF16 | AWQ | GPTQ | Supported Version | Optimized |
| :---------------------------------: | :------: | :------: | :------: |:------: | :------: |:------: |
| LlamaForCausalLM                    | Llama 3.2, Llama 3.1,Llama 3,Llama 2,Llama,Yi,Codellama,DeepSeek-R1-Distill-Llama     | Yes | Yes | Yes | v0.5.0,Llama 3.2>=v0.6.2 | Yes |  
| Llama4ForConditionalGeneration      | Llama 4                                                                               | No/Yes | -  | - | v0.8.5.post1  | No |
| QWenLMHeadModel                     | QWen,Qwen-VL                                                                          | Yes | Yes | Yes | v0.5.0,Qwen-VL>=v0.6.2 | Yes |
| Qwen2ForCausalLM                    | QWen2,QWen1.5,CodeQwen1.5,DeepSeek-R1-Distill-Qwen,gte_Qwen2-1.5B-instruct            | Yes | Yes | Yes | v0.5.0,gte>=v0.7.2   | Yes |
| Qwen3ForCausalLM                    | QWen3,Qwen3-Embedding,Qwen3-Reranker                                                  | Yes | - | - | v0.8.4   | Yes |
| Qwen3MoeForCausalLM                 | QWen3MoE                                      | Yes    | -   | -   | v0.8.4       | Yes |
| Qwen3NextForCausalLM                | QWen3-Next                                    | Yes    | -   | -   | v0.11.0      | Yes |
| ChatGLMModel                        | glm-4v-9b,chatglm3,chatglm2                   | Yes    | No  | Yes | v0.5.0       | Yes |
| Glm4ForCausalLM                     | GLM-4-0414                                    | No/Yes | -   | -   | v0.8.5.post1 | Yes |
| Glm4MoeForCausalLM                  | GLM-4.5,GLM-4.6,GLM-4.7,GLM-4.5-Air           | Yes    | -   | -   | v0.9.2       | Yes |
| Glm4vMoeForConditionalGeneration    | GLM-4.5V                                      | Yes    | -   | -   | v0.11.0      | Yes |
| DeepseekForCausalLM                 | Deepseek                                      | Yes    | No  | -   | v0.5.0       | Yes |
| DeepseekV2ForCausalLM               | DeepSeek-V2                                   | Yes    | No  | -   | v0.6.2       | Yes |
| DeepseekVLV2ForCausalLM             | DeepSeek-VL2                                  | Yes    | No  | -   | v0.7.2       | Yes |
| DeepseekV3ForCausalLM               | DeepSeek-V3                                   | Yes    | Yes | -   | v0.7.2       | Yes |
| DeepseekV32ForCausalLM              | DeepSeek-V3.2                                 | Yes    | Yes | -   | v0.11.0      | No  |
| GptOssForCausalLM                   | gpt-oss                                       | Yes    | -   | -   | v0.11.0      | Yes |
| BaiChuanForCausalLM                 | Baichuan2,Baichuan                            | Yes    | No  | No  | v0.11.0      | Yes |
| BloomForCausalLM                    | BLOOM                                         | Yes    | No  | Yes | v0.5.0       | Yes |
| InternLMForCausalLM                 | InternLM                                      | Yes    | No  | -   | v0.5.0       | Yes |
| InternLM2ForCausalLM                | InternLM2                                     | Yes    | No  | -   | v0.5.0       | Yes |
| FalconForCausalLM                   | falcon                                        | Yes    | No  | Yes | v0.5.0       | Yes |
| TeleChat2ForCausalLM                | TeleChat2                                     | Yes    | No  | -   | v0.7.2       | Yes |
| MiniCPMForCausalLM                  | MiniCPM                                       | Yes    | No  | -   | v0.5.0       | Yes |
| MiniCPM3ForCausalLM                 | MiniCPM3                                      | Yes    | No  | -   | v0.6.2       | Yes |
| MixtralForCausalLM                  | Mixtral-8x7B,Mixtral-8x7B-Instruct            | Yes    | No  | -   | v0.5.0       | Yes |
| Qwen2MoeForCausalLM                 | Qwen2-57B-A14B,Qwen2-57B-A14B-Instruct        | Yes    | No  | -   | v0.5.0       | No  |
| LlavaForConditionalGeneration       | LLaMA,LLaMA-2,LLaMA-3                         | Yes    | No  | -   | v0.6.2       | No  |
| Qwen2VLForConditionalGeneration     | Qwen2-VL                                      | Yes    | No  | Yes | v0.6.2       | No  |
| Qwen2_5_VLForConditionalGeneration  | Qwen2.5-VL                                    | Yes    | No  | Yes | v0.7.2       | No  |
| Qwen3VLForConditionalGeneration     | Qwen3-VL                                      | Yes    | No  | Yes | v0.11.0      | No  |
| Mistral3ForConditionalGeneration    | Mistral3                                      | Yes    | No  | -   | v0.8.5.post1 | No  |
| Gemma3ForConditionalGeneration      | Gemma 3                                       | Yes    | -   | -   | v0.8.5.post1 | No  |
| MiniCPMV                            | MiniCPM-V                                     | Yes    | No  | -   | v0.6.2       | No  |
| Phi3VForCausalLM                    | Phi-3.5-vision                                | Yes    | No  | -   | v0.6.2       | No  |
| BertModel                           | bge-large-zh-v1.5                             | Yes    | No  | -   | v0.7.2       | No  |
| XLMRobertaModel                     | bge-m3                                        | Yes    | No  | -   | v0.7.2       | No  |
| XLMRobertaForSequenceClassification | bge-reranker-v2-m3                            | Yes    | No  | -   | v0.7.2       | No  |



## Installing from Source

### Preparing the Build Environment

There are two ways to prepare the environment:

1. Use the 光源 PyTorch 2.9.0 base image: download the image version that matches PyTorch 2.9.0 and your Python, DTK, and OS versions.

2. Use an existing Python environment: install PyTorch 2.9.0. The PyTorch whl packages are available at [https://cancon.hpccube.com:65024/4/main/pytorch](https://cancon.hpccube.com:65024/4/main/pytorch). Download the PyTorch 2.9.0 whl package that matches your Python and DTK versions, then install it with the following commands:

```shell
pip install torch*.whl  # the torch whl package you downloaded
pip install setuptools wheel
```
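
Picking the right whl package depends on the local Python version. Standard wheel filenames for CPython builds carry a tag such as `cp310`. The sketch below derives that tag from a version string; `python_wheel_tag` is a hypothetical helper, not part of this repository, and it assumes the packages in the download directory follow the standard wheel naming scheme.

```shell
# Hypothetical helper: derive the CPython wheel tag (cpXY) from a version string.
python_wheel_tag() {
    version=$1              # e.g. 3.10.12
    major=${version%%.*}    # text before the first dot
    rest=${version#*.}      # text after the first dot
    minor=${rest%%.*}       # text before the next dot, if any
    printf 'cp%s%s\n' "$major" "$minor"
}

python_wheel_tag 3.10.12    # prints cp310
```

To use it against the active interpreter, pass it the output of `python -c "import platform; print(platform.python_version())"`.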

### Building and Installing from Source
```shell
git clone http://10.16.6.30/dcutoolkit/deeplearing/vllm.git  # switch to the branch you need
```
- There are two ways to build from source (run the commands from inside the vllm directory):
```shell
# 1. Build a whl package and install it (recommended)
python tools/check_hygon_env.py
VLLM_USE_HYGON=1 python setup.py bdist_wheel

cd dist
pip install vllm*.whl
cd ..  # return to the vllm directory so the tools/ path resolves
python tools/check_hygon_env.py

# 2. Build and install directly from source
# (for debugging, use: VLLM_USE_HYGON=1 python3 setup.py develop)
VLLM_USE_HYGON=1 python3 setup.py install
```
To add the git revision number to the build, set the environment variable: `export ADD_GIT_VERSION=1`
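
The build commands in this section set `VLLM_USE_HYGON=1` as a command prefix, while `ADD_GIT_VERSION=1` is set with `export`. The two forms differ in scope, which matters when several build commands run in one shell session. The following is a minimal sketch of that difference; it runs `sh -c` as a stand-in for the build command and does not invoke `setup.py`. `demo_env_scope` is a hypothetical name.

```shell
# Runs in a subshell (note the parentheses) so nothing leaks into the caller.
demo_env_scope() (
    unset ADD_GIT_VERSION

    # Prefix form: the variable exists only for this one command.
    ADD_GIT_VERSION=1 sh -c 'echo "prefix form, same command: ${ADD_GIT_VERSION:-unset}"'
    sh -c 'echo "prefix form, next command: ${ADD_GIT_VERSION:-unset}"'

    # export form: every later command started from this shell inherits it.
    export ADD_GIT_VERSION=1
    sh -c 'echo "export form, next command: ${ADD_GIT_VERSION:-unset}"'
)

demo_env_scope
# prefix form, same command: 1
# prefix form, next command: unset
# export form, next command: 1
```

In practice this means an exported variable stays active for every later build in the same session until it is unset, whereas a prefixed variable has to be repeated on each command.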

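> Note:
> - On Hygon DCU environments, `VLLM_USE_HYGON=1` must be set.
> - During the build, Hygon custom packages (such as `torch`, `triton`, `flash_attn`) are automatically rewritten to the exact `+das.*` versions found in the current environment, which prevents pip from replacing them with incompatible PyPI versions at install time.
> - `hygon.txt` pins only the high-risk packages: `numpy==1.26.4` and the Hygon custom packages. All other Python dependencies resolve normally.
> - The Hygon build path replaces extras dependencies that tend to trigger deep backtracking, such as `fastapi[standard]` and `mistral_common[image]`, with explicit dependency versions that have been verified in the image.
> - Use `python tools/check_hygon_env.py` to verify key package versions before building and after installing.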
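
As a quick manual check alongside the note above, the `+das.*` suffix can be tested directly on a version string. This is only a sketch: `has_das_suffix` is a hypothetical helper, the version strings are made up for illustration, and this is not a description of what `tools/check_hygon_env.py` does internally.

```shell
# Hypothetical helper: succeed if a version string carries a +das.* local suffix.
has_das_suffix() {
    case "$1" in
        *+das.*) return 0 ;;
        *)       return 1 ;;
    esac
}

# The version strings below are made up for illustration.
for v in "2.9.0+das.example" "2.9.0"; do
    if has_das_suffix "$v"; then
        echo "$v: Hygon build"
    else
        echo "$v: not a Hygon build"
    fi
done
```

For a real check, pass in the output of `python -c "import torch; print(torch.__version__)"` instead of a literal string.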

### Preparing the Runtime Environment

1. Use the 光源 PyTorch 2.9.0 base image environment described above.

2. Download the dependency packages that match PyTorch 2.9.0 and your Python, DTK, and OS versions:
- triton:[https://cancon.hpccube.com:65024/4/main/triton](https://cancon.hpccube.com:65024/4/main/triton/)
- flash_attn: [https://cancon.hpccube.com:65024/4/main/flash_attn](https://cancon.hpccube.com:65024/4/main/flash_attn)
- flash_mla: [https://cancon.hpccube.com:65024/4/main/flash_mla](https://cancon.hpccube.com:65024/4/main/flash_mla)
- lightop: [https://cancon.hpccube.com:65024/4/main/lightop](https://cancon.hpccube.com:65024/4/main/lightop)
- lmslim: [https://cancon.hpccube.com:65024/4/main/lmslim](https://cancon.hpccube.com:65024/4/main/lmslim)
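
Once the packages listed above are downloaded, they can be installed in one pass. The sketch below assumes that they ship as whl files, that they install with `pip` in the same way as the torch package, and that they all sit in a single directory. `install_dcu_wheels` and the `DRY_RUN` variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: install the downloaded dependency wheels from one directory.
# Set DRY_RUN=1 to print the pip commands instead of running them.
install_dcu_wheels() {
    dir=$1
    for pkg in triton flash_attn flash_mla lightop lmslim; do
        for whl in "$dir"/"$pkg"*.whl; do
            # An unmatched glob stays literal, so check that the file exists.
            if [ ! -e "$whl" ]; then
                echo "no whl found for $pkg in $dir" >&2
                continue
            fi
            ${DRY_RUN:+echo} pip install "$whl"
        done
    done
}
```

Running `DRY_RUN=1 install_dcu_wheels ./wheels` first shows which files would be installed and reports any package that is missing from the directory.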

### Notes

+ If `pip install` downloads are slow, add a mirror index: `-i https://pypi.tuna.tsinghua.edu.cn/simple/`

## Verification

- Run `python -c "import vllm; print(vllm.__version__)"` to print the installed version. The version number tracks the official vLLM release, for example 0.15.1.
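
When scripting this check, the reported version may carry a local suffix after a `+` sign (the PEP 440 local version label), for example when a git revision is added to the build. The sketch below strips that suffix before comparing. `public_version` is a hypothetical helper, and the `installed` value is a made-up stand-in for the real `vllm.__version__` output.

```shell
# Hypothetical helper: drop everything from the first "+" onward.
public_version() {
    printf '%s\n' "${1%%+*}"
}

expected="0.15.1"                  # example version from this README
installed="0.15.1+example.local"   # made-up stand-in for vllm.__version__

if [ "$(public_version "$installed")" = "$expected" ]; then
    echo "version OK"
else
    echo "version mismatch"
fi
```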

## Known Issue
-

## References
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)