# <div align="center"><strong>vLLM</strong></div>
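## Introduction
vLLM is a fast and easy-to-use library for LLM inference and serving. It uses PagedAttention to manage KV-cache memory efficiently, applies continuous batching to incoming requests, and supports many Hugging Face models, such as LLaMA and LLaMA-2, Qwen, and ChatGLM2 and ChatGLM3.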

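## Upstream features not yet supported
- **Quantized inference**: only GPTQ quantization of dense models is supported; no other quantization method is supported.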

## Installation
vLLM supports the following Python versions:
+ Python 3.9
+ Python 3.10
+ Python 3.11
+ Python 3.12
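
Before building, it can help to confirm that the active interpreter is one of the versions listed above. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of vLLM.

```python
import sys

# (major, minor) pairs taken from the supported-version list in this README.
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11), (3, 12)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair is in the supported list."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    current = f"{sys.version_info[0]}.{sys.version_info[1]}"
    status = "supported" if is_supported_python() else "not in the supported list"
    print(f"Python {current} is {status}")
```

The check compares only the major and minor components, so any patch release of a listed version passes.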

### Installing from source

#### Preparing the build environment
There are two ways to prepare the environment:

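1. Use the 光源 (sourcefind.cn) vLLM base image (recommended): [https://www.sourcefind.cn/#/image/dcu/vllm?activeName=overview](https://www.sourcefind.cn/#/image/dcu/vllm?activeName=overview).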

2. Use an existing Python environment: install PyTorch 2.5.1. The PyTorch wheel packages are hosted at [https://cancon.hpccube.com:65024/4/main/pytorch](https://cancon.hpccube.com:65024/4/main/pytorch); download the PyTorch 2.5.1 wheel that matches your Python and DTK versions. Install it as follows:
```shell
pip install torch*  # the torch wheel file you downloaded
pip install setuptools wheel
```
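
After installing the wheel, a quick check that the environment reports PyTorch 2.5.1 can catch a mismatched download early. This is a minimal sketch using the standard-library `importlib.metadata`. The helper names are hypothetical, and accepting a `+...` local version suffix is an assumption about how vendor-built wheels may be labelled.

```python
from importlib import metadata
from typing import Optional

EXPECTED_TORCH = "2.5.1"  # version required by this README


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(version: Optional[str], expected: str = EXPECTED_TORCH) -> bool:
    """True if `version` equals `expected`, optionally followed by a '+local' suffix."""
    if version is None:
        return False
    # Assumption: vendor wheels may append a local suffix such as '2.5.1+<build tag>'.
    return version == expected or version.startswith(expected + "+")


if __name__ == "__main__":
    found = installed_version("torch")
    print(f"torch version: {found}, matches {EXPECTED_TORCH}: {matches_expected(found)}")
```

Reading the version from package metadata avoids importing `torch` itself, so the check runs quickly and does not touch the device runtime.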

#### Building and installing from source
```shell
git clone http://developer.sourcefind.cn/codes/OpenDAS/vllm_dcu.git # switch to the branch you need
```
Install the dependencies:
```shell
pip install -r requirements/rocm.txt
```
- There are two ways to build from source (run the commands from inside the vllm directory):
```shell
# If you are using the vLLM base image, uninstall the preinstalled vllm first:
pip uninstall vllm

# 1. Build a wheel and install it
python setup.py bdist_wheel
cd dist
pip install vllm*

# 2. Build and install directly from source
python3 setup.py install  # for debugging, use: python3 setup.py develop
```

#### Preparing the runtime environment
1. Use the 光源 vLLM base image described above (recommended).

2. Download the dependency packages that match PyTorch 2.5.1 and your Python version, DTK version, and operating system:
- triton: [https://cancon.hpccube.com:65024/4/main/triton](https://cancon.hpccube.com:65024/4/main/triton)

#### Notes
+ If `pip install` downloads are slow, add a mirror index: `-i https://pypi.tuna.tsinghua.edu.cn/simple/`

## Verification
- `python -c "import vllm; print(vllm.__version__)"`
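
Beyond printing the version, a short offline generation run exercises the build end to end. The sketch below uses vLLM's public `LLM` and `SamplingParams` classes. The helper name `generate_once` is hypothetical, the model path in the example call is a placeholder, and whether a given model runs on this build is not confirmed here.

```python
from typing import List


def generate_once(model: str, prompts: List[str], max_tokens: int = 32) -> List[str]:
    """Load `model` with vLLM and return one completion per prompt."""
    # Imported lazily so this file can be imported where vLLM is not installed.
    from vllm import LLM, SamplingParams

    # temperature=0.0 selects greedy decoding, which keeps the check repeatable.
    sampling = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    llm = LLM(model=model)
    outputs = llm.generate(prompts, sampling)
    # Each result holds one or more candidates; take the first candidate's text.
    return [out.outputs[0].text for out in outputs]


# Example call (placeholder path; needs an accelerator and local model weights):
# print(generate_once("/path/to/a/supported/model", ["Hello, my name is"]))
```

A small model and a low `max_tokens` value keep the check quick; any non-empty completion suggests that the compiled kernels load and run.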

## Known Issue
-

## References
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)