Metadata-Version: 2.4
Name: vllm
Version: 0.11.0+das.opt1.rc3.dtk2604
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Author: vLLM Team
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/vllm-project/vllm
Project-URL: Documentation, https://docs.vllm.ai/en/latest/
Project-URL: Slack, https://slack.vllm.ai/
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: <3.14,>=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: regex
Requires-Dist: cachetools
Requires-Dist: psutil
Requires-Dist: sentencepiece
Requires-Dist: numpy==1.25
Requires-Dist: requests>=2.26.0
Requires-Dist: tqdm
Requires-Dist: blake3
Requires-Dist: py-cpuinfo
Requires-Dist: transformers>=4.55.2
Requires-Dist: tokenizers>=0.21.1
Requires-Dist: protobuf
Requires-Dist: fastapi[standard]>=0.115.0
Requires-Dist: aiohttp
Requires-Dist: openai>=1.99.1
Requires-Dist: pydantic>=2.11.7
Requires-Dist: prometheus_client>=0.18.0
Requires-Dist: pillow
Requires-Dist: prometheus-fastapi-instrumentator>=7.0.0
Requires-Dist: tiktoken>=0.6.0
Requires-Dist: lm-format-enforcer==0.11.3
Requires-Dist: llguidance<0.8.0,>=0.7.11; platform_machine == "x86_64" or platform_machine == "arm64" or platform_machine == "aarch64"
Requires-Dist: outlines_core==0.2.11
Requires-Dist: diskcache==5.6.3
Requires-Dist: lark==1.2.2
Requires-Dist: xgrammar==0.1.25; platform_machine == "x86_64" or platform_machine == "aarch64" or platform_machine == "arm64"
Requires-Dist: typing_extensions>=4.10
Requires-Dist: filelock>=3.16.1
Requires-Dist: partial-json-parser
Requires-Dist: pyzmq>=25.0.0
Requires-Dist: msgspec
Requires-Dist: gguf>=0.13.0
Requires-Dist: importlib_metadata; python_version < "3.10"
Requires-Dist: mistral_common[audio,image]>=1.5.4
Requires-Dist: opencv-python-headless>=4.11.0
Requires-Dist: pyyaml
Requires-Dist: six>=1.16.0; python_version > "3.11"
Requires-Dist: setuptools<80,>=77.0.3; python_version > "3.11"
Requires-Dist: einops
Requires-Dist: compressed-tensors==0.11.0
Requires-Dist: depyf==0.19.0
Requires-Dist: cloudpickle
Requires-Dist: watchfiles
Requires-Dist: python-json-logger
Requires-Dist: scipy
Requires-Dist: ninja
Requires-Dist: pybase64
Requires-Dist: cbor2
Requires-Dist: setproctitle
Requires-Dist: openai-harmony>=0.0.3
Requires-Dist: numba==0.60.0; python_version == "3.9"
Requires-Dist: numba==0.61.2; python_version > "3.9"
Requires-Dist: boto3
Requires-Dist: botocore
Requires-Dist: datasets
Requires-Dist: ray[cgraph]>=2.48.0
Requires-Dist: peft
Requires-Dist: pytest-asyncio
Requires-Dist: tensorizer==2.10.1
Requires-Dist: packaging>=24.2
Requires-Dist: setuptools<80.0.0,>=77.0.3
Requires-Dist: setuptools-scm>=8
Requires-Dist: runai-model-streamer==0.11.0
Requires-Dist: runai-model-streamer-s3==0.11.0
Requires-Dist: timm>=1.0.17
Requires-Dist: numa
Requires-Dist: pytrie
Requires-Dist: setuptools_scm>=8
Requires-Dist: cmake==3.29
Requires-Dist: quart
Requires-Dist: fastrlock==0.8.3
Requires-Dist: cupy==12.3.0
Requires-Dist: torch<=2.7.1,>=2.5.1
Requires-Dist: torchvision<=0.22.0,>=0.20.1
Requires-Dist: triton==3.1
Requires-Dist: flash_attn==2.6.1
Requires-Dist: flash_mla==1.0.0
Requires-Dist: lightop==0.6.0
Requires-Dist: lmslim==0.3.1
Provides-Extra: bench
Requires-Dist: pandas; extra == "bench"
Requires-Dist: datasets; extra == "bench"
Provides-Extra: tensorizer
Requires-Dist: tensorizer==2.10.1; extra == "tensorizer"
Provides-Extra: fastsafetensors
Requires-Dist: fastsafetensors>=0.1.10; extra == "fastsafetensors"
Provides-Extra: runai
Requires-Dist: runai-model-streamer>=0.14.0; extra == "runai"
Requires-Dist: runai-model-streamer-gcs; extra == "runai"
Requires-Dist: google-cloud-storage; extra == "runai"
Requires-Dist: runai-model-streamer-s3; extra == "runai"
Requires-Dist: boto3; extra == "runai"
Provides-Extra: audio
Requires-Dist: librosa; extra == "audio"
Requires-Dist: soundfile; extra == "audio"
Requires-Dist: mistral_common[audio]; extra == "audio"
Provides-Extra: video
Provides-Extra: flashinfer
Requires-Dist: flashinfer-python==0.3.1; extra == "flashinfer"
Provides-Extra: petit-kernel
Requires-Dist: petit-kernel; extra == "petit-kernel"
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist

# <div align="center"><strong>vLLM</strong></div>
## Introduction
vLLM is a fast and easy-to-use library for LLM inference and serving. It uses PagedAttention to manage KV-cache memory efficiently, applies continuous batching to incoming requests, and supports many Hugging Face models, such as LLaMA & LLaMA-2, Qwen, and ChatGLM2 & ChatGLM3.
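As a quick orientation, the following is a minimal sketch of offline inference with the public vLLM Python API. The model name is a placeholder (any supported Hugging Face model path works), and the snippet only defines a helper, since generation itself requires a supported accelerator:

```python
from typing import List


def offline_demo(prompts: List[str]) -> List[str]:
    """Minimal vLLM offline-inference sketch; run on a machine with a
    supported accelerator. PagedAttention and continuous batching are
    handled internally by the engine, so the caller only submits prompts."""
    from vllm import LLM, SamplingParams  # requires an installed vLLM build

    llm = LLM(model="Qwen/Qwen2-7B-Instruct")  # placeholder model path
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

For example, `offline_demo(["Hello, my name is"])` returns one generated continuation per prompt.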

## Upstream Features Not Yet Supported
- **Quantized inference**: Marlin weight quantization and the FP8 KV-cache inference scheme are currently not supported.
- **Module support**: sliding-window attention is currently not supported.


## Supported Model Architectures

| Architecture | Models | FP16/BF16 | AWQ | GPTQ | Supported Since | Optimized |
| :------: | :------: | :------: | :------: |:------: | :------: |:------: |
| LlamaForCausalLM               | Llama 3.2,Llama 3.1,Llama 3,Llama 2,Llama,Yi,Codellama,DeepSeek-R1-Distill-Llama     | Yes | Yes | Yes | v0.5.0, Llama 3.2>=v0.6.2 | Yes |
| Llama4ForConditionalGeneration | Llama 4                                                                               | No/Yes | -  | - | v0.8.5.post1  | No |
| QWenLMHeadModel                | QWen,Qwen-VL                                                                          | Yes | Yes | Yes | v0.5.0, Qwen-VL>=v0.6.2 | Yes |
| Qwen2ForCausalLM               | QWen2,QWen1.5,CodeQwen1.5,DeepSeek-R1-Distill-Qwen,gte_Qwen2-1.5B-instruct            | Yes | Yes | Yes | v0.5.0, gte>=v0.7.2   | Yes |
| Qwen3ForCausalLM               | QWen3,Qwen3-Embedding,Qwen3-Reranker                                                  | Yes | - | - | v0.8.4   | Yes |
| Qwen3MoeForCausalLM            | QWen3MoE                                                    | Yes | - | - | v0.8.4   | Yes |
| ChatGLMModel                   | glm-4v-9b,chatglm3,chatglm2                                 | Yes | No  | Yes | v0.5.0   | Yes |
| Glm4ForCausalLM                | GLM-4-0414                                                  | No/Yes | -  | - | v0.8.5.post1   | Yes |
| DeepseekForCausalLM            | Deepseek                                                    | Yes | No  | -   | v0.5.0  | Yes |
| DeepseekV2ForCausalLM          | DeepSeek-V2                                                 | Yes | No  | -   | v0.6.2  | Yes |
| DeepseekVLV2ForCausalLM        | DeepSeek-VL2                                                | Yes | No  | -   | v0.7.2  | Yes |
| DeepseekV3ForCausalLM          | DeepSeek-V3                                                 | Yes | Yes | -   | v0.7.2  | Yes |
| BaiChuanForCausalLM            | Baichuan2,Baichuan                                          | Yes | Yes | -   | v0.5.0  | Yes |
| BloomForCausalLM               | BLOOM                                                       | Yes | No  | Yes | v0.5.0  | Yes |
| InternLMForCausalLM            | InternLM                                                    | Yes | No  | -   | v0.5.0  | Yes |
| InternLM2ForCausalLM           | InternLM2                                                   | Yes | No  | -   | v0.5.0  | Yes |
| FalconForCausalLM              | falcon                                                      | Yes | No  | Yes | v0.5.0  | Yes |
| TeleChat2ForCausalLM           | TeleChat2                                                   | Yes | No  | -   | v0.7.2  | Yes |
| MiniCPMForCausalLM             | MiniCPM                                                     | Yes | No  | -   | v0.5.0  | Yes |
| MiniCPM3ForCausalLM            | MiniCPM3                                                    | Yes | No  | -   | v0.6.2  | Yes |
| MixtralForCausalLM             | Mixtral-8x7B,Mixtral-8x7B-Instruct                          | Yes | No  | -   | v0.5.0  | Yes |
| Qwen2MoeForCausalLM                 | Qwen2-57B-A14B,Qwen2-57B-A14B-Instruct        | Yes | No  | -   | v0.5.0   | No |
| LlavaForConditionalGeneration       | LLaVA                                         | Yes | No  | -   | v0.6.2   | No |
| Qwen2VLForConditionalGeneration     | Qwen2-VL                                      | Yes | No  | Yes | v0.6.2   | No |
| Qwen2_5_VLForConditionalGeneration  | Qwen2.5-VL                                    | Yes | No  | Yes | v0.7.2   | No |
| Mistral3ForConditionalGeneration    | Mistral3                                      | Yes | No  | -   | v0.8.5.post1   | No |
| Gemma3ForConditionalGeneration      | Gemma 3                                       | Yes | -   | -   | v0.8.5.post1   | No |
| MiniCPMV                            | MiniCPM-V                                     | Yes | No  | -   | v0.6.2  | No |
| Phi3VForCausalLM                    | Phi-3.5-vision                                | Yes | No  | -   | v0.6.2  | No |
| BertModel                           | bge-large-zh-v1.5                             | Yes | No  | -   | v0.7.2  | No |
| XLMRobertaModel                     | bge-m3                                        | Yes | No  | -   | v0.7.2  | No |
| XLMRobertaForSequenceClassification | bge-reranker-v2-m3                            | Yes | No  | -   | v0.7.2  | No |


## Installation
vLLM supports:
+ Python 3.9.
+ Python 3.10.
+ Python 3.11.
+ Python 3.12.

### Installing from Source

#### Build Environment Setup
Two ways to prepare the build environment:

1. Use the 光源 PyTorch 2.5.1 base image: download the image matching your PyTorch 2.5.1, Python, DTK, and OS versions.

2. Use an existing Python environment: install PyTorch 2.5.1. Wheels are available at [https://cancon.hpccube.com:65024/4/main/pytorch](https://cancon.hpccube.com:65024/4/main/pytorch); download the PyTorch 2.5.1 wheel matching your Python and DTK versions, then install it:
```shell
pip install torch*  # the downloaded torch wheel
pip install setuptools wheel
```

#### Building and Installing from Source
```shell
git clone http://developer.hpccube.com/codes/OpenDAS/vllm.git # check out the branch you need
```
Install the dependencies:
```shell
pip install -r requirements/rocm.txt
```
- Two ways to build from source (run inside the vllm directory):
```shell
# Option 1: build a wheel and install it
python setup.py bdist_wheel
cd dist
pip install vllm*

# Option 2: build and install in place
python3 setup.py install  # for debugging, use: python3 setup.py develop
```
To embed the git commit in the version string, set the environment variable: `export ADD_GIT_VERSION=1`

#### Runtime Environment Setup
1. Use the 光源 PyTorch 2.5.1 base image described above, or

2. Download the dependency packages matching your PyTorch 2.5.1, Python, DTK, and OS versions:
- triton: [https://cancon.hpccube.com:65024/4/main/triton](https://cancon.hpccube.com:65024/4/main/triton/)
- flash_attn: [https://cancon.hpccube.com:65024/4/main/flash_attn](https://cancon.hpccube.com:65024/4/main/flash_attn)
- lmslim: [https://cancon.hpccube.com:65024/4/main/lmslim](https://cancon.hpccube.com:65024/4/main/lmslim)

#### Notes
+ If `pip install` is slow, add a mirror index: `-i https://pypi.tuna.tsinghua.edu.cn/simple/`

## Verification
- Run `python -c "import vllm; print(vllm.__version__)"` to print the installed version, e.g. 0.11.0; the version number tracks the upstream vLLM release.
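The same check can be done from a Python script using only the standard library; `installed_vllm_version` is a helper name chosen here for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_vllm_version():
    """Return the installed vllm version string, or None if it is absent."""
    try:
        return version("vllm")  # e.g. "0.11.0+das.opt1.rc3.dtk2604"
    except PackageNotFoundError:
        return None


print(installed_vllm_version())
```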

## Known Issues
- None

## 参考资料
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/vllm-project/vllm](https://github.com/vllm-project/vllm)
