OpenDAS / vllm_cscc
Commit 65d64273, authored Nov 22, 2024 by zhuwenwen
update readme
parent 308e5937
Showing 1 changed file with 4 additions and 3 deletions
README.md (+4 -3)
@@ -3,14 +3,14 @@
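vLLM is a fast, easy-to-use library for LLM inference and serving. It uses PagedAttention to manage KV-cache memory efficiently, applies continuous batching to incoming requests, and supports many Hugging Face models, such as LLaMA & LLaMA-2, Qwen, and ChatGLM2 & ChatGLM3.
## Official features not yet supported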
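-- **Quantized inference**: fp16 inference and gptq, awq-int4 inference are currently supported; mralin weight quantization and the kv-cache fp8 inference scheme are not yet supported
+- **Quantized inference**: fp16 inference and gptq, awq-int4 inference are currently supported; marlin weight quantization and the kv-cache fp8 inference scheme are not yet supported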
- **Module support**: Sliding window attention is not currently supported
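
The support list above can be turned into a small pre-flight check. The following is a minimal sketch, not part of the README: `engine_kwargs` and `build_llm` are hypothetical helpers, and it assumes this port accepts the same `quantization`, `dtype`, and `kv_cache_dtype` arguments as upstream vLLM's `LLM` class.

```python
# Modes the README lists as supported (assumption: they map onto the
# upstream vLLM `quantization` values "gptq" and "awq"; None means plain fp16).
SUPPORTED_QUANTIZATION = {None, "gptq", "awq"}
# Weight quantization scheme the README lists as not supported yet.
UNSUPPORTED_QUANTIZATION = {"marlin"}


def engine_kwargs(model, quantization=None, kv_cache_dtype="auto"):
    """Return keyword arguments for vllm.LLM, rejecting unsupported modes."""
    if quantization in UNSUPPORTED_QUANTIZATION:
        raise ValueError(f"{quantization} weight quantization is not supported yet")
    if quantization not in SUPPORTED_QUANTIZATION:
        raise ValueError(f"unknown quantization mode: {quantization!r}")
    if kv_cache_dtype.startswith("fp8"):
        raise ValueError("fp8 kv-cache inference is not supported yet")
    return {
        "model": model,
        "dtype": "float16",
        "quantization": quantization,
        "kv_cache_dtype": kv_cache_dtype,
    }


def build_llm(model, **options):
    """Create the engine; vllm is imported lazily so the checks run without it."""
    from vllm import LLM

    return LLM(**engine_kwargs(model, **options))
```

Calling `build_llm("/path/to/model", quantization="awq")` (the path is a placeholder) would fail early with a readable message for `marlin` or an fp8 kv-cache instead of failing inside the engine.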
## Supported model architectures
| Architecture | Models | Model parallelism | FP16 |
| :------: | :------: | :------: | :------: |
-| LlamaForCausalLM | Llama 3.1,Llama 3,Llama 2,Llama,Yi,Codellama、deepseek | Yes | Yes |
+| LlamaForCausalLM | Llama 3.1,Llama 3,Llama 2,Llama,Yi,Codellama,deepseek | Yes | Yes |
| QWenLMHeadModel | QWen,Qwen-VL | Yes | Yes |
| Qwen2ForCausalLM | QWen2,QWen1.5,CodeQwen1.5 | Yes | Yes |
| ChatGLMModel | glm-4v-9b,chatglm3,chatglm2 | Yes | Yes |
...
...
@@ -36,6 +36,7 @@ vLLM supports
+ Python 3.9.
+ Python 3.10.
+ Python 3.11.
+ Python 3.12.
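
A quick way to confirm the interpreter matches the list above before installing is sketched below. It is illustrative only: `is_supported` is a hypothetical helper, and the set covers only the versions visible in this hunk, so the full README may list others.

```python
import sys

# Python versions visible in this hunk of the README (assumption: the
# complete list in the README may contain further entries).
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11), (3, 12)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if not is_supported():
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is not in the supported list")
```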
### Installing by building from source
...
...
@@ -66,7 +67,7 @@ cd dist
pip install vllm*
2. Build and install from source
-VLLM_INSTALL_PUNICA_KERNELS=1 python3 setup.py install
+VLLM_INSTALL_PUNICA_KERNELS=1 python3 setup.py install
(for debugging, you can use VLLM_INSTALL_PUNICA_KERNELS=1 python3 setup.py develop)
```
To add the git version number, set the environment variable: export ADD_GIT_VERSION=1
...
...